{"id":1862,"date":"2024-03-03T17:19:15","date_gmt":"2024-03-03T23:19:15","guid":{"rendered":"https:\/\/sites.imsa.edu\/hadron\/?p=1862"},"modified":"2024-03-03T17:20:53","modified_gmt":"2024-03-03T23:20:53","slug":"what-the-r-an-introduction-to-statistical-programming","status":"publish","type":"post","link":"https:\/\/sites.imsa.edu\/hadron\/2024\/03\/03\/what-the-r-an-introduction-to-statistical-programming\/","title":{"rendered":"What The R? An Introduction to Statistical Programming"},"content":{"rendered":"<p style=\"text-align: center\">Written By: Jeev Hora<\/p>\n<p><b>R You Ready?<\/b><\/p>\n<p><span style=\"font-weight: 400\">Programming languages, much like tools in a toolbox, can have different uses for different scenarios. Just like how a carpenter would use a hammer for a nail but not a screw, a web designer would usually use JavaScript for front-end, but not always back-end, development. Similarly, statisticians and data analysts use R to quickly generate, analyze, and visualize data.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>The *R*ise of R<\/b><\/p>\n<p><span style=\"font-weight: 400\">R was created by two professors at the University of Auckland, Ross Ihaka and Robert Gentleman, as an updated version of the 1970s language S. S was made for a purpose similar to R\u2019s, to help the statisticians at Bell Laboratories collect and investigate data, but it was becoming woefully slow and outdated. R was originally intended for an introductory statistics course at the university, but it eventually became popular in its own right. 
R is now a free and open-source programming language, developed under the R Project and maintained by programmers around the world.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: center\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1864 aligncenter\" src=\"http:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Robert-Gentleman-230x300.png\" alt=\"\" width=\"117\" height=\"153\" srcset=\"https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Robert-Gentleman-230x300.png 230w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Robert-Gentleman-600x784.png 600w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Robert-Gentleman.png 713w\" sizes=\"auto, (max-width: 117px) 100vw, 117px\" \/><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1863\" src=\"http:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796-300x200.jpg\" alt=\"\" width=\"177\" height=\"118\" srcset=\"https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796-300x200.jpg 300w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796-1024x683.jpg 1024w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796-768x512.jpg 768w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796-1536x1024.jpg 1536w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796-600x400.jpg 600w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796-210x140.jpg 210w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/1920px-Ross_Ihaka_5189180796.jpg 1920w\" sizes=\"auto, (max-width: 177px) 100vw, 177px\" \/><b>Figure 1<\/b><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Robert Gentleman and Ross Ihaka.<\/span><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Sources: ScorpionX and Wikimedia Foundation.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">R is a dynamic, 
object-oriented, functional programming language. Being dynamic (dynamically typed) means that many properties of a script, including variable types, are determined only while the code is running, which makes code simpler and quicker to write. Object-oriented programming languages organize code around <\/span><i><span style=\"font-weight: 400\">objects<\/span><\/i><span style=\"font-weight: 400\">, which can store data (fields) and procedures. Functional programming takes a related approach, building programs by creating and composing functions that operate on data. All these attributes make R an extremely versatile language that can be picked up by anyone, with a little bit of patience.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>Getting StaRted<\/b><\/p>\n<p><span style=\"font-weight: 400\">R and RStudio (a popular integrated development environment for writing R)\u00a0can be downloaded from <\/span><a href=\"https:\/\/cloud.r-project.org\/\"><span style=\"font-weight: 400\">https:\/\/cloud.r-project.org\/<\/span><\/a><span style=\"font-weight: 400\"> and <\/span><a href=\"https:\/\/posit.co\/download\/rstudio-desktop\/\"><span style=\"font-weight: 400\">https:\/\/posit.co\/download\/rstudio-desktop\/<\/span><\/a><span style=\"font-weight: 400\">, respectively. 
RStudio is the most popular way to program in R, but plain R and other IDEs can be used as well, with more work.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1867 aligncenter\" src=\"http:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/RstudioHome-1-300x178.png\" alt=\"\" width=\"300\" height=\"178\" srcset=\"https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/RstudioHome-1-300x178.png 300w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/RstudioHome-1-1024x606.png 1024w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/RstudioHome-1-768x455.png 768w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/RstudioHome-1-1536x909.png 1536w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/RstudioHome-1-2048x1212.png 2048w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/RstudioHome-1-600x355.png 600w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p style=\"text-align: center\"><b>Figure 2<\/b><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Layout of RStudio<\/span><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Source: RStudio<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">In the default layout, the left side of the RStudio interface is where code is written and run, the top right shows the environment and command history, and the bottom right shows files, plots, and help. As seen in Figure 2, RStudio provides an ideal environment for\u00a0all the coding and stats to happen!<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>Coding in R-anese<\/b><\/p>\n<p><span style=\"font-weight: 400\">Simple arithmetic can be done using variables. 
Variables can hold numbers, as here, as well as strings or vectors of values:<\/span><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; a &lt;- 5<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; b &lt;- 6<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; sum(a, b)<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">[1] 11<\/span><\/i><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">It\u2019s easy to make tables (known as data frames) in R using the <\/span><i><span style=\"font-weight: 400\">data.frame<\/span><\/i><span style=\"font-weight: 400\"> function with multiple variables:<\/span><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; name &lt;- c(&quot;John&quot;, &quot;Jessica&quot;, &quot;Mike&quot;)<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; age &lt;- c(15, 18, 14)<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; gender &lt;- c(&quot;M&quot;, &quot;F&quot;, &quot;M&quot;)<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; friends &lt;- data.frame(name, age, gender)<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">&gt; friends$name<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">[1] &quot;John&quot; &quot;Jessica&quot; &quot;Mike&quot;<\/span><\/i><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">T-tests and chi-square tests are also commonly performed and simple to run\u00a0in R. 
For example, given two numeric vectors of observations named a and b, a two-sample t-test with its main options spelled out looks like this (alternative can also be &quot;less&quot; or &quot;greater&quot;):<\/span><\/p>\n<p><i><span style=\"font-weight: 400\">t.test(a, b,<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0alternative = &quot;two.sided&quot;,<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0mu = 0, paired = FALSE, var.equal = FALSE,<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0conf.level = 0.95)<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400\">While this code may initially seem overwhelming, it\u2019s not much different from the statistics one may learn in a class such as MSI, just now in code form!<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>Installing Packages<\/b><\/p>\n<p><span style=\"font-weight: 400\">R offers many packages that add new functionality and make common tasks quicker. Important examples include the tidyverse and, more specifically, tibble and ggplot2. The tidyverse is a collection of packages for R that makes it easier to work with data and create plots. These utilities are found at\u00a0tidyverse.org and can be used by running the <\/span><i><span style=\"font-weight: 400\">install.packages(&quot;[package]&quot;) <\/span><\/i><span style=\"font-weight: 400\">command to install the package, and then loading the package with <\/span><i><span style=\"font-weight: 400\">library([package])<\/span><\/i><span style=\"font-weight: 400\">. <\/span><i><span style=\"font-weight: 400\">\u00a0<\/span><\/i><span style=\"font-weight: 400\">The tibble package offers a more flexible take on data frames, including more freedom in naming columns and other useful features. The ggplot2 package is a data visualization tool. 
For example, with ggplot2 loaded and a large data frame named <\/span><i><span style=\"font-weight: 400\">mpg<\/span><\/i><span style=\"font-weight: 400\">, we could plot the data using this code:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><i><span style=\"font-weight: 400\">&gt; ggplot(data = mpg) +<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400\">+\u00a0\u00a0 geom_point(mapping = aes(x = displ, y = hwy))<\/span><\/i><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1868 aligncenter\" src=\"http:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Graphing8-300x194.png\" alt=\"\" width=\"300\" height=\"194\" srcset=\"https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Graphing8-300x194.png 300w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Graphing8-768x496.png 768w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Graphing8-600x387.png 600w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Graphing8.png 826w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p style=\"text-align: center\"><b>Figure 3<\/b><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Scatterplot of the mpg data produced with ggplot2<\/span><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Source: <\/span><i><span style=\"font-weight: 400\">R For Data Science, by Wickham &amp; Grolemund<\/span><\/i><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Together, the tidyverse, ggplot2, and tibble go a long way toward improving the R user experience.<\/span><\/p>\n<p><b>R is Supe*R* Useful!<\/b><\/p>\n<p><span style=\"font-weight: 400\">R is used in multiple industries and by thousands of companies all over the world. According to a paper by Dhanda et al., R has an invaluable role in the scientific field, especially in data science and biocomputing. Furthermore, new R packages are released and updated every day, which keeps the language current. 
For example, one team of researchers built a predictive bio-growth package for R that has since been used in other papers.<\/span><\/p>\n<p><span style=\"font-weight: 400\">R is also seeing massive growth in building and analyzing neural networks and machine learning algorithms. Packages like caret, randomForest, and e1071 have facilitated such growth.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1869 aligncenter\" src=\"http:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Screenshot-2024-03-03-171454-300x186.png\" alt=\"\" width=\"300\" height=\"186\" srcset=\"https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Screenshot-2024-03-03-171454-300x186.png 300w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Screenshot-2024-03-03-171454-1024x635.png 1024w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Screenshot-2024-03-03-171454-768x476.png 768w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Screenshot-2024-03-03-171454-1536x953.png 1536w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Screenshot-2024-03-03-171454-600x372.png 600w, https:\/\/sites.imsa.edu\/hadron\/files\/2024\/03\/Screenshot-2024-03-03-171454.png 1543w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p style=\"text-align: center\"><b>Figure 4<\/b><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Linear regressions of age and height, plotted using R. This is one major use of R.<\/span><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">Source: RStudio<\/span><\/p>\n<p><b>Conclusion<\/b><\/p>\n<p><span style=\"font-weight: 400\">While R already has important uses in traditional statistics, it is currently gaining popularity in many other fields such as data analytics, epidemiology, computational biology, and machine learning. Anyone can use R, from someone with less than CSI-level experience to a professor with a Ph.D. in Mathematics and Statistics. 
Many of the teachers at IMSA have even used R for their master&#8217;s theses! So if you\u2019re ever saying to yourself, \u201cWhat the R?\u201d, just keep calm and R on!<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">References<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Giorgi, F. M., Ceraolo, C., &amp; Mercatelli, D. (2022). The R Language: An Engine for Bioinformatics and Data Science. Life, 12(5), 648. <\/span><a href=\"https:\/\/doi.org\/10.3390\/life12050648\"><span style=\"font-weight: 400\">https:\/\/doi.org\/10.3390\/life12050648<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Morandat, F., Hill, B., Osvald, L., &amp; Vitek, J. (2012). Evaluating the Design of the R Language. ECOOP 2012 \u2013 Object-Oriented Programming, 7313, 104\u2013131. <\/span><a href=\"https:\/\/doi.org\/10.1007\/978-3-642-31057-7_6\"><span style=\"font-weight: 400\">https:\/\/doi.org\/10.1007\/978-3-642-31057-7_6<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Wikipedia. <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Ross_Ihaka#\/media\/File:Ross_Ihaka_(5189180796).jpg\"><span style=\"font-weight: 400\">https:\/\/en.wikipedia.org\/wiki\/Ross_Ihaka#\/media\/File:Ross_Ihaka_(5189180796).jpg<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Shepperd, M. (2023). CS5702 Modern Data Book. In bookdown.org. Bookdown. <\/span><a href=\"https:\/\/bookdown.org\/martin_shepperd\/ModernDataBook\/\"><span style=\"font-weight: 400\">https:\/\/bookdown.org\/martin_shepperd\/ModernDataBook\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">The R Foundation. (2019). What is R? R-Project.org. <\/span><a href=\"https:\/\/www.r-project.org\/about.html\"><span style=\"font-weight: 400\">https:\/\/www.r-project.org\/about.html<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Wickham, H., &amp; Grolemund, G. (2017). 
R for data science: Import, tidy, transform, visualize, and model data. O\u2019Reilly. <\/span><a href=\"https:\/\/r4ds.had.co.nz\/index.html\"><span style=\"font-weight: 400\">https:\/\/r4ds.had.co.nz\/index.html<\/span><\/a><span style=\"font-weight: 400\">.<\/span><\/p>\n<p><span style=\"font-weight: 400\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Written By: Jeev Hora R You Ready? Programming languages, much like tools in a toolbox, can have different uses for different scenarios. Just like how a carpenter would use a hammer for a nail but not a screw, a web designer would usually use JavaScript<\/p>\n","protected":false},"author":929,"featured_media":1869,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[13],"tags":[],"class_list":["post-1862","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/posts\/1862","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/users\/929"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/comments?post=1862"}],"version-history":[{"count":2,"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/posts\/1862\/revisions"}],"predecessor-version":[{"id":1872,"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/posts\/1862\/revisions\/1872"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/media\/1869"}],"wp:attachment":[{"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/media?parent=1862"}],"wp:term":[{"taxonomy":"ca
tegory","embeddable":true,"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/categories?post=1862"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.imsa.edu\/hadron\/wp-json\/wp\/v2\/tags?post=1862"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}