Dennis Murphy
2 September 2014
Tonight's agenda:
R is a statistical programming language. Its original target audience was statisticians and applied scientists. Big Data has changed that.
R was written in the early 1990s, before the advent of Big Data. Some design decisions made at the time limit its effectiveness (at least without help from contributed packages):
Two graphics engines:
graphics (base graphics)
grid

Graphics packages:
lattice and latticeExtra (grid graphics)
ggplot2 (grid graphics)
plotrix (base graphics)
rgl, scatterplot3d (3D graphics)
iplots (interactive graphics, Java-based interface)
ggvis, rCharts (web-based, interactive graphics)
shiny (web-based R applications with a focus on graphics)

The realm of the data scientist…
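As a minimal taste of data munging in base R, the sketch below uses aggregate() to compute group means and sub(), gsub() and grep() for regular-expression work. Using the built-in mtcars data set is an assumption made purely for illustration; no contributed packages are needed.

```r
# Minimal base-R munging sketch on the built-in mtcars data set
# (data set chosen for illustration only).

# Mean fuel economy (mpg) for each number of cylinders, using the
# formula interface of aggregate(). Groups come back sorted by cyl.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Regular-expression helpers on the car names stored as row names.
car_names <- rownames(mtcars)

# grep() returns indices of matching elements; value = TRUE returns
# the matching strings themselves (here, every name starting "Merc").
mercs <- grep("^Merc", car_names, value = TRUE)
print(mercs)

# sub() replaces only the first match in each string; gsub() replaces all.
hornet      <- "Hornet 4 Drive"
first_space <- sub(" ", "_", hornet)   # "Hornet_4 Drive"
all_spaces  <- gsub(" ", "_", hornet)  # "Hornet_4_Drive"
```

The same calls work unchanged on any data frame or character vector, which is why they are a reasonable first stop before reaching for a package.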
Some useful base functions:
aggregate()
sub(), gsub(), grep(), etc. (regular expression handling)

Packages:
doBy (a good starter package, very well documented)
plyr and reshape2
dplyr (the next generation of plyr)
data.table (almost requisite for munging big data in R)
stringr (simplifies regex applications in R)
lubridate (simplifies date handling)
sqldf (allows one to use SQL code on R data frames)
ff (package to manage I/O of big data into R)
bigmemory and friends (big data processing in Unix)
pbdR library (large-scale parallel programming: Unix-based)

RStudio: an excellent IDE for R, useful for novices and developers alike. Some features:
I have given you a basic introduction to what R can do, but the important part is learning HOW to do it, which leads to the question of which topics you are interested in learning about.
Most of this group consists of R novices, so I suggest that this year's topics be devoted to learning the basics (e.g., graphics and data manipulation). This is an RFD (request for discussion), not a statement!
Sessions on R graphics:
Sessions on data manipulation:
Which topics interest you that are not mentioned above?
By covering the foundations first, we will enable more people to appreciate and apply the more interesting and trendy features of R.
Several of the topics above require some prior understanding; e.g.,
you need to be familiar with RStudio before you can appreciate shiny.
It helps to know plyr before diving into dplyr, because many concepts in dplyr are extensions of concepts introduced in plyr and reshape2.
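plyr is built around the split-apply-combine strategy: split the data into groups, apply a function to each group, and combine the results. A minimal base-R sketch of that pattern, using only split(), lapply() and do.call(rbind, ...), may make the plyr and dplyr sessions easier to follow. The mtcars data set and the helper summarise_group() are hypothetical choices made for this example, not anything from plyr or dplyr.

```r
# Split-apply-combine sketch in base R (no contributed packages needed).

# 1. Split: one data frame per number of cylinders (list named "4", "6", "8").
pieces <- split(mtcars, mtcars$cyl)

# 2. Apply: summarise each piece. summarise_group() is a hypothetical
#    helper written only for this illustration.
summarise_group <- function(df) {
  data.frame(cyl      = df$cyl[1],
             n        = nrow(df),
             mean_mpg = mean(df$mpg),
             mean_hp  = mean(df$hp))
}
results <- lapply(pieces, summarise_group)

# 3. Combine: stack the one-row results into a single data frame.
combined <- do.call(rbind, results)
rownames(combined) <- NULL
print(combined)
```

plyr's ddply() and dplyr's group_by() followed by summarise() wrap these same three steps in a single expression, which is much of why the concepts carry over from one package to the other.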