The R Language

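With the functional idiom now implemented in languages such as Haskell, Scala, Java, Python, Ruby, and C#, is the R language still relevant? One thing to note about functional languages is that they favor immutable data: purely functional languages such as Haskell do not let ordinary values be mutated in place, while hybrid languages such as Scala permit mutation but discourage it.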

Java 8 finally gave Java developers a taste of functional programming, but the feature is just an add-on to the language. This was done, I think, to maintain backward compatibility with the millions of Java applications in the field.

The R Language versus Statically Typed Languages

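One thing about the languages I listed is that most of them are statically typed. Python and Ruby are the exceptions: they are dynamically typed and rely on duck typing, so a variable can be rebound to a value of a different type at any time. Both are still strongly typed, meaning the runtime refuses to silently convert between unrelated types. That’s not as safe as static typing, where the compiler catches type errors before the program runs, but it is safer than a weakly typed language that coerces values implicitly.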
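
The difference between dynamic typing and strong typing is easy to blur, so here is a minimal Python sketch of both behaviors. The variable names are my own and purely illustrative: a name can be rebound to a value of a different type, yet combining unrelated types in one operation raises an error instead of being silently coerced.

```python
# Dynamic typing: the same name can be rebound to a value of another type.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Strong typing: unrelated types are not silently converted.
number = 1
text = "2"
try:
    result = number + text
except TypeError as exc:
    # Adding an int to a str is rejected at run time.
    result = f"TypeError: {exc}"

print(first_type, second_type)
print(result)
```

A statically typed language would reject the rebinding at compile time; Python only objects when the mismatched operation actually runs.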

So why would anyone want to use a language like R that is dynamically typed?

Productivity is one feature that comes to mind. I can input a data set, cleanse and reshape the data, and then create a graph using R’s ggplot2 library faster than I can in a statically typed language.

Another nice feature of R is its interactive nature. Scala has a REPL, but as of this writing Java 8 still lacks one. RStudio features a split screen with the source code on top and a command-line window on the bottom, which lets the developer try out code snippets and then incorporate the working code into the source window.

R is one of the de facto programming languages for data science. The other is Python. Is R a better choice than Python? I think it depends on your mindset. Python looks a lot more like a mainstream language such as Java or C#, although it eliminates the myriad curly braces found in Java and similar languages.

One drawback of Python is that control of the language resides with a single person. Another negative was the set of changes in Python 3 that were not backward compatible with Python 2. Try doing that with Java and you would start a worldwide language revolt.

As the capabilities of Python have improved over the years, Python has started to gain market share from R.  R is still very popular with people doing analytics.  

Plotting Graphs with ggplot2

A package called ggplot2 brings visualization capabilities to R. I did not have any problems getting my charts to work. They were not perfect at first, but after a few minutes spent fixing issues I was done.

As you can see from the chart below there are several insights gained by performing this data analysis. The chart depicts the survival rates of passengers on the Titanic.

First, the survival rate of the passengers is closely correlated with passenger class. The big red rectangle on the right shows that people with economy-class tickets were much more likely to perish. Second, the breakdown by cabin type shows that passengers in the U-cabins had a lower survival rate than those in the other cabin types.

R Charting Versus Tableau Charting

R’s charting capability is more difficult to use than Tableau’s. Have a look at the combination chart below. The chart shows what percentage of customers are leaving a financial institution, broken down by age bracket. With Tableau, I created the chart without writing any code.

The Tableau chart is interactive. Click on the image to go to my public Tableau site. Overall, Tableau’s chart looks better and delivers the interactive functionality that ggplot2 is missing. ggplot2 does have a big advantage in terms of price: it’s free. Tableau costs money, and the price of the Tableau server edition is quite substantial. Startup companies are going to favor R or Microsoft BI for that reason.

R Language Learning Resources

David Langer has an excellent introduction to the R language on YouTube. The intro consists of three videos that total around three hours. If you decide to actually install RStudio and code the exercises, you can complete the course in about ten hours.

I have a programming background, so learning R was relatively easy. The way the data frame object allows the manipulation of data is powerful. R has a functional feel even though it supports mutation. R is supposed to be more difficult than Python, but learning the language felt natural because R builds data in a way that reminds me of Scala. In Scala, to avoid mutation, you just combine existing data to create new data. That’s how R feels to me.
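
To make the "combine data to create new data" idea concrete, here is a minimal sketch in Python, chosen only because it is the other language discussed in this article. The Passenger class and the records in it are made up for illustration and are not taken from the Titanic data set. The original values are never changed; each step produces a new value.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen=True makes instances immutable
class Passenger:
    name: str
    pclass: int
    survived: bool


# Hypothetical records, not real passengers.
original = (
    Passenger("passenger_a", 1, True),
    Passenger("passenger_b", 3, False),
)

# Combine the existing tuple with one more record to get a new tuple.
extended = original + (Passenger("passenger_c", 2, True),)

# Derive a modified copy instead of editing the record in place.
corrected = replace(original[1], survived=True)

print(len(original), len(extended))
print(original[1].survived, corrected.survived)
```

After these steps the original tuple still holds its two unchanged records, which is the property that gives this style its functional feel.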

Conclusion

Although R is losing market share relative to Python in the data science arena, I think R is still growing.

I think the real challenges are for Tableau and SAS, both of which face competition from Microsoft (Power BI) and from open source software like R and Python.