Posts

Showing posts from August, 2014

Using the new pipe feature in R

The magrittr package has introduced the pipe operator to R. It looks like this: %>%

You use it the way you use any pipe: the result of the first operation is passed to the next one, and so on. This makes for more readable code than nested function calls.

To use it you need the magrittr package. To get the nice, easy-to-use filter, group_by etc. functions used below, you also need the dplyr package. In fact dplyr depends on magrittr, so install dplyr and magrittr comes along for the ride.

    install.packages('dplyr')
    library('dplyr')

You'll need the nycflights13 dataset for this example, so do this:

    install.packages('nycflights13')
    library(nycflights13)

Here is the example, which points the finger at the airlines with the longest average departure delays from NYC:

    filter(flights, !is.na(dep_delay)) %>%
      group_by(carrier) %>%
      summarise(delay = mean(dep_delay)) %>%
      merge(airlines) %>%
      arrange(desc(delay))

I really like these functions like the...
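To see what the pipe buys you, here is a rough sketch of the same query written both ways: once as nested calls (read from the inside out) and once as a pipeline (read top to bottom). The variable names nested and piped are just mine for illustration, and it assumes dplyr and nycflights13 are already installed.

```r
library(dplyr)
library(nycflights13)

# Nested form: the first step (filter) is buried in the middle,
# and the last step (arrange) is on the outside.
nested <- arrange(
  merge(
    summarise(
      group_by(filter(flights, !is.na(dep_delay)), carrier),
      delay = mean(dep_delay)
    ),
    airlines
  ),
  desc(delay)
)

# Piped form: each step receives the previous result as its first argument.
piped <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(carrier) %>%
  summarise(delay = mean(dep_delay)) %>%
  merge(airlines) %>%
  arrange(desc(delay))

# Both forms should produce the same table.
print(all.equal(nested, piped))
head(piped)
```

The two versions do exactly the same work; the pipe only changes the order in which you read it.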

Using R to analyse data from the Central Statistics Office in Ireland

I started a Coursera course this week called Getting and Cleaning Data. I was looking at some data from the CSO and realised that I needed to clean it up. The course is good, but quite difficult. It assumes you have not forgotten everything you learned in the previous course, R Programming. Midway through week 2 (I am playing a bit of catch-up) I stumbled across this package: http://pxr.r-forge.r-project.org/ px is the format in which the CSO releases its 'raw' data. The package reads that into a data frame, which is more amenable to analysis. I reckon there has to be something lurking within the CSO dataset, which is one of my main motivations for getting up to speed on R. Nothing to declare yet, but hopefully I will find something interesting soon :)
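For anyone who wants to try it, here is a minimal sketch of how I understand the package is used. The helper name px_to_data_frame and the file name in the usage comment are made up for illustration; read.px() and the as.data.frame() method come from pxR, which needs to be installed first.

```r
# Hypothetical helper: read a px file and return it as a data frame.
# Assumes the pxR package is installed, e.g. install.packages("pxR").
px_to_data_frame <- function(path) {
  px_object <- pxR::read.px(path)  # parse the px file into a px object
  as.data.frame(px_object)         # flatten it into a regular data frame
}

# Usage, with a made-up file name standing in for a file downloaded from the CSO:
# cso_data <- px_to_data_frame("some_cso_table.px")
# str(cso_data)
```

Depending on the file, you may need to pass an encoding to read.px(); I have not checked what the CSO files need.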