
Showing posts from 2014

A tool for plotting networks - Cytoscape

The networks referred to here are more general than computer networks: social networks, for example. Cytoscape is a free, flexible tool for producing good-looking graphical output from 'linked node' data. I needed this on a recent project and Cytoscape did the job well. I would not describe the tool as all that easy to use, but the task it carries out is not easy either. Once you take some time to familiarise yourself with the basic operation, it works very well. Full details are available here. A nice feature of Cytoscape is the ability to tie a property of the data to a visual property. For example, you can make the thickness of connecting lines vary with a property of the relationship, such as weight. You could also use things like colour. There is a large set of plugins available. Many focus on biology, which as the name suggests is the typical domain for this tool, but it is not limited to that. Cytoscape presents a steep learning curve, but this type of work was f
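As a rough illustration of the kind of 'linked node' data involved, the sketch below writes a small, made-up edge table from R. The names, weights and file name are all hypothetical. The assumption is that a delimited table like this is imported into Cytoscape with the source and target columns marked as such, and that weight comes along as an edge attribute which a style can then map to line width.

```r
# Made-up social network edges (illustrative only).
# 'weight' is the relationship property one might map to line thickness.
edges <- data.frame(
  source = c("Alice", "Alice", "Bob", "Carol"),
  target = c("Bob", "Carol", "Carol", "Dave"),
  weight = c(5, 1, 3, 2)
)

# Write the table to a temporary CSV; the path is arbitrary.
edge_file <- file.path(tempdir(), "edges.csv")
write.csv(edges, edge_file, row.names = FALSE)
```

Keeping the weight as an ordinary column means the same file can drive other visual properties, such as colour, without changing the data.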

Population of Ireland on a 3d scatter plot

The Central Statistics Office in Ireland has published this dataset: http://www.cso.ie/en/census/census2011griddataset/
And here is the user guide: http://www.cso.ie/en/media/csoie/census/documents/census2011griddataset/1,Km%C2%B2,Grid,dataset,User,Guide,2011.pdf
It gives the population of each 1 km square of Ireland on census night 2011. It has various buckets, but I have just looked at total population. I used the scatterplot3d package in R, which is very easy to use. Here is my code:

    # Install and load the plotting package
    install.packages('scatterplot3d')
    library(scatterplot3d)
    # Read the grid dataset and plot total population against the cell centre coordinates
    cen <- read.csv("COP2011_Grid_ITM_IE_1Km.csv")
    scatterplot3d(cen$CENT_EAST, cen$CENT_NORTH, cen$TOT_P)

And here is the resulting plot. You can see the big cities jumping up there. It needs a lot of polish to avoid things like Cork getting swamped in the 3D projection, but that's it for now.
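One possible direction for that polish, offered only as a sketch: plot the population on a log scale and draw each cell as a vertical line, so that one very tall spike does not hide its neighbours. The helper name prepare_grid and the four sample rows are made up so the example runs without the download. Only the column names CENT_EAST, CENT_NORTH and TOT_P come from the code above, and the plot is drawn only if scatterplot3d is installed.

```r
# Hypothetical helper: keep populated cells (log10 of 0 is -Inf)
# and add a log-scaled height column.
prepare_grid <- function(cen) {
  populated <- cen[cen$TOT_P > 0, ]
  populated$LOG_P <- log10(populated$TOT_P)
  populated
}

# Made-up sample rows using the same column names as the CSO file.
cen <- data.frame(
  CENT_EAST  = c(715500, 716500, 567500, 568500),
  CENT_NORTH = c(734500, 734500, 571500, 572500),
  TOT_P      = c(9000, 0, 4000, 12)
)
grid <- prepare_grid(cen)

# Draw the plot only when the package is available.
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  scatterplot3d::scatterplot3d(
    grid$CENT_EAST, grid$CENT_NORTH, grid$LOG_P,
    type = "h",   # vertical lines from the base plane to each point
    angle = 55,   # viewing angle between the x and y axes
    pch = 16,
    xlab = "Easting", ylab = "Northing", zlab = "log10(population)"
  )
}
```

With the real file, the same call should work on prepare_grid(read.csv("COP2011_Grid_ITM_IE_1Km.csv")). Whether the log scale actually makes Cork stand out is something to check by eye.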

Using the new pipe feature in R

The magrittr package has introduced the pipe operator to R. It looks like this: %>% You use it the way you use any pipe: the result of the first operation is passed to the next one, and so on. This makes for more readable code than nested function calls. To use it you need to install the magrittr package. In addition, to get the easy-to-use filter, group_by etc. functions used below you need to install the dplyr package. In fact dplyr depends on magrittr, so install dplyr and magrittr comes along for the ride.

    install.packages('dplyr')
    library('dplyr')

You'll need the nycflights13 dataset for this example, so do this:

    install.packages('nycflights13')
    library(nycflights13)

Here is the example, which points the finger at the airlines with the longest delays from NYC:

    filter(flights, !is.na(dep_delay)) %>%
      group_by(carrier) %>%
      summarise(delay = mean(dep_delay)) %>%
      merge(airlines) %>%
      arrange(desc(delay))

I really like these functions like the
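To make the readability point concrete, here is a small sketch that writes the same calculation twice, once as nested calls and once with the pipe, on a made-up five-row table. The table flights_small and its values are hypothetical stand-ins for the real flights data. A base R aggregate is included as a reference, and the dplyr part runs only if dplyr is installed.

```r
# Made-up mini version of the flights table (illustrative values only).
flights_small <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "UA"),
  dep_delay = c(10, NA, 30, 50, NA)
)

# Base R reference: mean delay per carrier (rows with NA are dropped).
base_result <- aggregate(dep_delay ~ carrier, data = flights_small, FUN = mean)

if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  # Nested form: has to be read from the inside out.
  nested <- arrange(
    summarise(
      group_by(filter(flights_small, !is.na(dep_delay)), carrier),
      delay = mean(dep_delay)
    ),
    desc(delay)
  )

  # Piped form: reads top to bottom in the order the steps happen.
  piped <- flights_small %>%
    filter(!is.na(dep_delay)) %>%
    group_by(carrier) %>%
    summarise(delay = mean(dep_delay)) %>%
    arrange(desc(delay))
}
```

Both forms should give the same table. The piped one simply lists the steps in the order they are applied.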

Using R to analyse data from the Central Statistics Office in Ireland

I started a Coursera course this week called Getting and Cleaning Data. I was looking at some data from the CSO and realised that I needed to clean it up. The course is good, but quite difficult. It assumes you have not forgotten everything you learned in the previous course, R Programming. Midway through week 2 (I am playing a bit of catch-up) I stumbled across this package: http://pxr.r-forge.r-project.org/ px is the format that the CSO releases its 'raw' data in. This package puts that into a data frame, which is more amenable to analysis. I reckon there has to be something lurking within the CSO dataset, which is one of my main motivations for getting up to speed on R. Nothing to declare yet, but hopefully I will find something interesting soon :)
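A minimal sketch of what that conversion might look like, assuming the pxR package is installed and that its read.px function and as.data.frame method behave as its documentation describes. The wrapper name px_to_data_frame is made up for illustration, and no particular CSO file name is implied.

```r
# Sketch only: 'path' should point to a .px file downloaded from the CSO.
px_to_data_frame <- function(path) {
  if (!requireNamespace("pxR", quietly = TRUE)) {
    stop("Install pxR first: install.packages('pxR')")
  }
  px <- pxR::read.px(path)  # parse the px file into a 'px' object
  as.data.frame(px)         # convert it to an ordinary data frame
}
```

Once the data is in a data frame, the usual tools such as head() and summary() can be used to start exploring it.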