Sunday, October 26, 2014

A tool for plotting networks - Cytoscape

The networks referred to here are more general than computer networks; social networks are one example. Cytoscape is a free and flexible tool for producing good-looking graphical output from 'linked node' data. I needed this on a recent project and Cytoscape did the job well. I would not describe the tool as all that easy to use, but the task it carries out is not easy either. Once you take some time to familiarise yourself with the basic operation, it works very well.

Full details are available here.

A nice feature of Cytoscape is the ability to tie a property of the data to a visual property. For example, you can make the thickness of connecting lines vary with a property of the relationship, such as weight. You could also map data to things like colour.

There is a large set of plugins available. Many focus on biology, which as the name suggests is the typical domain for this tool, but it is not limited to that.

Cytoscape presents a steep learning curve, but this type of work was formerly the domain of experts only. Now amateurs can have a go too.

Friday, September 19, 2014

Population of Ireland on a 3d scatter plot

The Central Statistics Office in Ireland has published this dataset:

And here is the user guide:

It gives the population of each 1km square of Ireland on census night 2011. It has various buckets, but I have just looked at total population. I used the scatterplot3d package in R, which is very easy to use.

Here is my code:

library(scatterplot3d)  # cen holds the CSO dataset as a data frame
scatterplot3d(cen$CENT_EAST, cen$CENT_NORTH, cen$TOT_P)

and here is the resulting plot

You can see the big cities jumping up there. It needs a lot of polish to avoid things like Cork getting swamped in the 3d projection, but that's it for now. 

Sunday, August 31, 2014

Using the new pipe feature in R

The magrittr package has introduced the pipe operator to R. It looks like this:

%>%

You use it the way you use any pipe: the result of the first operation is passed to the next one, and so on. This makes for more readable code than nested function calls.

To use it you need to install the magrittr package. To get the easy-to-use filter, group_by, etc. functions used below, you also need the dplyr package. In fact dplyr depends on magrittr, so if you install dplyr, magrittr comes along for the ride.


You'll need the nycflights13 dataset for this example, so do this:

install.packages("nycflights13")

Here is the example, which points the finger at the airlines with the longest delays from NYC:

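library(dplyr)          # also makes %>% available
library(nycflights13)   # flights and airlines data frames
filter(flights, !is.na(dep_delay)) %>%   # drop flights with no recorded departure delay
  group_by(carrier) %>%
  summarise(delay = mean(dep_delay)) %>%
  merge(airlines)       # add airline names via the shared carrier column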

I really like functions like filter. I find subsetting the 'normal' way in R to be very tricky. This makes it a lot easier, to my mind.

While looking for and loading the NYC data I also stumbled upon these bits:

Find the datasets in your installed packages:
data(package = .packages(all.available = TRUE))

In a specific package:
data(package = 'nycflights13')

Here are the dplyr docs - 63 page monster pdf.

This is a bit more manageable.

magrittr github with nice examples is here.

Wednesday, August 13, 2014

Using R to analyse data from the Central Statistics Office in Ireland

I started a Coursera course this week called Getting and Cleaning Data. I was looking at some data from the CSO and realised that I needed to clean it up. The course is good, but quite difficult. It assumes you have not forgotten everything you learned in the previous course, R Programming.
Midway through week 2 (I am playing a bit of catch-up) I stumbled across this package:

px is the format in which the CSO releases its 'raw' data. This package puts that into a data frame, which is more amenable to analysis. I reckon there has to be something interesting lurking within the CSO dataset, and finding it is one of my main motivations for getting up to speed on R.

Nothing to declare yet, but hopefully I will find something interesting soon :)

Monday, September 16, 2013

Get a list of your functions in MySQL

If you want a list of functions (as opposed to procedures) use this:

select * from information_schema.routines where routine_schema = 'your_schema_name' and routine_type != 'PROCEDURE'

Friday, August 9, 2013

Douglas Crockford Video 5 - 'The end of all things'

No let up in quality in this video. Here are my notes for the final video in the original series:

Cross-site scripting (XSS) is a big problem. A successful attacker gains huge privileges.
Caja and ADsafe make JS safer.

Don't confuse a variable and a value.

How does an object get a reference:
By Creation
By Construction
By Reference

David Parnas:

Lazy programmers guide

Keep performance delays below 100ms - provide some sort of immediate feedback.
Don’t fiddle with code. Measure first. Use PageSpeed.
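As a rough illustration of "measure first", here is a minimal sketch. The helper name timeIt and the sample workload are my own assumptions, not something from the video, and Date.now() only gives coarse timings. A tool like PageSpeed gives a much fuller picture.

```javascript
// Hypothetical helper: run fn several times and report the average time per run.
function timeIt(fn, runs) {
  const start = Date.now();
  let result;
  for (let i = 0; i < runs; i += 1) {
    result = fn();
  }
  const elapsedMs = Date.now() - start;
  return { result: result, avgMs: elapsedMs / runs };
}

// Example workload (an assumption for illustration): sum the numbers 0..99999.
function sumTo() {
  let total = 0;
  for (let i = 0; i < 100000; i += 1) {
    total += i;
  }
  return total;
}

const report = timeIt(sumTo, 50);
console.log("average ms per run:", report.avgMs);
```

The point is to get a number before and after a change, so you know whether the change helped at all.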

Arrays can be slow in older versions of IE. No hashmaps.
Don’t add unnecessary chrome. Takes time.

Don’t tune for quirks. Keep code clean and readable. Future versions of JS engines will be much faster. Your quirk optimisations may cause trouble.

Avoid global variables.

Avoid ++ - too easy to mess up.
Use JSLint.
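The advice on global variables and ++ can be sketched like this. It is only one way to do it: the counter object is a made-up example, with state kept in a closure and the increment written as += 1.

```javascript
// Module pattern: state lives in a closure instead of a global variable.
const counter = (function () {
  let count = 0; // private; not reachable from outside the closure

  return {
    increment: function () {
      count += 1; // explicit form instead of count++
      return count;
    },
    current: function () {
      return count;
    }
  };
}());

counter.increment();
counter.increment();
console.log(counter.current()); // 2
```

Only the name counter is visible to other code, so there is much less to clash with.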

Wednesday, August 7, 2013

Douglas Crockford's JavaScript Video 4 - AJAX

These videos are remarkable in the packed world of IT training videos in that they are clear and enjoyable to watch. The fourth instalment is about Ajax, but goes into plenty of detail that I didn't know about where the DOM came from and some info about the famous browser wars. Here are my notes so that you can see what is in there before you invest 90 minutes.

Markup languages

GML - generalized markup language
HTML - simplified SGML

Angle brackets came from Scribe.


Does not fail on errors - allowed innovation. Otherwise the web would have frozen.
2 types of outlines - H1 - not nested and p type which are. Yuk.


Not modular - clashes can wreck your page.
Difficult to manage selectors - classitis and iditis.
None of the browser vendors ever got it implemented!


Brendan Eich - Netscape

Browser workflow

url -> Fetch -> cache  -> Parse -> Tree ->  Flow -> display list ->  Paint -> pixels

Comments around script tags just protect users of ancient browsers from seeing the script. Don't bother with this.


Very bad. Don't do it.

For performance improvement of scripts

  • minify
  • gzip
  • Reduce number of script files (concat at deploy)
  • Use something like Chrome PageSpeed to test

JavaScript uses camel case for style properties. CSS uses hyphens - incompatible with JS - in fact incompatible with most languages. Done on purpose. Source of annoying bugs.
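To make the naming mismatch concrete, here is a small sketch that converts a hyphenated CSS property name into its camel-case JavaScript form. The function name cssToCamelCase is my own invention, and it does not handle special cases such as float.

```javascript
// Hypothetical helper: convert a hyphenated CSS property name to the
// camel-case form used for style properties in JavaScript.
function cssToCamelCase(name) {
  return name.replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

console.log(cssToCamelCase("background-color")); // backgroundColor
console.log(cssToCamelCase("border-top-width")); // borderTopWidth
```

The hyphen is the problem because in JavaScript something like style.background-color is read as a subtraction.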


Nice and fast, but dangerous. Developed by Microsoft - all browsers support it.

Always err on the side of understanding and clean code over performance unless performance is a serious problem.


Events bubble up through the DOM - use stopPropagation to stop this when needed.
Bubbling allows attaching a single event handler to a container. The container then dispatches the event to the appropriate element. Faster to set up.
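Bubbling and the single-handler-on-a-container idea can be modelled without a browser. This is a toy sketch, not the real DOM API: makeNode, dispatchClick and the onClick field are all made up for illustration.

```javascript
// Toy model of bubbling, NOT the browser DOM API: each node has a parent
// and an optional click handler.
function makeNode(name, parent) {
  return { name: name, parent: parent || null, onClick: null };
}

// Dispatch a click at target, then let it bubble up through the ancestors
// until the root is reached or a handler calls stopPropagation().
function dispatchClick(target) {
  let stopped = false;
  const event = {
    target: target,
    stopPropagation: function () { stopped = true; }
  };
  for (let node = target; node !== null && !stopped; node = node.parent) {
    if (node.onClick) {
      node.onClick(event);
    }
  }
}

const list = makeNode("list");
const itemA = makeNode("item-a", list);
const itemB = makeNode("item-b", list);

// Delegation: one handler on the container serves every item.
const clicked = [];
list.onClick = function (event) {
  clicked.push(event.target.name);
};

dispatchClick(itemA);
dispatchClick(itemB);

// An item that handles the click itself can stop it from bubbling further.
itemB.onClick = function (event) { event.stopPropagation(); };
dispatchClick(itemB);

console.log(clicked); // [ 'item-a', 'item-b' ]
```

In a real page the same shape applies: one listener added to the container with addEventListener, using event.target to work out which child was clicked.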

Use good speed testing tools - Chrome best.

Server vs browser

Neither side should dominate. A balance is the best. The server is not a filesystem and the browser is not a dope that just displays returned content.