Posts

Building a choropleth map for Irish agricultural data

I wanted to create a choropleth map using Irish data. The Irish Government has made a large quantity of data available at data.gov.ie. The set I chose covers Bovine TB across Ireland over a number of years. GeoJSON is available for a variety of countries, including Ireland's county boundaries, so all I had to do was get the data labels to match. In the data some counties are split into two parts. I merged these and summed their data points so that the choropleth works with normal county boundaries. You can see the results here. A link to the GitHub repository is included at the top of the notebook.
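The merge-and-sum step can be sketched in plain Python. This is only an illustration of the idea: the region labels, the suffixes used to mark the parts of a split county and the counts are all made up, and the notebook itself may do this differently.

```python
from collections import defaultdict

# Hypothetical rows: (region label as it appears in the data, year, count).
# The labels and numbers are invented for illustration only.
rows = [
    ("Cork North", 2012, 10),
    ("Cork South", 2012, 7),
    ("Wicklow East", 2012, 3),
    ("Wicklow West", 2012, 4),
    ("Kerry", 2012, 5),
]

# Assumed suffixes that mark the parts of a split county.
SPLIT_SUFFIXES = (" North", " South", " East", " West")


def county_name(label):
    """Map a region label such as 'Cork North' to its county, 'Cork'."""
    for suffix in SPLIT_SUFFIXES:
        if label.endswith(suffix):
            return label[: -len(suffix)]
    return label


def merge_split_counties(rows):
    """Sum the counts of all parts of a county, per (county, year)."""
    totals = defaultdict(int)
    for label, year, count in rows:
        totals[(county_name(label), year)] += count
    return dict(totals)


merged = merge_split_counties(rows)
print(merged)
```

After this step every key matches a normal county name, which is what the GeoJSON boundaries expect.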

Early Stopping with Keras

Steps to train a model
There are a number of things you have to do when working on a Machine Learning problem: import your data and transform it as appropriate; define your model using something like Keras, including your chosen hyperparameter values; run the training phase; evaluate the model. If at the end of this process the evaluation shows that your model is not sufficient for your needs, you will need to make some changes and rerun the training. Ideally you should have a way to spot that training is not going as you want and stop it. This is one of the uses of early stopping. The other is when your model has reached a sufficient quality standard for you to use and further improvements have slowed, stopped altogether or are unnecessary.
Early Stopping
Keras provides a way of doing this using the EarlyStopping callback. There is a range of callbacks in Keras and you can define your own. Details are here. The EarlyStopping callback works by choo...
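To make the general idea of early stopping concrete, here is a plain Python sketch that stops once a monitored loss has not improved for a number of epochs. It is not Keras code and not how Keras implements the callback; the function name and the sample losses are hypothetical, and the parameter names patience and min_delta are simply borrowed from the EarlyStopping arguments of the same name.

```python
def train_with_early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (epoch_stopped, best_loss) for a list of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if best_loss - loss > min_delta:
            # The loss improved: remember it and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_loss
    # Ran through every epoch without triggering early stopping.
    return len(val_losses) - 1, best_loss


# Made-up validation losses, one per epoch.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.50]
stopped_at, best = train_with_early_stopping(losses, patience=2)
print(stopped_at, best)
```

Note the trade-off in the sample data: with a patience of 2 the loop stops before the later improvement to 0.50 is ever seen, so a larger patience costs more training time but is less likely to stop too soon.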

Preventing Mass Assignment Attacks in Spring

If you have a wizard to collect data from users in a set of steps, you can open yourself up to a mass assignment attack. OWASP, as always, gives great information on this.
How it works
You bind an object to the model and you want to populate the fields in that object as your user progresses through your wizard. So on the first screen you might collect their email, on the second screen their name, and so on. Between each of these steps you perform validation on the just-entered values. However, a hacker exploiting mass assignment can bind data from step 1 during a subsequent step just by putting the variable and value in the URL (assuming GET, although obviously this can be done with POST also). So for the first step you are after this:
https://mywebsite.com/myapp/action?email=anthonynolan@somemail.com
And the second step, this:
https://mywebsite.com/myapp/action2?name=Anthony
Instead the hacker supplies this which is fine and passes our validation: https://mywebsite.com/myapp/acti...
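One common defence against this kind of attack is to allow-list the fields each step may bind. The sketch below shows that idea in plain Python rather than Spring; the step names, the bind_step helper and the sample values are hypothetical. Spring's WebDataBinder offers setAllowedFields for the same purpose.

```python
# Fields each wizard step is allowed to bind (hypothetical step names).
ALLOWED_FIELDS = {
    "step1": {"email"},
    "step2": {"name"},
}


def bind_step(model, step, params):
    """Copy only the fields allowed for this step from the request parameters.

    Any other parameter is ignored and reported back as rejected.
    """
    allowed = ALLOWED_FIELDS[step]
    rejected = sorted(key for key in params if key not in allowed)
    for key in allowed:
        if key in params:
            model[key] = params[key]
    return rejected


# The email was collected and validated in step 1.
user = {"email": "anthonynolan@somemail.com"}

# A step 2 request that also tries to overwrite the validated email.
rejected = bind_step(user, "step2", {"name": "Anthony", "email": "not-validated"})
print(user, rejected)
```

Because the email parameter is not on the allow-list for step 2, the value validated in step 1 is left untouched.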

Encouraging people to do things with VR

I read an article a few years ago about a study to see if people contribute more to their pension fund if they have seen a future representation of themselves. Not surprisingly, they do. I have been bouncing the idea around in my head, so I decided to look up the study. You can read it here. They used some clever software to produce a 3D model of a person's head which they then aged. The participant was then given a VR headset inside which they could see their older self. They could interact to a limited extent with themselves, so it was better than just a photo of 'you old'. The experiments were very well run, but basically the outcome was that people can relate to their older self if they see the image, and pension contributions go up. I think this idea is generally fascinating, beyond pension contributions. For example, encouraging people to adopt healthy lifestyles is very difficult. Often the pull of pleasure now far outweighs some notion of protecting your future s...

Continuous Learning for Developers

Continuous learning is very important in IT. A break from learning new things for a couple of years can put you in the dark ages. I have always known this and done my best to keep upskilling, but a few things I have read recently have reminded me how important this is.
Leading Change
John Kotter's Leading Change is a book about managing big change in organisations. It gives a set of steps that should be followed, based on the author's experience with change efforts in large numbers of big companies. The most surprising thing about the book is a section at the end where he says that the thing that unites very successful people is lifelong learning.
Big Data
O'Reilly publish a weekly Data Science newsletter. It is well worth a spin if statistics and analysis are your thing. I think that an interest in data is in every developer's interest. They published an article this week entitled Is your development team ready for Big Data? Again one of the key skills that...

Using Statistics to Hire Developers

The Interview
I was interviewing a developer for a mid-level position today and he turned out to be not up to scratch. The CV looked pretty good, but when I dug in with some fundamental Java questions he was found wanting. Not an unusual occurrence, but still costly. It took 30 minutes of my time and his time, and a number of other people were involved in getting the process to this stage. I wondered if there might be a way to remove some of the candidate CVs from the batch before we do the costly part of interviewing.
Spam or Ham
Classifying CVs could be considered like the Spam/Ham problem. Apache SpamAssassin is a good example of a spam classifier. It can probably be made to classify CVs, or at least try, but it is written mainly in Perl, I think, and that is not my thing. This looks interesting, so I forked it, converted it to Eclipse and started the process of making it build with Maven. First I am going to try this out with spam and ham email, then will try it for other text p...
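As a rough illustration of treating CV screening like the spam/ham problem, here is a tiny naive Bayes text classifier in plain Python. Naive Bayes is a common technique for spam filtering, but the post does not say which method the forked project uses, so treat this as an assumption. The class, the labels and the training snippets are all invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Minimal multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Start from the log prior of the label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in tokenize(text):
                # Add the smoothed log likelihood of each word.
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training snippets standing in for CV text.
model = NaiveBayes()
model.train("five years java spring hibernate experience", "interview")
model.train("java developer maven junit testing", "interview")
model.train("some exposure to office software", "reject")
model.train("keen to learn programming soon", "reject")

print(model.classify("java spring developer"))
print(model.classify("keen to learn"))
```

A real screening tool would need far more training data and careful checks for bias before anyone relied on it to filter candidates.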

A Philosophy for Modern Applications

I listened to a webinar with Pivotal today and they spoke about some of the thinking behind their Cloud Foundry platform. One of the guides they use to build this is called 12factor.net. It is a set of principles devised by the people behind the Heroku cloud platform. I won't go into the details here, as you can read the brief web page yourself, but they cover a broad range of the factors that we often don't think about when we develop large scale enterprise systems. One such concept is the avoidance of software erosion. This is the gradual degradation of software running in a changing environment. For example, as security holes in the OS are discovered your OS is eroded, or a service on which you rely fails and is not restarted. If not managed continuously, items like this can result in your application going out of service. The costs of getting it back may be too great to pay. A related concept is technical debt. This is work within a system which needs to be done in orde...

Tips for passing Oracle Certified Associate in Java 8

I decided to do this exam a few months ago. I have been a Java developer for over 10 years, so this was overdue. There are a few simple steps and quite a lot of work.
First get a book
I used this one on Kindle. Kindle is the perfect format for this stuff as you can read snippets while you wait at the bus stop, the checkout etc. Read it through cover to cover. I did not do the exercises on this pass; I was just making sure that I could understand almost everything in there. Next you should either do the end-of-chapter exercises or write small programs to test your understanding. Don't start taking mock exams too early.
Mock exams
These are absolutely required to test your knowledge. I used these ones. I (like most of the people I spoke to about this) found I did not know the material as well as I thought. The huge plus for the Enthuware mock exams is that they explain in detail why you were wrong. Take your time understanding why you made the mistakes you did. Don't ru...

A tool for plotting networks - Cytoscape

The networks referred to here are more general than computer networks: social networks, for example. Cytoscape is a great, free and flexible way to produce nice graphical output from 'linked node' data. On a recent project I had a requirement for this and Cytoscape did the job well. I would not describe the tool as all that easy to use, but the task it is carrying out is not easy. Once you take some time to familiarise yourself with the basic operation it works very well. Full details are available here. A nice feature of Cytoscape is the ability to tie a property of the data to a visual property. For example, you can make the thickness of connecting lines vary with a property of the relationship, such as weight. You could also use things like colour. There is a large set of plugins available. Many focus on biology, which as the name suggests is the typical domain for this tool, but it is not limited to that. Cytoscape presents a steep learning curve, but this type of work was f...

Population of Ireland on a 3d scatter plot

The Central Statistics Office in Ireland has published this dataset: http://www.cso.ie/en/census/census2011griddataset/
And here is the user guide: http://www.cso.ie/en/media/csoie/census/documents/census2011griddataset/1,Km%C2%B2,Grid,dataset,User,Guide,2011.pdf
It gives the population of each 1km square of Ireland on census night 2011. It has various buckets, but I have just looked at total population. I used the scatterplot3d package in R, which is very easy to use. Here is my code:
cen <- read.csv("COP2011_Grid_ITM_IE_1Km.csv")
install.packages('scatterplot3d')
library(scatterplot3d)
scatterplot3d(cen$CENT_EAST, cen$CENT_NORTH, cen$TOT_P)
and here is the resulting plot. You can see the big cities jumping up there. It needs a lot of polish to avoid things like Cork getting swamped in the 3D projection, but that's it for now.

Using the new pipe feature in R

The magrittr package has introduced the pipe operator to R. It looks like this: %>%
You use it the way you use any pipe: the first operation's result is passed to the next one and so on. It makes for more readable code than nested function calls. To use it you need to install the magrittr package. In addition, to get the nice easy-to-use filter, group_by etc. functions used below you need to install the dplyr package. In fact dplyr depends on magrittr, so install dplyr and magrittr comes along for the ride.
install.packages('dplyr')
library('dplyr')
You'll need the nycflights13 dataset for this example, so do this:
install.packages('nycflights13')
library(nycflights13)
Here is the example, which points the finger at the airlines with the longest delays from NYC:
filter(flights, !is.na(dep_delay)) %>% group_by(carrier) %>% summarise(delay = mean(dep_delay)) %>% merge(airlines) %>% arrange(desc(delay))
I really like these functions like the...

Using R to analyse data from the Central Statistics Office in Ireland

I started a Coursera course this week called Getting and cleaning data. I was looking at some data from the CSO and realised that I needed to clean it up. The course is good, but quite difficult. It assumes you have not forgotten everything you learned in the previous course, R Programming. Midway through week 2 (I am playing a bit of catch-up) I stumbled across this package: http://pxr.r-forge.r-project.org/
px is the format that the CSO releases its 'raw' data in. This package puts that into a data frame, which is more amenable to analysis. I reckon there has to be something lurking within the CSO dataset, which is one of my main motivations for getting up to speed on R. Nothing to declare yet, but hopefully I will find something interesting soon :)

Get a list of your functions in MySQL

If you want a list of functions (as opposed to procedures) use this:
select *
from information_schema.routines
where routine_schema = 'your_schema_name'
and routine_type != 'PROCEDURE'

Douglas Crockford Video 5 - 'The end of all things'

No let up in quality in this video. Here are my notes for the final video in the original series:
Cross site scripting (XSS) is a big problem. Huge privileges are accorded to a successful attacker.
Caja and ADsafe make JS safer.
Don't confuse a variable and a value.
How does an object get a reference? By creation, by construction or by introduction.
David Parnas: http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf
Lazy programmers guide: http://www.youtube.com/watch?v=eL5o4PFuxTY
Keep performance delays below 100ms and provide some sort of immediate feedback.
Don't fiddle with code. Measure first. Use PageSpeed.
Arrays can be slow in older versions of IE. No hashmaps.
Don't add unnecessary chrome. It takes time.
Don't tune for quirks. Keep code clean and readable. Future versions of JS engines will be much faster, and your quirk optimisations may cause trouble.
jslint.com
Avoid global variables.
Avoid ++ as it is too easy to mess up.
Use jsl...

Douglas Crockford's JavaScript Video 4 - AJAX

These videos are remarkable in the packed world of IT training videos in that they are clear and enjoyable to watch. The fourth instalment is about Ajax, but goes into plenty of detail that I didn't know about where the DOM came from and some info about the famous browser wars. Here are my notes so that you can see what is in there before you invest 90 minutes.
Markup languages
RUNOFF
GML - generalized markup language
SGML
HTML - simplified SGML
LaTeX
Angle brackets came from Scribe.
HTML
Does not fail on errors - allowed innovation. Otherwise the web would have frozen.
2 types of outlines - H1, not nested, and p type, which are. Yuk.
CSS
Not modular - clashes can wreck your page.
Difficult to manage selectors - classitis and iditis.
None of the browser vendors ever got it implemented!
The DOM
Brendan Eich - Netscape
Browser workflow
url -> Fetch -> cache -> Parse -> Tree -> Flow -> display lis...

Douglas Crockford's JavaScript Videos from his time at Yahoo

He has now moved on to other things, but these videos are still around. They are like reading a novel: long form is still best. No sound bites here. Each video is over an hour long and there are 8 of them. I am only on the third at the moment, but am getting a huge amount out of them. If you are like most JS developers, me included, you will be bludgeoning your way through whatever tasks you need to complete without knowing the details. jQuery et al. insulate us from having to know this stuff, right? Afraid not. There is no substitute for knowing JavaScript in detail, and these videos manage to teach it.

Useful CSS Wisdom

I came across this in the most recent Web Design Weekly. I have called it wisdom as it is full of the type of advice that people only have the balls to give you when they really know what they are talking about. Why IDs in CSS should be avoided is detailed. A lot of people rant about this, but there is something about this document that got me sifting through my CSS to find all the # marks. The document is short and I am going to look back at it a few times over the coming months to make sure I have squeezed the benefit out of it.

Web Design Weekly

Web Design Weekly is a great resource for front end developers and wannabes. Once you subscribe it arrives in your inbox once a week with a selection of articles, news, tools etc. that you can use in your work. It is just like subscribing to RSS but I find that getting news on a specific area like this once a week means that I set aside some time to at least browse it and find out what is interesting. The material is often of a high quality and where I don't manage to get stuff working it's probably down to me not persisting with it long enough. If this is the type of work that you do I recommend signing up for this. The link is at the start of this post.

Mozilla Developer Network JavaScript Resources

When I am looking for anything to do with JS, HTML or CSS on the web I tend to put the letters mdn into the search line along with whatever else I am looking for. This usually pushes me to the pages on the Mozilla Developer Network. They are consistently of high quality, unlike for example W3Schools, and are clearly written by the type of people who possess the knowledge that I want. Mostly it is just a dip in to find out a piece of syntax here and there, but they do offer comprehensive book-style start-to-finish information. The JavaScript Guide for example is here. I have been programming in JavaScript for quite a while now, so I am not reading this line for line, but a scan through it is clearing up some long-held confusions for me. I reckon getting proper skills into your head and hands is a slow process and the way there is to persist and add new pieces every day. Going through the MDN docs is an example of this.

Book about Design Patterns in JavaScript

This book by Addy Osmani goes into depth about the various design patterns that you can use in your JavaScript. The main advantage that I am getting out of it is going through the examples and reading the references to see how to do 'complex' JavaScript. Singletons always come in handy in Java, now I can do them in JS. A host of other patterns are covered. I will be taking the time to go through this book in more detail over the coming weeks.