Posts

Building a choropleth map for Irish agricultural data

I wanted to create a choropleth map using Irish data. The Irish Government has made a large quantity of data available at data.gov.ie. The set I chose covers bovine TB across Ireland over a number of years. You can get GeoJSON for a variety of countries, and Ireland's county boundaries are available. All I had to do was get the data labels to match. In the data some counties are split into two parts. I merged and summed these data points to get the choropleth to work with normal county boundaries. You can see the results here. A link to the GitHub repository is included at the top of the notebook.
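The merge-and-sum step can be sketched in plain Python. This is only an illustration, not the notebook's code: the region labels, the `REGION_TO_COUNTY` mapping, the `merge_split_counties` function and the sample figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from split region labels in the data to the county
# name used in the GeoJSON boundaries (labels are illustrative assumptions).
REGION_TO_COUNTY = {
    "Cork North": "Cork",
    "Cork South": "Cork",
    "Tipperary North": "Tipperary",
    "Tipperary South": "Tipperary",
}


def merge_split_counties(rows):
    """Sum values per (county, year), folding split regions into one county."""
    totals = defaultdict(int)
    for region, year, value in rows:
        # Regions that are not split keep their own name.
        county = REGION_TO_COUNTY.get(region, region)
        totals[(county, year)] += value
    return dict(totals)


# Made-up sample figures, purely for illustration.
sample = [
    ("Cork North", 2010, 120),
    ("Cork South", 2010, 80),
    ("Kerry", 2010, 50),
]
merged = merge_split_counties(sample)
print(merged)
```

Once every row carries a county name that matches the GeoJSON feature names, the mapping library can join the values to the boundaries.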

Early Stopping with Keras

Steps to train a model. There are a number of things you have to do when working on a machine learning problem: import your data and transform it as appropriate; define your model using something like Keras, including your chosen hyperparameter values; run the training phase; evaluate the model. If at the end of this process the evaluation shows that your model is not sufficient for your needs, you will need to make some changes and rerun the training. Ideally you should have a way to spot that training is not going as you want and stop it. This is one of the uses of early stopping. The other is when your model has reached a sufficient quality standard for you to use and further improvements have slowed, stopped altogether or are unnecessary. Early Stopping. Keras provides a way of doing this using the EarlyStopping callback. There is a range of callbacks in Keras and you can define your own. Details are here. The EarlyStopping callback works by choo...
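The general idea of stopping once a monitored value stops improving can be sketched in plain Python. This is not Keras's implementation: `patience` and `min_delta` borrow the names of EarlyStopping arguments, but the `early_stopping_epoch` function and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop.

    Training stops once the loss has failed to improve on the best value
    by more than min_delta for `patience` consecutive epochs.
    Returns None if that never happens.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, one per epoch.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64]
stop_at = early_stopping_epoch(losses, patience=2)
print(stop_at)
```

With a patience of 2, the run above stops before it ever sees the final, slightly better epoch, which shows the trade-off in choosing the patience value.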

Preventing Mass Assignment Attacks in Spring

If you have a wizard to collect data from users in a set of steps, you can open yourself up to a mass assignment attack. OWASP, as always, gives great information on this. How it works: you bind an object to the model and you want to populate the fields in that object as your user progresses through your wizard. So on the first screen you might collect their email, on the second screen their name, and so on. Between each of these steps you perform validation on the just-entered values. However, a hacker exploiting mass assignment can bind data from step 1 during a subsequent step just by putting the variable and value in the URL (assuming GET, although obviously this can be done with POST also). So for the first step you are after this: https://mywebsite.com/myapp/action?email=anthonynolan@somemail.com And for the second step, this: https://mywebsite.com/myapp/action2?name=Anthony Instead the hacker supplies this, which is fine and passes our validation: https://mywebsite.com/myapp/acti...
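One common defence is to allowlist the fields each step may bind; in Spring this is what `DataBinder.setAllowedFields` is for. The sketch below shows the idea in plain Python rather than Spring, so everything in it (`ALLOWED_FIELDS`, `bind_step`, the step names and the sample values) is hypothetical.

```python
# Hypothetical allowlist: the fields each wizard step is permitted to bind.
ALLOWED_FIELDS = {
    "step1": {"email"},
    "step2": {"name"},
}


def bind_step(model, step, params):
    """Copy only the fields allowed for this step from the request params."""
    allowed = ALLOWED_FIELDS[step]
    for field, value in params.items():
        if field in allowed:
            model[field] = value
        # Anything else is ignored, so it cannot overwrite validated data.
    return model


# The email was validated during step 1.
model = {"email": "anthonynolan@somemail.com"}

# In step 2 the attacker also sends an email that was never validated.
bind_step(model, "step2", {"name": "Anthony", "email": "not-validated"})
print(model)
```

The smuggled email is dropped because step 2 only accepts the name, so the value validated in step 1 survives.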

Encouraging people to do things with VR

I read an article a few years ago about a study to see if people contribute more to their pension fund if they have seen a future representation of themselves. Not surprisingly, they do. I have been bouncing the idea around in my head, so I decided to look up the study. You can read it here. They used some clever software to produce a 3D model of a person's head, which they then aged. The participant was then given a VR headset inside which they could see their older self. They could interact to a limited extent with themselves, so it was better than just a photo of 'you old'. The experiments were very well run, but basically the outcome was that people can relate to their older self if they see the image, and pension contributions go up. I think this idea is generally fascinating, beyond pension contributions. For example, encouraging people to adopt healthy lifestyles is very difficult. Often the pull of pleasure now far outweighs some notion of protecting your future s...

Continuous Learning for Developers

Continuous learning is very important in IT. A break from learning new things for a couple of years can put you in the dark ages. I have always known this and done my best to keep upskilling, but a few things I have read recently have reminded me how important this is. Leading Change. John Kotter's Leading Change is a book about managing big change in organisations. It gives a set of steps that should be followed, based on the author's experience with change efforts in large numbers of big companies. The most surprising thing about the book is a section at the end where he says that the thing that unites very successful people is lifelong learning. Big Data. O'Reilly publish a weekly Data Science newsletter. It is well worth a spin if statistics and analysis are your thing. I think that every developer benefits from an interest in data. They published an article this week entitled Is your development team ready for Big Data? Again one of the key skills that...

Using Statistics to Hire Developers

The Interview. I was interviewing a developer for a mid-level position today and he turned out to be not up to scratch. The CV looked pretty good, but when I dug in with some fundamental Java questions he was found wanting. Not an unusual occurrence, but still costly. It took 30 minutes of my time and his, and a number of other people were involved in getting the process to this stage. I wondered if there might be a way to remove some of the candidate CVs from the batch before we do the costly part of interviewing. Spam or Ham. Classifying CVs could be considered like the Spam/Ham problem. Apache SpamAssassin is a good example of a spam classifier. It can probably be made to classify CVs - or at least try, but it is written in C, I think, and that is not my thing any more. This looks interesting, so I forked it, converted it to Eclipse and started the process of making it build with Maven. First I am going to try this out with spam and ham email, then will try it for other text p...
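To make the spam/ham idea concrete, here is a minimal naive Bayes text classifier in plain Python. It is a sketch of one common technique, not how SpamAssassin or the forked project works; the `train` and `classify` functions and the tiny training set are hypothetical.

```python
import math
from collections import Counter


def train(docs):
    """docs is a list of (text, label) pairs.

    Returns per-label word counts and per-label document counts.
    """
    word_counts = {}
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Prior: share of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero the probability.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("project meeting tomorrow", "ham"),
]
word_counts, label_counts = train(training)
print(classify("money offer now", word_counts, label_counts))
print(classify("agenda for project meeting", word_counts, label_counts))
```

For CVs, the labels would be something like "interview" and "reject" instead of spam and ham, and a useful model would need far more labelled examples than this.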

A Philosophy for Modern Applications

I listened to a webinar with Pivotal today and they spoke about some of the thinking behind their Cloud Foundry platform. One of the guides they use to build this is called 12factor.net. It is a set of principles devised by the people behind the Heroku cloud platform. I won't go into the details here, since you can read the brief web page yourself, but they cover a broad range of the factors that we often don't think about when we develop large-scale enterprise systems. One such concept is the avoidance of software erosion. This is the gradual degradation of software running in a changing environment. For example, as security holes in the OS are discovered, your OS is eroded. Or a service on which you rely fails and is not restarted. If not managed continuously, items like this can result in your application going out of service. The costs of getting it back may be too great to pay. A related concept is technical debt. This is work within a system which needs to be done in orde...