Applying machine learning to diagnose diseases

Instead of frying in the intense summer heat of Berlin last weekend, eight of us from the Ubiqum Code Academy Data Analytics & Machine Learning program in Berlin decided to attend a ‘Data Science Sleepover’ where we’d be applying machine learning to diagnose diseases.

The event was organised by AIScope, a non-profit that aims to apply machine learning to diagnose diseases such as malaria and tuberculosis in even the most isolated, hard-to-reach places in the world. AIScope’s CEO, Eduardo Peire, obtained datasets containing information on dengue and malaria records from the Peruvian Amazon. The institutions that had collected the data from the Amazon did not have the capacity to analyse the data themselves, so Eduardo and the rest of the AIScope team organised the Data Science Sleepover to bring together passionate data scientists in Berlin and investigate the data together.

While no one actually slept over at the ‘Sleepover,’ 25 of us spent most of Friday evening to Sunday afternoon together analysing the data, returning home only to sleep. On Friday, we were introduced to AIScope and the significance of the project, and learned more about malaria, dengue, and how the data was collected. We were then divided into groups of four based on our coding language preferences (one person from Ubiqum accidentally found herself in a Python group, not sure what happened there…). We spent the rest of the evening getting to know one another over burritos and beers.

When we arrived back at the venue on Saturday morning, we had a quick introduction to design thinking principles and then it was time to work, work, work. Data exploration and visualisation is my favourite part of data analytics, so I decided to spend all day doing that. Other members of my team followed a similar path. Though we were all at different skill levels, everyone had something to contribute.

During this time (from 9 am to 9 pm!), I made some visualisations that helped me understand the following:

  • The data quality was not very good — there were only a couple of years in which it seemed like the data collectors were consistently collecting information.
  • There were more men in the hospital records — perhaps because men in that region are more likely than women to go to the hospital?
  • The peak month for malaria hospital records was July, right after the rainy season. This is probably because mosquitoes lay their eggs in still water during the rainy season; the eggs hatch into larvae, which mature into adult mosquitoes that go on to infect people right after the rainy season.
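
As a rough illustration of the kind of counting behind the last two observations, here is a minimal Python sketch. The records are invented for the example and the field layout (date, sex, diagnosis) is an assumption; this is not the AIScope dataset, and it is not the code written at the event.

```python
from collections import Counter
from datetime import date

# Hypothetical hospital records: (admission date, sex, diagnosis).
# These rows are made up purely for illustration.
records = [
    (date(2015, 5, 12), "M", "malaria"),
    (date(2015, 7, 3), "M", "malaria"),
    (date(2015, 7, 19), "F", "malaria"),
    (date(2015, 7, 25), "M", "dengue"),
    (date(2015, 8, 2), "M", "malaria"),
    (date(2016, 7, 8), "F", "malaria"),
]


def count_by_month(rows, diagnosis):
    """Count records with the given diagnosis per calendar month, pooled across years."""
    return Counter(d.month for d, _sex, diag in rows if diag == diagnosis)


def count_by_sex(rows):
    """Count all records per recorded sex."""
    return Counter(sex for _d, sex, _diag in rows)


malaria_by_month = count_by_month(records, "malaria")
peak_month, peak_count = malaria_by_month.most_common(1)[0]
sex_counts = count_by_sex(records)

print("Malaria records per month:", dict(sorted(malaria_by_month.items())))
print("Peak month:", peak_month, "with", peak_count, "records")
print("Records by sex:", dict(sex_counts))
```

Pooling months across years makes a seasonal peak easy to spot, but with data this patchy it would be worth repeating the count for each year separately before trusting the pattern.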

On Sunday, it was time to bring together what everybody had done into a single narrative. After munching on some banana pancakes for breakfast, we all discussed what we had done the previous day. One person had mapped out the locations where dengue-positive larvae were found in the study area and was trying to understand if there were any factors correlated with finding dengue-positive larvae in any particular location. Another person had built a model to predict whether a person had dengue or malaria based on their age, sex, and location.

We divided ourselves into those who had done temporal visualisations, geographical visualisations, and predictive modelling, and went about making our slides. Afterwards, we presented our findings, and representatives from Pfizer, Bayer, Dataconomy, and Hella even came in to see what we had found in the dataset.