They knew that 1,904 of the patients in their anonymised data had developed type-2 diabetes, and they looked for characteristics common to this group, such as high body weight, a large waist circumference and older age, all known risk factors.
Type-2 diabetes occurs when the pancreas stops making enough insulin or a person becomes resistant to insulin’s effects, allowing glucose levels to build up in the bloodstream.
Binh and Colin used a data set from a 2015 competition set by Kaggle, which describes itself as “the world’s largest data-science community”.
“Kaggle offers data-science competitions and it is also a public data platform, a cloud-based workbench for data science,” says Binh.
“Many companies post their data-science challenges on Kaggle as competitions with monetary prizes so Kaggle users can participate in them.”
Binh says that while there is still work to be done in refining the model, the results are encouraging.
Once the researchers are happy with the accuracy of the algorithm, the next step will be to create a tool for healthcare professionals and policymakers to use.
“The goal is to give healthcare professionals a tool to initiate interventions or modify treatment plans, as well as better understand the progression of the disease,” says Colin.
“It will be a key driver in delivering a personalised approach to medicine that will lead to improved healthcare quality.”
Binh and Colin are also working on ways to apply their use of public health data sets and machine learning to other diseases.
This includes a study similar to their type-2 diabetes work to predict acute kidney injury.