Risk factor predictive value analysis for stroke events

Project by Yair


Risk scores are established, data-driven predictive tools used in many hospitals as medical decision-support algorithms. We started by examining the CHADS-VASC scoring system, which supports the decision to initiate anticoagulation therapy for stroke prevention in patients with non-rheumatic atrial fibrillation. The score was designed on a cohort of about 1000 patients. To validate its predictive capacity, we used a large medical EMR dataset with millions of patient visits, including several tens of thousands of patients with non-rheumatic atrial fibrillation. The considerably larger dataset yielded results reassuringly consistent with the original study, and it will support the development of a refined and more personalized risk algorithm.
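For readers unfamiliar with the score (commonly written CHA2DS2-VASc), here is a minimal sketch of how it is computed from a patient's risk factors, using the standard published point values. The `Patient` class and its field names are hypothetical and chosen for illustration only; they do not reflect the EMR schema used in the project.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age: int
    female: bool
    congestive_heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia_te: bool = False  # prior stroke, TIA, or thromboembolism
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Return the CHA2DS2-VASc score, an integer from 0 to 9."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0  # C
    score += 1 if p.hypertension else 0              # H
    if p.age >= 75:                                  # A2: age 75 or older
        score += 2
    elif p.age >= 65:                                # A: age 65 to 74
        score += 1
    score += 1 if p.diabetes else 0                  # D
    score += 2 if p.prior_stroke_tia_te else 0       # S2
    score += 1 if p.vascular_disease else 0          # VA
    score += 1 if p.female else 0                    # Sc: sex category
    return score


example = Patient(age=78, female=True, hypertension=True)
print(cha2ds2_vasc(example))  # 2 (age) + 1 (sex) + 1 (hypertension) = 4
```

Because the score is a simple sum of a handful of binary factors, it is easy to apply at the bedside, which is also why there is room for a more personalized model to improve on it.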

MIMIC-III (Medical Information Mart for Intensive Care III) is a large, freely-available database comprising de-identified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital). This project's main goal is to integrate the MIMIC dataset into a set of databases in a ready-to-use cloud environment so that the company can use it for research and development.

  • Set up and Exploration of MIMIC-III Dataset, K-Health
  • CHADS-VASC Risk Score validation & improvement, K-Health



MIMIC Project:

  • Working with very heavy data
  • Adapting SQL queries to the needs of the company environments
  • Exploration of a very dense database

CHADS-VASC Project:

  • Work with highly noisy medical data
  • Navigate the data resource of the company
  • Work in a highly multidisciplinary field
  • Select cohort population from the database of the company
  • Iterate several times over the population selection to make sure the data is correctly selected
  • Perform academic level retrospective medical research
  • Understand and apply statistical concepts used in medical research
  • Building a generalized pipeline and infrastructure for future risk score studies
  • Deal with heavily imbalanced data
  • Find a model that reaches a higher predictive value than the original score
  • Accept you can’t solve stroke in one month
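To make the validation and imbalanced-data points above more concrete: when strokes are rare, plain accuracy says little, so risk scores are commonly compared by their c-statistic (the area under the ROC curve). The sketch below is one simple way to compute it and is not necessarily the method used in the project. The function name and the toy data are assumptions made up for illustration, not results from the study.

```python
def c_statistic(scores, outcomes):
    """Concordance statistic (ROC AUC) for a risk score.

    It is the probability that a randomly chosen patient with the event
    has a higher score than a randomly chosen patient without it,
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    # Pairwise comparison is quadratic; fine for a sketch, slow on large cohorts.
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data (invented): risk scores and 1-year stroke outcomes for 8 patients.
toy_scores = [0, 1, 2, 2, 3, 4, 5, 6]
toy_outcomes = [0, 0, 0, 1, 0, 0, 0, 1]
print(round(c_statistic(toy_scores, toy_outcomes), 3))  # 8.5 / 12 = 0.708
```

A value of 0.5 means the score ranks patients no better than chance, and 1.0 means every patient with the event scores higher than every patient without it. Comparing this number for the original score and for a candidate model on the same cohort is one way to check whether the model improves on the score.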


Achievements (according to KPIs)

MIMIC Project:

  • Successfully integrated the MIMIC Dataset into the company environment

CHADS-VASC Project:

  • Successfully validated the CHADS-VASC score on a new database
  • Successfully implemented a generalized risk score analysis pipeline
  • Successfully improved on the original score's predictive value


Further development

Build a model that reaches a better predictive value by including additional patient features, and analyze the impact of these new features on the stroke outcome. Then try to solve the more general problem of stroke prediction by selecting a more general population and adopting a broader definition of the stroke outcome (more than 1 year). Finally, productize the model and integrate it into the application.
