Rozanna Royter
Data Science Fellows June 2020 Cohort
Abstract
This project focused on predicting real estate property prices from large volumes of property data gathered from multiple data providers. The purpose of these prediction models is to assess and minimize the risks associated with the mortgage loan underwriting process.
Challenges
- Future leakage in the tax assessed price – for some properties, the target (last sale price) comes from a year much earlier than the tax assessed price we get from data providers, so the feature carries information from after the sale.
- The features with future leakage have significantly higher feature importance scores than the rest.
- Too many feature columns – it is hard to select only a few relevant ones without decreasing the score.
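The leakage condition described above can be checked mechanically. The following is a minimal sketch, not the project's actual code: the field names `sale_year` and `assessed_year`, the function name, and the sample records are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def flag_future_leakage(
    records: List[Dict],
    sale_year_key: str = "sale_year",          # hypothetical field name
    assessed_year_key: str = "assessed_year",  # hypothetical field name
) -> List[Dict]:
    """Return the records whose tax assessment is dated after the sale."""
    return [r for r in records if r[assessed_year_key] > r[sale_year_key]]


# Made-up sample data, for illustration only.
properties = [
    {"id": "A", "sale_year": 2005, "assessed_year": 2019},
    {"id": "B", "sale_year": 2018, "assessed_year": 2018},
    {"id": "C", "sale_year": 2012, "assessed_year": 2019},
]

leaky = flag_future_leakage(properties)
leaky_ids = [r["id"] for r in leaky]
print(leaky_ids)
```

Records flagged this way are the ones where the assessed price could reveal information that was not available at the time of the sale.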
Achievements
- Feature extraction ideas for future leakage – retrieve the tax assessed price from the source for all available years and use only the relevant year (the one we have a sale price for).
- Tried multiple regression models and used grid search to tune hyperparameters.
- Deployed the model on AWS Lambda.
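The feature extraction idea in the first bullet can be sketched as a lookup into a per-year assessment history. This is a sketch under stated assumptions: the history is assumed to be a mapping from year to assessed price, the function name is hypothetical, and the values are made up. Returning `None` when the sale year is missing is one possible choice, not necessarily the one used in the project.

```python
from typing import Dict, Optional


def assessed_price_for_sale_year(
    history: Dict[int, float], sale_year: int
) -> Optional[float]:
    """Pick the tax assessed price for the year of the sale.

    Returns None when no assessment exists for that year, so that a later
    assessment is never substituted for the missing one.
    """
    return history.get(sale_year)


# Made-up assessment history for a single property, for illustration only.
history = {2017: 310000.0, 2018: 325000.0, 2019: 340000.0}

feature = assessed_price_for_sale_year(history, 2018)
missing = assessed_price_for_sale_year(history, 2010)
print(feature, missing)
```

Using only the matching year keeps the feature aligned in time with the target, which is the point of the leakage fix.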
Future project development
Build the model with tax assessor data for the relevant year, and obtain more data from other data sources.