Real Estate Property Value Prediction, LendAI

Rozanna Royter

Data Science Fellows June 2020 Cohort


Project was focused on real estate property price prediction based on huge amounts of data gathered about a property from multiple data providers. The purpose of these prediction models is to assess and minimize the risks associated with the underwriting process of the mortgage loan. 


  1. Future leakage for tax assessed price – for some properties we have the target feature (last sale price) from a year that is way earlier than the tax assessed price we get from data providers.
  2. The features with future leakage have a significantly higher feature importance score compared to the rest.
  3. Too many feature columns – hard to choose only a few relevant ones without decreasing the score. 


  1. Feature extraction ideas for future leakage – get from the source tax assessed price for all available years and use only the relevant year (the one we have a sale price for)
  2. Tried multiple regression models and used gridsearch to tune hyperparameters
  3. Model deployment on AWS lambda 


Future project development 

Building model with relevant tax assessor data (for the relevant year) and get more data from other data sources. 

Share this post

Share on facebook
Share on twitter
Share on linkedin
Share on email