Michael Goncharov
Data Science Fellows June 2020 Cohort
Abstract
The project focused on improving the models used for real estate property price valuation.
It delivered an end-to-end machine learning pipeline based on demographic data, covering data exploration and preparation, feature engineering, model training, and deployment on the AWS cloud.
Challenges (at least two)
- Data cleaning and feature exploration
- AWS Lambda and Step Functions
Achievements (according to KPIs)
- Identified and extracted significant new features, such as “average price per square foot by ZIP code and year”
- Achieved an R² score of 0.8 with only 25 demographic features
- Designed powerful new features that were adopted in the company's subsequent work
- Trained a powerful model with a small number of features
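As an illustration of the engineered feature mentioned above, the following is a minimal sketch of how an “average price per square foot by ZIP code and year” value could be computed. It is not the project's actual code: the function name, the record fields (zip_code, year, price, sqft) and the sample records are hypothetical, and it assumes the feature is the mean of per-sale price-per-square-foot ratios within each group (dividing total price by total area is an equally plausible definition).

```python
from collections import defaultdict


def avg_price_per_sqft(sales):
    """Return {(zip_code, year): mean price per square foot}.

    Assumption: each sale is a dict with the hypothetical keys
    "zip_code", "year", "price" and "sqft".
    """
    totals = defaultdict(lambda: [0.0, 0])  # running [sum of ratios, count]
    for sale in sales:
        if sale["sqft"] <= 0:
            continue  # skip records with an unusable area
        key = (sale["zip_code"], sale["year"])
        totals[key][0] += sale["price"] / sale["sqft"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical sample records, for illustration only
sales = [
    {"zip_code": "10001", "year": 2019, "price": 500_000, "sqft": 1_000},
    {"zip_code": "10001", "year": 2019, "price": 900_000, "sqft": 1_500},
    {"zip_code": "10001", "year": 2020, "price": 660_000, "sqft": 1_100},
    {"zip_code": "94103", "year": 2019, "price": 800_000, "sqft": 800},
]
feature = avg_price_per_sqft(sales)
```

The resulting mapping can be joined back onto each property record by its ZIP code and sale year. If the target variable is the sale price itself, the aggregate should be built from training data only, so that the price of a property being valued does not leak into its own feature.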
Future project development
Because real estate valuation depends on macroeconomic indicators, the next step is to include them in the model, which is expected to improve it significantly.