Data Science Fellows February 2021 Cohort
The project aimed to predict rental prices for properties in Dallas and to use SHAP to give clients clear advice.
The focus was on gathering and preprocessing data from scratch so that we had something to build models on. Within the data team we scraped and augmented the data, analysed it for trends and anomalies, processed it into usable information, performed feature and model selection, and translated the outputs into clear results.
Challenges (at least two)
- Creating functions for model selection and explainability before any data was available. I had to write general-purpose functions so that the data could be fed into them once it had finished being processed.
- Working in a team made up entirely of interns. We were able to work together, but it was less stable than joining an established team that already knew the pipeline of the business.
Achievements (according to KPIs)
- Created the model selection and explainability functions mentioned above. I documented them well and included parameter options that make them adaptable to however the data ends up being processed.
- Developed counterfactual functions that built on the explainability work. It was interesting not only to use what we'd learned about SHAP to explain how predictions were made, but also to see how the SHAP values could be used to amend features and improve the prediction.
- Developed a class, DataTransformer, that could invert any transformations after the counterfactual changes and show clients what improvements to make to their property.
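As a rough illustration of the DataTransformer idea, the sketch below shows one way a transformer can remember what it did to each feature and undo it after a counterfactual change. It is not the project's actual code: the choice of transformations (log and standard scaling), the method names, the column names and the sample values are all assumptions made for the example, and the counterfactual step is a hard-coded stand-in rather than a SHAP-driven change.

```python
import math
from statistics import mean, pstdev


class DataTransformer:
    """Hypothetical sketch: log1p + standardise each column, with an exact inverse."""

    def __init__(self, log_columns=()):
        self.log_columns = set(log_columns)  # columns that get a log1p transform
        self.stats = {}  # column name -> (mean, std) learned in fit()

    def _forward_log(self, column, value):
        return math.log1p(value) if column in self.log_columns else value

    def fit(self, rows):
        # Learn the scaling statistics on the (optionally log-transformed) values.
        for column in rows[0]:
            values = [self._forward_log(column, row[column]) for row in rows]
            std = pstdev(values) or 1.0  # fall back to 1.0 for constant columns
            self.stats[column] = (mean(values), std)
        return self

    def transform(self, row):
        out = {}
        for column, value in row.items():
            mu, sigma = self.stats[column]
            out[column] = (self._forward_log(column, value) - mu) / sigma
        return out

    def inverse_transform(self, row):
        # Undo the steps in reverse order: unscale first, then undo the log.
        out = {}
        for column, value in row.items():
            mu, sigma = self.stats[column]
            restored = value * sigma + mu
            out[column] = math.expm1(restored) if column in self.log_columns else restored
        return out


# Made-up sample data, purely for illustration.
rows = [
    {"sqft": 650.0, "bedrooms": 1.0},
    {"sqft": 900.0, "bedrooms": 2.0},
    {"sqft": 1400.0, "bedrooms": 3.0},
]
transformer = DataTransformer(log_columns=["sqft"]).fit(rows)

model_space = transformer.transform(rows[0])
model_space["bedrooms"] += 0.5  # stand-in for a counterfactual change in model space
suggestion = transformer.inverse_transform(model_space)
```

Keeping the fitted statistics on the object is what makes the inverse possible: a change made in the model's scaled space can be reported back to the client in the original units of the property.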
Future project development
Deltika has obvious room for growth through expansion into new cities and regions. All of the work still needs to be put into a pipeline with the front end, so there is a lot of experience to be gained on the data engineering side of the business.