The project’s goal was research oriented. We had to survey the latest state-of-the-art deep neural networks in the field of visual odometry and SLAM. The objective: given two photos of the same scene taken from different points of view, estimate the camera pose transformation between them.
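To make the objective concrete, the target the network regresses is the relative pose between the two cameras. A minimal NumPy sketch, assuming camera-to-world pose conventions (the function name and conventions are illustrative, not the project's actual code):

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Given two camera-to-world poses (x_world = R @ x_cam + t),
    return (R_rel, t_rel) mapping camera-1 coordinates into camera-2
    coordinates: x_cam2 = R_rel @ x_cam1 + t_rel."""
    R_rel = R2.T @ R1          # rotation from frame 1 to frame 2
    t_rel = R2.T @ (t1 - t2)   # translation expressed in frame 2
    return R_rel, t_rel
```

A ground-truth pair for training is then the `(R_rel, t_rel)` computed from the two absolute poses recorded with each image.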
- Gathering relevant visual odometry data for cars (diversity of tracks, weather conditions, etc.)
- Gathering enough data to train on
- Finding the appropriate architecture among a dozen different academic and industrial research papers
- Implementing and optimizing the networks
- Training on huge image datasets overnight and storing them remotely
Achievements (according to KPIs)
- Built a photo-realistic dataset from the GTA video game, composed of 670,000 image pairs and their camera transformations
- Implemented in Keras the neural network from the RPNet paper [Sovann et al.] with the loss function from PoseNet2 [Kendall et al.]
- Achieved real-time inference using a MobileNet-based Siamese core model
- Achieved convergence of the model on the dataset, though it still overfits
- Average translation error is less than 40cm
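The PoseNet2 loss mentioned above balances translation and rotation errors with learned homoscedastic-uncertainty weights. A minimal NumPy sketch of that loss (the actual project code is in Keras with `s_x`, `s_q` as trainable variables; the fixed values and norms here are assumptions for illustration):

```python
import numpy as np

def pose_loss(t_true, t_pred, q_true, q_pred, s_x=0.0, s_q=-3.0):
    """PoseNet2-style loss:
    L = ||t - t_hat|| * exp(-s_x) + s_x + ||q - q_hat|| * exp(-s_q) + s_q
    where s_x, s_q are learned log-uncertainty weights (fixed here)."""
    # normalise the predicted quaternion before comparing rotations
    q_pred = q_pred / np.linalg.norm(q_pred)
    l_t = np.linalg.norm(t_true - t_pred)   # translation error term
    l_q = np.linalg.norm(q_true - q_pred)   # rotation error term
    return l_t * np.exp(-s_x) + s_x + l_q * np.exp(-s_q) + s_q
```

Because the weights are learned rather than hand-tuned, the network balances the metric scale of translation (meters) against the unit-quaternion scale of rotation on its own.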
- Make the network converge for all possible camera movements (so far we were limited to realistic car movements).
- Match images with greater distance, rotation angle, and different weather conditions (so far, although the data were diverse, we limited training to at most 10 degrees of rotation and 4 meters of translation, and kept the same weather condition within pairs).
- Improve the accuracy on the testing dataset and get below 30 cm.
- Improve the accuracy of the camera rotation prediction.
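One standard way to quantify rotation accuracy when the network outputs quaternions is the angular distance between the predicted and ground-truth rotations (this metric is an assumption for illustration, not necessarily the project's exact evaluation code):

```python
import numpy as np

def quat_angle_error_deg(q_true, q_pred):
    """Angular distance in degrees between two quaternions.
    abs() handles the q / -q double cover: both encode the same rotation."""
    q_true = q_true / np.linalg.norm(q_true)
    q_pred = q_pred / np.linalg.norm(q_pred)
    d = np.clip(np.abs(np.dot(q_true, q_pred)), -1.0, 1.0)
    return np.degrees(2.0 * np.arccos(d))
```

Averaging this over the test set gives a single rotation-accuracy figure comparable to the 40 cm translation figure above.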