Developing document processing algorithms to reduce operational costs and eliminate human error

Project by Elliot, April 14, 2019, 7:52 am, Fellows 2018

Abstract

Zeitgold has an app that is meant to replace a bookkeeper for small businesses. To process the documents a human bookkeeper would handle, Zeitgold scans documents such as payroll records, invoices and receipts and extracts the necessary information. Some of this extraction work is still done by people. The project covered two tasks: classifying financial documents into their respective categories using the documents' content and metadata, and extracting textual information from unstructured financial documents.

The overall objective of the project was to reduce the number of documents and fields that people have to extract by hand, thereby improving the scalability of the Zeitgold app.
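To make the extraction task concrete, here is a minimal sketch of pulling a labelled amount out of OCR text with a regular expression. It is not the project's method, which this write-up does not describe; the labels, the sample text and the helpers `parse_amount` and `extract_field` are hypothetical. The amount parser accepts both `1,234.56` and `1.234,56` styles by treating the last separator as the decimal point.

```python
import re
from typing import Optional

# Hypothetical amount format: digits, optional 3-digit thousands groups, then 2 decimals,
# e.g. "2500.00", "1,234.56" or "1.234,56".
AMOUNT = r"\d+(?:[.,]\d{3})*[.,]\d{2}"


def parse_amount(raw: str) -> float:
    """Treat the last '.' or ',' as the decimal separator and drop the others."""
    last_sep = max(raw.rfind("."), raw.rfind(","))
    integer_part = re.sub(r"[.,]", "", raw[:last_sep])
    return float(f"{integer_part}.{raw[last_sep + 1:]}")


def extract_field(ocr_text: str, label: str) -> Optional[float]:
    """Return the amount that follows `label` in the OCR text, or None if not found."""
    pattern = re.compile(rf"{re.escape(label)}\s*:?\s*({AMOUNT})", re.IGNORECASE)
    match = pattern.search(ocr_text)
    return parse_amount(match.group(1)) if match else None


# Hypothetical OCR output of a payroll document.
sample = "Employee: J. Doe\nGross pay: 3.200,00\nNet pay: 2.150,75\n"
print(extract_field(sample, "Net pay"))    # 2150.75
print(extract_field(sample, "Gross pay"))  # 3200.0
print(extract_field(sample, "Tax"))        # None
```

Returning `None` instead of a best guess is a deliberate choice in this sketch: a field the rule cannot find can be left for a person to extract, which protects precision at the cost of recall.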

Challenges

Lack of familiarity with the codebase. Early on, I wrote code that duplicated functionality already available in existing functions I could have reused. I checked in with the project mentor periodically to make this happen less often, and as the project progressed I reused more existing functions without needing to be told they existed.
Unexpected behaviour of an API. In a couple of cases, the OCR API in use did not pick up characters and numbers that were present on the scanned document. I found and fixed the root cause through debugging, after which the algorithms built on the API's output worked properly.
Functions that did not generalize to my use case. Some existing functions were meant to be general but did not cover my use case, so I rewrote them to include it.

Achievements (according to KPIs)

Went from no automation to fully automated extraction of information from payroll documents, with close to 100% precision and 100% recall.
Improved the automation of certain fields in end-of-day reports, increasing recall by close to 40%.
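The write-up does not spell out how precision and recall were computed. One common field-level definition for extraction tasks: precision is the fraction of fields the system filled in that were correct, and recall is the fraction of fields present in the ground truth that were extracted correctly. A minimal sketch under that assumption, with exact-match comparison and hypothetical field names:

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    predicted: Dict[str, Optional[str]],
    expected: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Field-level precision and recall with exact-match comparison.

    predicted: field name -> extracted value, or None if the system left it empty.
    expected:  field name -> ground-truth value, or None if the field is absent.
    """
    filled = [name for name, value in predicted.items() if value is not None]
    present = [name for name, value in expected.items() if value is not None]
    # A filled field counts as correct only if it equals the ground-truth value exactly.
    correct = sum(1 for name in filled if predicted[name] == expected.get(name))
    # Convention used in this sketch: report 0.0 when a denominator is empty.
    precision = correct / len(filled) if filled else 0.0
    recall = correct / len(present) if present else 0.0
    return precision, recall


# Hypothetical ground truth and extraction result for one payroll document.
expected = {"employee": "J. Doe", "gross_pay": "3200.00", "net_pay": "2150.75", "period": "2019-03"}
predicted = {"employee": "J. Doe", "gross_pay": "3200.00", "net_pay": "2105.75", "period": None}
p, r = precision_recall(predicted, expected)
print(round(p, 3), r)  # 0.667 0.5
```

Under this definition, leaving a field empty lowers recall but not precision, so an extractor tuned for near-perfect precision can still have room to improve on recall.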

Further development

Full automation of end-of-day reports. With the knowledge of their structure gained from working with these reports, the natural next step would be to tackle more of the fields that still need to be extracted.
