Relationship Extraction & Record Linkage: Finding Relations between Companies, Pipl

February 28, 2021, 7:17 pm, Fellows 2020

Project by Aviv Kadair, Data Science Fellow June 2020

Abstract

Extracting a relationship between two entities: my project focused on "subsidiary"/"owned by" relationships and delivers the names of the entities together with the relationship between them. The project used distant supervision as part of the data collection, followed by a bag-of-words model to coarsely extract acquisition-related paragraphs. As a final stage, I applied NER and a BERT model to the chosen paragraphs to extract the full relationship, including the corresponding entities.
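The coarse filtering stage can be illustrated with a minimal bag-of-words sketch. The cue words, their weights, the `min_score` cut-off and the example paragraphs are all hypothetical and hand-picked for illustration; the project trained a model rather than using fixed weights, so this only shows the general idea of scoring a paragraph by word counts while ignoring word order.

```python
import re
from collections import Counter

# Hypothetical cue words and weights, hand-picked for illustration only.
# A trained bag-of-words model would learn such weights from data.
CUE_WEIGHTS = {
    "acquired": 2.0,
    "acquisition": 2.0,
    "acquires": 2.0,
    "subsidiary": 1.5,
    "merger": 1.0,
    "bought": 1.0,
}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word occurrences, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def acquisition_score(text: str) -> float:
    """Sum the weight of each cue word multiplied by its count in the text."""
    counts = bag_of_words(text)
    return sum(weight * counts[word] for word, weight in CUE_WEIGHTS.items())


def filter_paragraphs(paragraphs, min_score=2.0):
    """Keep paragraphs whose score reaches the (assumed) cut-off."""
    return [p for p in paragraphs if acquisition_score(p) >= min_score]


# Made-up example paragraphs with placeholder company names.
paragraphs = [
    "Alpha Corp acquired Beta Labs in a deal announced on Monday.",
    "The weather in the city was sunny for most of the week.",
]
print(filter_paragraphs(paragraphs))
```

Only paragraphs that pass this cheap filter would then be handed to the more expensive NER and BERT stage.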

Challenges

Lack of data: As there is no freely available dataset, I scraped the web for text mentioning tech acquisitions and mergers (positive examples) and used distant supervision to expand the positive set and create the negative set.
Dataset imbalance: I adjusted the class weights during training to compensate for the imbalanced dataset.
Extracting only entity-specific relationships: I applied NER to each tagged paragraph to ensure that only paragraphs mentioning specific entities were selected, and not those discussing acquisitions in general.
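The first two challenges can be sketched together as follows. This is an illustrative sketch and not the project's code: the seed pairs, company names and sentences are placeholders, the substring matching is a deliberately naive stand-in for proper entity matching, and the weighting formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), which may differ from the weights actually used.

```python
from collections import Counter

# Hypothetical seed pairs (parent, subsidiary); placeholders, not project data.
KNOWN_PAIRS = {("alpha corp", "beta labs")}
KNOWN_COMPANIES = {name for pair in KNOWN_PAIRS for name in pair} | {"gamma inc"}


def distant_label(sentence: str):
    """Label a sentence from the seed pairs alone, without manual annotation.

    Returns 1 if both members of a known pair are mentioned, 0 if two known
    companies are mentioned but do not form a known pair, and None if fewer
    than two known companies appear (the sentence is then skipped).
    """
    text = sentence.lower()
    mentioned = {c for c in KNOWN_COMPANIES if c in text}
    if len(mentioned) < 2:
        return None
    for parent, child in KNOWN_PAIRS:
        if parent in mentioned and child in mentioned:
            return 1
    return 0


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {c: n_samples / (len(counts) * k) for c, k in counts.items()}


sentences = [
    "Alpha Corp announced that Beta Labs is now part of the group.",
    "Gamma Inc and Alpha Corp both reported earnings.",
    "Gamma Inc sued Beta Labs over a patent.",
    "Gamma Inc competes with Alpha Corp in cloud storage.",
    "Beta Labs opened a new office.",
]
labels = [y for y in map(distant_label, sentences) if y is not None]
weights = balanced_class_weights(labels)
print(labels, weights)
```

Distant supervision of this kind produces noisy labels, since a known pair can appear together in a sentence that says nothing about ownership. The rarer positive class receives the larger weight, so its errors count more during training.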

Achievements (according to KPIs)

A bag-of-words model that identifies acquisitions mentioned in free text
A BERT model that outputs the names of the entities and the subsidiary relationship between them, if one exists
Precision of 0.9 and recall of 0.62

Future project development

Introduce coreference resolution (entity linking) to extract relationships whose entities are further apart in a paragraph.
Improve recall by increasing the number of available samples and by testing different confidence thresholds (currently set to 0.85).
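The trade-off behind the second item can be sketched as a simple sweep, assuming the 0.85 value is a cut-off on the model's confidence score above which a relationship is reported. The scores and gold labels below are made up for illustration and are not project results.

```python
def precision_recall(scores, labels, threshold):
    """Treat scores >= threshold as positive predictions and score them."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up model confidences and gold labels (1 = relationship present).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
labels = [1, 1, 1, 0, 1, 0]

for threshold in (0.85, 0.75, 0.50):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, lowering the cut-off recovers more true relationships (higher recall) and eventually admits false positives (lower precision), which is the balance a threshold search would need to weigh.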

https://github.com/aviv-kadair/relation_extraction/tree/master
