Large-Scale Duplicate Detection (or Record Linkage) Package (Multi-Core by Design)

Finds near duplicates within a memory and time budget. It can perform record linkage over 250,000 Syrian death records in less than 40 seconds using fast minwise hashing.
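To give a feel for how minwise hashing can narrow down near-duplicate candidates, here is a minimal single-core sketch in Python. It is not the package's code: the function names (shingles, make_hashers, signature, candidate_pairs), the character 3-gram representation, and the bands/rows settings are all assumptions chosen for illustration. Each record is turned into a set of n-grams, summarized by a MinHash signature, and then grouped with locality-sensitive banding so that only records sharing a band are compared.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

# Large Mersenne prime for the hash family h(x) = (a*x + b) mod PRIME.
PRIME = (1 << 61) - 1


def shingles(text, n=3):
    """Return the set of character n-grams of a whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def make_hashers(k, seed=0):
    """Draw k (a, b) pairs, each defining one hash function (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(shingle_set, hashers):
    """MinHash signature: for each hash function, the minimum value over the set."""
    # crc32 gives a stable integer id per shingle (unlike Python's salted hash()).
    ids = [zlib.crc32(sh.encode("utf-8")) for sh in shingle_set]
    return tuple(min((a * x + b) % PRIME for x in ids) for a, b in hashers)


def candidate_pairs(records, bands=16, rows=4, seed=0):
    """Block records by LSH banding; return index pairs that share any band."""
    hashers = make_hashers(bands * rows, seed)
    buckets = defaultdict(list)
    for idx, rec in enumerate(records):
        sig = signature(shingles(rec), hashers)
        for b in range(bands):
            # Records whose signatures agree on all rows of a band share a bucket.
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


if __name__ == "__main__":
    # Invented placeholder strings, not real records.
    demo = [
        "ahmad khalil 2012-03-14 homs",
        "ahmad khalil 2012-03-14 homs",
        "qqqqqqqq",
    ]
    print(sorted(candidate_pairs(demo)))
```

Only the pairs returned by candidate_pairs would then need a full comparison, which is what keeps the cost well below comparing every record against every other. Using more rows per band makes blocking stricter, while more bands makes it more permissive.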

Related Papers

 

Blocking Methods Applied to Casualty Records from the Syrian Conflict [pdf]

Peter Sadosky, Anshumali Shrivastava, Megan Price, Rebecca C. Steorts. (2015)

Download Code

Website by: Pallavi

Contact

Anshumali at rice dot edu

3118 Duncan Hall

Rice University