Large-Scale Duplicate Detection (or Record Linkage) Package (Multi-Core by Design)
Finds near duplicates within a memory and time budget. It can perform record linkage over 250,000 Syrian death records in less than 40 seconds using fast minwise hashing.
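To illustrate the idea behind minwise hashing for blocking, here is a minimal Python sketch. It is not the package's code or API: the function names (shingles, minhash_signature, estimate_jaccard, candidate_pairs), the character 3-gram tokenization, the salted BLAKE2b hash family, and the banding parameters are all assumptions chosen for illustration. Each record is reduced to a short signature, and records that agree on any band of that signature become candidate duplicates, so most dissimilar pairs are never compared.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Split a record into a set of lowercase character k-grams (assumed tokenization)."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, token):
    """One member of a hash family, built from salted BLAKE2b (an illustrative choice)."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=64):
    """For each hash function, keep the minimum hash value over the token set."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(records, num_hashes=64, bands=16):
    """Block records: two records are candidates if they agree on any whole band."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for idx, record in enumerate(records):
        sig = minhash_signature(shingles(record), num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].append(idx)

    pairs = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs
```

Only the pairs returned by candidate_pairs would then need a detailed comparison. Using more bands with fewer rows per band catches less similar pairs at the cost of more candidates. This sketch is single-threaded and makes no attempt to match the speed reported above.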
Related Papers
Blocking Methods Applied to Casualty Records from the Syrian Conflict [pdf]
Peter Sadosky, Anshumali Shrivastava, Megan Price, Rebecca C. Steorts. (2015)
Download Code
Website by: Pallavi
Contact
Anshumali at rice dot edu
3118 Duncan Hall
Rice University