Research

FLASH: A parallel k-NN system for ultra-high dimensional datasets.

We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine. Current state-of-the-art implementations either fail at the presented scale or are orders of magnitude slower than our system. FLASH can compute an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds.

Camera-Based Device Positioning Without the Need for Cloud or GPS

We present CaPSuLe, the first camera-based (privacy-preserving) indoor mobile positioning system that does not involve any communication (or data transfer) with any other device or the cloud. The algorithm needs only 78.9 MB of memory and can localize a mobile device with 92.11% accuracy.

Scalable and Sustainable Deep Learning

We present a novel technique that drastically reduces the amount of computation needed to train and test deep networks. Our approach combines recent ideas from adaptive dropout and randomized hashing for maximum inner product search to efficiently select only the nodes with the highest activations. Our algorithm requires only 5% of the computations (multiplications) of traditional algorithms, a 95% reduction in computational cost, without any loss in accuracy.
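
The node-selection idea can be illustrated with a minimal sketch. It assumes signed random projections as a simple stand-in for the maximum inner product search hashing mentioned above, so it is not the scheme used in the actual work. The names `srp_code` and `HashedLayer`, and all parameter values, are hypothetical.

```python
import random


def srp_code(vec, planes):
    """Signed random projection: one bit per random hyperplane."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (dot > 0)
    return code


class HashedLayer:
    """Hypothetical layer that evaluates only nodes retrieved by hashing."""

    def __init__(self, weights, n_bits=4, n_tables=8, seed=0):
        rng = random.Random(seed)
        dim = len(weights[0])
        self.weights = weights
        # One independent set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        # Index every node's weight vector in every table.
        self.tables = []
        for planes in self.planes:
            table = {}
            for node, w in enumerate(weights):
                table.setdefault(srp_code(w, planes), []).append(node)
            self.tables.append(table)

    def active_nodes(self, x):
        """Nodes whose weight vector shares a bucket with x in any table."""
        active = set()
        for planes, table in zip(self.planes, self.tables):
            active.update(table.get(srp_code(x, planes), []))
        return sorted(active)

    def forward(self, x):
        """ReLU activations for the active nodes only; the rest are skipped."""
        out = {}
        for node in self.active_nodes(x):
            z = sum(w * v for w, v in zip(self.weights[node], x))
            out[node] = max(z, 0.0)
        return out
```

Only the retrieved nodes incur multiplications; every other node is treated as inactive, which is where the savings would come from. A production version would also need to update the hash tables as the weights change during training.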

High Speed Big-Data Streams Mining (for IoT devices)

Ultra-high-speed and ultra-low-memory (logarithmic) data mining algorithms, ideal for low-latency, low-power, and low-memory scenarios such as IoT (Internet of Things) devices.

Extreme Scale Time Series Mining

We design SSH (Sketch, Shingle, & Hashing), an efficient approximate hashing scheme that is much faster than the state-of-the-art branch-and-bound search technique, the UCR suite. Our results show that SSH is very effective for longer time series and prunes around 95% of candidates, leading to a massive speedup in search with DTW.
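
The three stages named in SSH can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the sketch step uses the sign of a sliding-window projection onto a filter, the shingle step keeps a plain set of bit n-grams, and the hash step uses ordinary minwise hashing. The function names, the filter, and the parameter values are all hypothetical and not taken from the SSH implementation.

```python
import hashlib


def sketch(series, filt, step=1):
    """Slide the filter over the series; keep the sign of each projection."""
    w = len(filt)
    bits = []
    for i in range(0, len(series) - w + 1, step):
        dot = sum(f * s for f, s in zip(filt, series[i:i + w]))
        bits.append(1 if dot >= 0 else 0)
    return bits


def shingles(bits, n=4):
    """Set of all length-n bit patterns (n-grams) in the sketch."""
    return {tuple(bits[i:i + n]) for i in range(len(bits) - n + 1)}


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item)."""
    data = repr((seed, item)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(items, n_hashes=16):
    """Minwise hash signature of a non-empty set of shingles."""
    return [min(_hash(seed, x) for x in items) for seed in range(n_hashes)]


def ssh_signature(series, filt, n=4, n_hashes=16):
    """Sketch, shingle, then hash a single time series."""
    return minhash(shingles(sketch(series, filt), n), n_hashes)
```

Series whose signatures agree in many positions would be kept as candidates for the exact DTW computation, and the rest would be pruned.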

Fastest Minwise Hashing

Traditional minwise hashing is slow. We present a single-pass, constant-time minwise hashing algorithm that can be thousands of times faster than the fastest MinHash code. We use densification for fast minwise hashing over sparse data.
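
As a rough illustration of densification, the sketch below hashes each element once, keeps the minimum in each of `k` bins, and fills empty bins by borrowing from the nearest non-empty bin to the right. It assumes a rotation-style fill rule and is not necessarily the exact variant used in this work. The names `densified_oph` and `estimate_jaccard` and the default parameters are hypothetical.

```python
import hashlib


def _hash(x, seed, universe):
    """Deterministic hash of x into the range [0, universe)."""
    data = repr((seed, x)).encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % universe


def densified_oph(items, k=16, universe=2**32, seed=0):
    """One hash per element, k bins, empty bins filled by rotation."""
    assert universe % k == 0, "sketch assumes equal-width bins"
    bin_size = universe // k
    bins = [None] * k
    # Single pass: each element is hashed once and lands in exactly one bin.
    for x in items:
        h = _hash(x, seed, universe)
        b, offset = divmod(h, bin_size)
        if bins[b] is None or offset < bins[b]:
            bins[b] = offset
    if all(v is None for v in bins):
        raise ValueError("cannot sketch an empty set")
    # Densification: borrow from the closest non-empty bin to the right
    # (circularly), shifted by the distance so borrowed values stay distinct
    # from native ones.
    result = list(bins)
    for j in range(k):
        if bins[j] is None:
            t = 1
            while bins[(j + t) % k] is None:
                t += 1
            result[j] = bins[(j + t) % k] + t * bin_size
    return result


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of matching positions, used as a similarity estimate."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

The cost per set is one hash evaluation per element plus a pass over the `k` bins, in contrast to classical minwise hashing, which evaluates `k` hash functions per element.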

Massive Scale Search Duplicate Detection and Record Linkage: Linking Death Records From Syrian Conflict.

Finding near duplicates under a memory and time budget. Our approach can perform record linkage over 250,000 Syrian death records in less than 40 seconds using fast minwise hashing.

Latency Critical Privacy-Preserving Matching and Search

Large-Scale Collaborative Filtering