Maximally Consistent Sampling and the Jaccard Index of Probability Distributions

September 11, 2018 · Declared Dead · 🏛 2018 IEEE International Conference on Data Mining Workshops (ICDMW)

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ryan Moulton, Yunjiang Jiang
arXiv ID: 1809.04052
Category: cs.DS (Data Structures & Algorithms)
Cross-listed: cs.IR
Citations: 32
Venue: 2018 IEEE International Conference on Data Mining Workshops (ICDMW)
Last Checked: 2 months ago
Abstract
We introduce simple, efficient algorithms for computing a MinHash of a probability distribution, suitable for both sparse and dense data, with equivalent running times to the state of the art for both cases. The collision probability of these algorithms is a new measure of the similarity of positive vectors which we investigate in detail. We describe the sense in which this collision probability is optimal for any Locality Sensitive Hash based on sampling. We argue that this similarity measure is more useful for probability distributions than the similarity pursued by other algorithms for weighted MinHash, and is the natural generalization of the Jaccard index.
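Since no code accompanies the paper, the following is only a minimal sketch of the idea the abstract describes, not the authors' implementation. It assumes the usual construction for this kind of sampler: every key gets a shared pseudo-random uniform value per seed, each input turns it into an exponential variate with rate equal to that key's weight, and the key with the smallest variate is the sample. The names `p_minhash`, `jp_similarity`, and `_uniform` are hypothetical, and the closed form in `jp_similarity` is the collision probability as we understand it from the paper, stated here as an assumption.

```python
import hashlib
import math


def _uniform(seed: int, key: str) -> float:
    """Deterministic pseudo-uniform value in (0, 1], shared by all inputs."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return (n + 1) / (2**64 + 1)


def p_minhash(weights: dict, seed: int):
    """Return the key minimizing -log(u_key) / weight (hypothetical helper).

    Keys with non-positive weight are never sampled. Rescaling all weights
    by a positive constant leaves the argmin unchanged.
    """
    best_key, best_value = None, math.inf
    for key, w in weights.items():
        if w <= 0:
            continue
        value = -math.log(_uniform(seed, key)) / w
        if value < best_value:
            best_key, best_value = key, value
    return best_key


def jp_similarity(x: dict, y: dict) -> float:
    """Assumed closed form: sum over shared keys i of
    1 / sum_j max(x_j / x_i, y_j / y_i)."""
    keys = set(x) | set(y)
    total = 0.0
    for i in keys:
        xi, yi = x.get(i, 0.0), y.get(i, 0.0)
        if xi > 0 and yi > 0:
            denom = sum(max(x.get(j, 0.0) / xi, y.get(j, 0.0) / yi) for j in keys)
            total += 1.0 / denom
    return total


def collision_rate(x: dict, y: dict, trials: int) -> float:
    """Fraction of seeds on which both inputs produce the same sample."""
    hits = sum(p_minhash(x, s) == p_minhash(y, s) for s in range(trials))
    return hits / trials
```

Under these assumptions, `collision_rate(x, y, trials)` should approach `jp_similarity(x, y)` as `trials` grows, which is the sense in which the collision probability acts as a similarity measure between the two weight vectors.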
