“Bloom Filter” to optimize expensive data lookups

Bloom Filter is very useful when you want to optimize the functionality with following constraints Data lookup is very expensive False positive is acceptable and False negative is un-acceptable A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive retrieval results are possible, but false negatives are not; i.e. a query returns either "inside set (may be wrong)" or "definitely not in set". – wikipedia.org It stores only value hash in it’s data structure, so it take very less memory against the very huge data. For example if you have set of raw data with 10 7 elements 10 5 distinct value and size ~40MB Bloom filter will store same data as hash in just ~0.6MB. ( ref. ) And because of the same reason it is quite fast. Bloom Filter implementation Guava library has bloom filter implementation .Net implementation