Skip to main content

“Bloom Filter” to optimize expensive data lookups


Bloom Filter is very useful when you want to optimize the functionality with following constraints
  1. Data lookup is very expensive
  2. False positive is acceptable and
  3. False negative is un-acceptable

A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive retrieval results are possible, but false negatives are not; i.e. a query returns either "inside set (may be wrong)" or "definitely not in set".wikipedia.org
 360px-Bloom_filter_speed.svg

It stores only value hash in it’s data structure, so it take very less memory against the very huge data.
For example if you have set of raw data with
  • 107 elements
  • 105 distinct value
  • and size ~40MB
Bloom filter will store same data as hash in just ~0.6MB. (ref.)
And because of the same reason it is quite fast.

Bloom Filter implementation
Guava library has bloom filter implementation
.Net implementation

Bloom Filter In Use
  1. Chrome uses Bloom filters to make a preliminary decision whether a particular web site is malicious or safe.
  2. Google Big table uses it to reduce disk lookups
  3. Squid Web Proxy Cache uses Bloom filters for cache digests
  4. Cassandra uses to save IO when performing a key lookup

Useful links
Bloom filter by example
Space-Efficient Bloom Filters
"More Optimal Bloom Filters," Ely Porat (Nov/2007) Google TechTalk video on YouTube



Comments

Popular posts from this blog

ERROR: Ignored call to 'alert()'. The document is sandboxed, and the 'allow-modals' keyword is not set.

Recently I found this issue while writing code snippet in "JSFiddle". And after searching, found this was happening because of new feature added in "Chrome 46+". But at the same time Chrome doesn't have support for "allow-modals" property in "sandbox" attribute.

Chromium issue for above behavior:
https://codereview.chromium.org/1126253007

To make it work you have to add "allow-scripts allow-modals" in "sandbox" attribute, and use "window.alert" instead of "alert".



<!-- Sandbox frame will execute javascript and show modal dialogs --> <iframe sandbox="allow-scripts allow-modals" src="iframe.html"> </iframe>


Feature added: Block modal dialog inside a sandboxed iframe.
Link: https://www.chromestatus.com/feature/4747009953103872

Feature working Demo page:
https://googlechrome.github.io/samples/block-modal-dialogs-sandboxed-iframe/index.html



JavaScript [ExtJs3]: EditorGridPanel Read-Only (dynamically)

Many time we face the scenerio where we have to make the editor grid read-only dynamically.


Ext.override(Ext.ux.grid.CheckColumn, { editable: true, onMouseDown: function (e, t) { if (Ext.fly(t).hasClass(this.createId())) { e.stopEvent(); var me = this, grid = me.grid, view = grid.getView(), index = view.findRowIndex(t), colindex = view.findCellIndex(t), record = grid.store.getAt(index); if (!grid.isReadOnly && grid.colModel.isCellEditable(colindex, index)) { record.set(me.dataIndex, !record.data[me.dataIndex]); } } } }); var grid = new Ext.grid.EditorGridPanel({ ... isReadOnly: true, //set to flag to make check column readonly ... }); //to make other column readonly grid.on('beforeedit', function () { return false; });

JavaScript [ExtJs3]: Total “Record” count in filtered store

There is two way to get record count from the Store
store.getTotalCount() This function depend on server response value. For accuracy of the value, property shell if return by the server.

Property name for the diff. reader:
totalProperty for JsonReader, totalRecords for XmlReaderstore.getCount() Will return you the number of record from the store.
Or if you have filter on the store, it will give you the number of filtered record.
But if you want to get the total number of record regardless filtering, Then it will be like this

var totalRecords = store.snapshot ? store.snapshot.length : store.getCount();
“snapshot” is the variable in “Store” which hold the actual data in case if you have applied a filter.