Skip to main content

"Stop Words" in full text search

Stop words Stop words are the words without significant information. You can consider these words as a filler words. If these words are used in search statement, then result could potentially contain all the record from data store. 



It is always a good idea to filter out these words before constructing the search query. It will make the search query smaller, faster and result will be more relevant.







One of our major performance optimizations for the “related questions” query is removing the top 10,000 most common English dictionary words (as determined by Google search) before submitting the query to the SQL Server 2008 full text engine. It’s shocking how little is left of most posts once you remove the top 10k English dictionary words. This helps limit and narrow the returned results, which makes the query dramatically faster.” - stackoverflow.com


English stop word list :
http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop


Some interesting articles
http://www.codinghorror.com/blog/2008/11/stop-me-if-you-think-youve-seen-this-word-before.html
http://www.seobythesea.com/2008/08/google-stopword-patent/
http://www.googleguide.com/interpreting_queries.html




Comments

Popular posts from this blog

ERROR: Ignored call to 'alert()'. The document is sandboxed, and the 'allow-modals' keyword is not set.

Recently I found this issue while writing code snippet in "JSFiddle". And after searching, found this was happening because of new feature added in "Chrome 46+". But at the same time Chrome doesn't have support for "allow-modals" property in "sandbox" attribute. Chromium issue for above behavior: https://codereview.chromium.org/1126253007 To make it work you have to add "allow-scripts allow-modals" in "sandbox" attribute, and use "window.alert" instead of "alert". <!-- Sandbox frame will execute javascript and show modal dialogs --> <iframe sandbox="allow-scripts allow-modals" src="iframe.html"> </iframe> Feature added: Block modal dialog inside a sandboxed iframe. Link: https://www.chromestatus.com/feature/4747009953103872 Feature working Demo page: https://googlechrome.github.io/samples/block-modal-dialogs-sandboxed-iframe/index.html

Application Design Notes

Don’t be afraid to write your own code, but be absolutely sure you need to Don't reinvent the wheel Learn more about your libraries and take full advantage  Date time calculation is hard ( leap second ,  leap year ), use trusted library  js-joda ,  momentJs ,  joda (java) Simple is better than perfect (nearly) every time If you can deliver a sub-optimal solution (that solves the problem but has known limitation) in a week instead of a full featured one in a month DO, IT Simple system are Easy to reason about  Easy to debug Easy to refactor Easy to learn Simple doesn't mean you skip good engineering, but you can use duct tape. Build things the right way from the start, refactoring is hard and expensive Security Manage and store passwords securely Telemetry Common retrofitting "grunt work" Internationalization + localization Web Content Accessibility Factoring and styling HTML UI Adding unit test to an existing codebase LOG LOG LOG Log, but do it right We spend lot of t

How to store user password at server!!!

Trick is, you should never store user password… never ever. Now the real question is, then how to authenticate and authorize the user with password. And answer is when user enter the password, we should encrypt the password and store the hints. So next time when user enter the password we follow the same process and compare hints, if both hints are same then password is matched, else it is wrong password. Next question will be, what kind of hints, and how to generate these hints. In simple term hints are the obfuscated and fragmented form of user password. And very important part is hints generation process, which have to be collision resistant , means there will be very less possibility to find the data which generate same hints (like Cryptographic hashing functions ). Below is the simple checklist of password hashing and storing, which you should always keep in mind. PS You're Probably Storing Passwords Incorrectly Storing Passwords - done rig