In computing, stop words are words that are filtered out before or after natural language data (text) is processed. There is no single universal list of stop words used by all natural language processing tools, and not all tools even use such a list. Some tools specifically avoid removing stop words in order to support phrase search.
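Here is a minimal sketch of the filtering step. It assumes a tiny, hand-picked stop word set (not one of the downloadable lists) and naive lowercase whitespace tokenization:

```python
# Hypothetical, deliberately tiny stop word set for illustration only.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "of"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the tokens of `text` that are not stop words.

    Tokenization is a naive lowercase whitespace split; real tools usually
    also strip punctuation and may apply stemming.
    """
    return [token for token in text.lower().split() if token not in stop_words]

print(remove_stop_words("The cat is sitting on the mat"))
# ['cat', 'sitting', 'mat']
```

Because the set lookup is case-sensitive, lowercasing the input first ensures that "The" at the start of a sentence is caught along with "the".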

Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common short function words, such as the, is, at, which, and on. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as ‘The Who’, ‘The The’, or ‘Take That’. Other search engines remove some of the most common words, including lexical words such as “want”, from a query in order to improve performance.
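The phrase-search problem is easy to see when naive filtering is applied to the band names mentioned above. The sketch below assumes a hypothetical stop word set containing the function words from the text plus "that". ‘The The’ disappears entirely, and the other two queries shrink to a single, much less specific term:

```python
# Hypothetical stop word set built from the function words named in the text, plus "that".
STOP_WORDS = {"the", "is", "at", "which", "on", "that"}

def filter_query(query, stop_words=STOP_WORDS):
    """Drop stop words from a query using a naive lowercase whitespace split."""
    return [term for term in query.lower().split() if term not in stop_words]

for query in ["The Who", "The The", "Take That"]:
    print(f"{query!r} -> {filter_query(query)}")
# 'The Who' -> ['who']
# 'The The' -> []
# 'Take That' -> ['take']
```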

Below is a set of stop word lists available for download. In response to interest in the previous article on English stop words, I have created several files for download.

Download PHP Array Stop Word List

CSV Download of English Stop Words

Text file of stop words for download
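Here is a rough sketch of loading the text or CSV version into a Python set. The exact layout of the files is an assumption: the text file is taken to hold one word per line, and the CSV file to hold plain comma-separated words. Adjust the parsing if the actual files differ.

```python
import csv
import io

def load_stop_words_txt(lines):
    """Parse a text stop word list, ASSUMING one word per line.
    Blank lines are skipped and words are lowercased."""
    return {line.strip().lower() for line in lines if line.strip()}

def load_stop_words_csv(lines):
    """Parse a CSV stop word list, ASSUMING plain comma-separated words
    on one or more rows. Empty cells are skipped and words are lowercased."""
    return {cell.strip().lower()
            for row in csv.reader(lines)
            for cell in row
            if cell.strip()}

# Usage with in-memory sample data. For a downloaded file you would pass
# open("stop_words.txt", encoding="utf-8") instead (the filename is hypothetical).
txt_words = load_stop_words_txt(io.StringIO("a\nabout\n\nabove\n"))
csv_words = load_stop_words_csv(io.StringIO("a,about,above\n"))
print(txt_words == csv_words)  # True
```

Loading into a set rather than a list keeps each membership test during filtering to a constant-time lookup.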

Posted by xpo6

Software developer in the realm of AI, NLP and black magic.