Although there are different ways of removing stop words from a document (or a bundle of documents), an easy way is to use NLTK (the Natural Language Toolkit) in Python.
You can use NLTK's stop word lists and its built-in functionality to do the work. The stop word corpus and the tokenizer models have to be downloaded once with nltk.download() before the example will run.
A simple example would be:
>>> from nltk import word_tokenize
>>> from nltk.corpus import stopwords
>>> import string
>>> sent = "this is a message containing stopwords."
>>> stop = stopwords.words('english') + list(string.punctuation)
>>> [i for i in word_tokenize(sent.lower()) if i not in stop]
['message', 'containing', 'stopwords']
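
The same filter can be applied to a bundle of documents. What follows is only a minimal sketch: to keep it runnable without any NLTK data, it assumes a simple regex tokenizer and a tiny hand-written stop list in place of word_tokenize and stopwords.words('english'). The names tokenize, remove_stop_words and SMALL_STOP are made up for this illustration.

```python
import re

# Tiny illustrative stop list (an assumption); in practice you would
# pass set(stopwords.words('english')) from NLTK instead.
SMALL_STOP = {"this", "is", "a", "the", "of", "and", "not"}

def tokenize(text):
    # Lowercase the text and pull out runs of letters, digits or apostrophes.
    # This drops punctuation, so no separate punctuation filter is needed.
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(docs, stop=SMALL_STOP):
    # Filter every document against the same stop set.
    return [[tok for tok in tokenize(doc) if tok not in stop] for doc in docs]

docs = ["This is a message containing stopwords.", "The end of the message."]
cleaned = remove_stop_words(docs)
print(cleaned)
```

Using a set for the stop words keeps each membership test cheap, which starts to matter once there are many documents.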

In case there are specific words that you would like to keep even though NLTK lists them as stop words, you can put them in a set and subtract it from the stop word list.
operators = set(('and', 'not'))
stop = set(stopwords.words('english')) - operators

The condition would be as above:
for i in word_tokenize(sent.lower()):
    if i not in stop:
        print(i)  # use word
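
To see the effect of the set difference end to end, here is a small sketch that runs without NLTK. The base_stop list is a hypothetical stand-in for stopwords.words('english'), and the sentence is split on whitespace rather than with word_tokenize.

```python
# Hypothetical base list standing in for stopwords.words('english').
base_stop = ["this", "is", "and", "not", "a", "but"]

# Words to keep even though the base list treats them as stop words.
operators = {"and", "not"}
stop = set(base_stop) - operators

tokens = "this is good and not bad".split()
kept = [tok for tok in tokens if tok not in stop]
print(kept)
```

Here "and" and "not" survive the filter because they were subtracted from the stop set, while "this" and "is" are still removed.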

If you are using Groovy:

String removeStopWords(String textToClean) {
    Collection stopWords = ["I", "a", "above", "after", "against", "all", "alone", "always", "am", "amount",
        "an", "and", "any", "are", "around", "as", "at", "back", "be", "before",
        "behind", "below", "between", "bill", "both", "bottom", "by", "call", "can", "co",
        "con", "de", "detail", "do", "done", "down", "due", "during", "each", "eg",
        "eight", "eleven", "empty", "ever", "every", "few", "fill", "find", "fire", "first",
        "five", "for", "former", "four", "from", "front", "full", "further", "get", "give",
        "go", "had", "has", "hasnt", "he", "her", "hers", "him", "his", "i",
        "ie", "if", "in", "into", "is", "it", "last", "less", "ltd", "many",
        "may", "me", "mill", "mine", "more", "most", "mostly", "must", "my", "name",
        "next", "nine", "no", "none", "nor", "not", "nothing", "now", "of", "off",
        "often", "on", "once", "one", "only", "or", "other", "others", "out", "over",
        "part", "per", "put", "re", "same", "see", "serious", "several", "she", "show",
        "side", "since", "six", "so", "some", "sometimes", "still", "take", "ten", "the",
        "then", "third", "this", "thick", "thin", "three", "through", "to", "together", "top",
        "toward", "towards", "twelve", "two", "un", "under", "until", "up", "upon", "us",
        "very", "via", "was", "we", "well", "when", "while", "who", "whole", "will",
        "with", "within", "without", "you", "yourself", "yourselves", "symptom", "symptoms"]
    // Split on whitespace, drop the stop words, and rejoin with single spaces.
    return textToClean.tokenize().minus(stopWords).join(' ')
}

Posted by xpo6

Software developer in the realm of AI, NLP and black magic.