Sourcecode Listing of

classes/NEUMES/distributed_image_library/model/StopWordsList.properties

### StopWordsList.txt
### Version: 11 December 2006
### Author: Louis W. G. Barton for the NEUMES Project
### See http://purl.oclc.org/SCRIBE/NEUMES/
###
### Syntax:
### - Parser shall ignore any line containing the hash '#' character.
### - Parser shall ignore blank lines.
###
### Usage:
### - The LowerCaseFilter shall be run before the stop words filter.
###   [Alternatively: stop words; then LowerCaseFilter; then stop words again.]
### - Do not stop "net" and "com", as they shall be used in domain-name searches.
### - Occasionally monitor the tokenizer's Tokens list to identify frequent but
###   non-significant words as candidates for addition to the stop-words list.
###
### Assumptions:
### - Query keywords "AND" and "OR" shall be caught upstream by the Lucene parser.
###
### To do:
### - Consider interaction with quoted strings in a Lucene query, where the user
###   might wish to bypass the stop words filter for quoted terms.

### Stop numerals:
0
### English stop words:
a
an
and
are
aren
aren't
as
at
be
been
being
by
did
didn
didn't
do
don
don't
does
doesn
doesn't
false
for
gif
had
hadn
hadn't
have
haven
haven't
having
he
he's
his
htm
html
http
https
image
images
img
in
is
isn
isn't
it
its
jpg
n/a
null
of
or
such
text
that
the
their
them
then
there
these
they
this
those
time
times
true
was
wasn
wasn't
were
weren
weren't
www
#[end, StopWordsList.txt]
= END LISTING =
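
The Syntax and Usage notes in the listing's header can be illustrated with a minimal Java sketch. This is not the NEUMES project's loader, and it deliberately avoids the Lucene API so that it runs on the standard library alone. The class name StopWordsSketch and the methods loadStopWords and filterTokens are hypothetical names chosen for illustration. The sketch assumes one entry per line, skips blank lines and any line containing '#', and lower-cases each token before the stop-word lookup, which mirrors the rule that the LowerCaseFilter runs before the stop words filter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not part of the NEUMES source code.
class StopWordsSketch {

    /** Reads a stop-word list, skipping blank lines and any line containing '#'. */
    static Set<String> loadStopWords(BufferedReader reader) throws IOException {
        Set<String> stopWords = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String entry = line.trim();
            // Syntax rules from the file header: ignore blank lines and '#' lines.
            if (entry.isEmpty() || entry.indexOf('#') >= 0) {
                continue;
            }
            // Assumption: entries are stored in lower case for comparison.
            stopWords.add(entry.toLowerCase(Locale.ROOT));
        }
        return stopWords;
    }

    /** Lower-cases each token first, then drops it if it is a stop word. */
    static List<String> filterTokens(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // A small excerpt in the same format as the listing above.
        String sample = "### English stop words:\n\nthe\nof\nwww\n#[end]\n";
        Set<String> stopWords =
                loadStopWords(new BufferedReader(new StringReader(sample)));

        List<String> tokens =
                Arrays.asList("The", "Library", "of", "www", "example", "com");
        // "com" survives because it is deliberately absent from the list.
        System.out.println(filterTokens(tokens, stopWords)); // prints [library, example, com]
    }
}
```

Because any line containing '#' is skipped, an entry cannot carry a trailing comment on the same line: under this rule the whole line would be discarded.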