Search operators and wildcards
You can use and combine different search operators and wildcards in your search queries:
To perform a single character wildcard search use the “?” symbol.
To perform a multiple character wildcard search use the “*” symbol.
Unknown or all parts with *
Multiple character wildcard searches with * looks for zero, one or more characters.
will find corrupt, corruption, corrupted … but not anticorruption
will find corrupt or interrupt but wont find corruption or interruption…
will find corrupt, corruption, anticorruption, interrupt and interruption …
You can also use the wildcard searches in the middle of a term:
will find corupt or corrupt
Single character wildcard
The single character wildcard search looks for terms that match that with the single character replaced.
For example, to search for “text” or “test” you can use the search:
Boolean operators allow terms to be combined through logic operators. Lucene supports AND, “+”, OR, NOT and “-” as Boolean operators(Note: Boolean operators must be ALL CAPS).
will find all documents which contains the word engine AND the word search
search AND engine
would return the same results
search OR research
will find all documents which contains the word search or the word research
Exclusion with – or NOT
The “-” or prohibit operator excludes documents that contain the term after the “-” symbol:
search NOT engine
search AND NOT engine
will find all documents which contain the word search and do not contain the word engine
Phrase search with double quotes
A Phrase is a group of words surrounded by double quotes such as
which will find exactly this phrase and will not find i search an engine
Find results with typos or names not exactly pronounced by edit distance (Levensthein distance)
Fuzziness by edit distance is useful if you search for names you don’t know how to spell exactly or if you want to consider typos in the documents or if some documents are images or scanned documents (like PDF documents containing scanned pages only in graphical formats instead of digital text), so you maybe find more documents for your query even if the results of automated text-recognition (OCR) are not perfect).
To do a fuzzy search use the tilde, “~”, symbol at the end of a single word / term you want to search with fuzziness.
For example to search for a term similar in spelling to “roam” use the fuzzy search:
This search will find terms like foam and roams.
would find not only similar words but similer works, too because both words are set to fuzzy search.
Setting maximal edit distance (number of chars, that may differ)
If you use the ~ operator for fuzziness, the default maximal edit distance is two chars. This is the maximal number of chars that can be changed, added or deleted to match a term that is searched with fuzziness.
To change the maximum number of changeable chars, set the number after the ~ operator.
Considering grammar rules to find other word forms, too (stemming), the search engine will find more.
For example if you search for company corruption you will find companies corrupted, too.
If you want only results matching exactly the spelling in your search query, you can disable stemming by disabling the option “Find other word forms, too”.
If you disable stemming for the whole search query, you can enable it for single terms by using the operator
If you enable stemming for the whole search query, you can disable stemming for single terms by using the operator
Combine criterias with OR, AND and parentheses
You can use parentheses to group clauses to form sub queries:
(search OR research) AND (engine OR software)
* documents containing the words search and engine
* documents containing the words search and software
* documents containing the words research and engine
* documents containing the words research and software
* documents containing the words search and research and software and engine
But it wont find a document containing the words search and research without at least one of the words engine or software.
Only one of the two last words would be enough, because connected with OR, but one of them is absolutelly neccecary because this criteria combination (connected by parantheses) is connected with AND