rezecib

Ex Employees
  • Posts: 91
  • Days Won: 2

Community Answers

  1. rezecib's post in Searching for Exact Phrase (General) was marked as the answer   
    Exact phrase searches do work in general (they are covered by our integration test suite, though if you find a case that the explanation below doesn't cover, definitely let me know). However, the case described in the original post has a wrinkle to it: "stop words".
    TL;DR: common words don't get indexed, so you can't search for them.
    Long explanation:
    The way searching works is that when a document is created or updated, it's added to several "inverted indices". Each index maps terms to the documents that contain them and to the terms' positions within each document. So there will be an entry for "object" and one for "oriented", and a search for "object oriented" will first look up both entries, and then remove any documents where "object" and "oriented" aren't in adjacent positions in that order.
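    As a rough illustration of the two steps described above (look up each term, then check positions), here is a minimal Python sketch. It is not the actual search implementation; the whitespace tokenizer and the names `build_index` and `phrase_search` are assumptions made up for the example.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()

def build_index(docs):
    """Map each term to {doc_id: [positions of the term in that doc]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return ids of docs where the phrase terms occur adjacently, in order."""
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return set()
    # Step 1: candidate docs are those that contain every term.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    # Step 2: keep docs where term i sits at position start + i.
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = {
    1: "object oriented design",
    2: "oriented toward the object",
    3: "an object is oriented",
}
index = build_index(docs)
print(phrase_search(index, "object oriented"))  # {1}
```

    All three documents contain both terms, so all three survive step 1; only document 1 has them next to each other in the right order.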
    However, some terms are really, really common, like "the" and "in". If you were to index them, the result would contain basically every document; with 200 million users with lots of notes each, that's a LOT of documents. The standard approach to handling this is to just not index these terms; this is called "stop word filtering". This means that these terms can't be found by the search, even in an exact phrase match, because the search can't do the first step of finding the documents that contain the stop word. It can, however, do mostly-exact matching on phrases with stop words in the middle, like "money in the bank". In this case it will still ignore "in the", but it knows that it's looking for an offset of 3 rather than 1 or 2 between "money" and "bank". So it would match "money in the bank", but also "money hello world bank", or any other text with exactly two words between "money" and "bank".
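    To make the offset behavior concrete, here is a small Python sketch that skips stop words at index time and at query time but keeps the original word positions. It only mimics the behavior described above under simplifying assumptions (whitespace tokenization, hypothetical helper names `analyze`, `build_index`, `phrase_search`); it is not the real search code.

```python
from collections import defaultdict

STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

def analyze(text):
    """Yield (position, term) pairs, dropping stop words but keeping positions."""
    for pos, term in enumerate(text.lower().split()):
        if term not in STOP_WORDS:
            yield pos, term

def build_index(docs):
    """Map each non-stop term to {doc_id: set of positions}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for pos, term in analyze(text):
            index[term][doc_id].add(pos)
    return index

def phrase_search(index, phrase):
    """Match the non-stop terms of the phrase at their original relative offsets."""
    query = list(analyze(phrase))
    if not query:
        return set()  # a phrase made only of stop words has nothing to look up
    first_pos, first_term = query[0]
    hits = set()
    for doc_id, starts in index.get(first_term, {}).items():
        for start in starts:
            if all(start + (pos - first_pos) in index.get(term, {}).get(doc_id, ())
                   for pos, term in query):
                hits.add(doc_id)
                break
    return hits

docs = {
    1: "money in the bank",
    2: "money hello world bank",
    3: "money bank",
    4: "the money is in the bank",
}
index = build_index(docs)
print(sorted(phrase_search(index, "money in the bank")))  # [1, 2]
print(phrase_search(index, "in the"))                     # set()
```

    Documents 1 and 2 both have "bank" exactly 3 positions after "money", so both match; documents 3 and 4 have offsets of 1 and 4 and do not. A query consisting only of stop words finds nothing, because there is no indexed term to start from.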
    It would be possible to filter the list of results in the client after they have been returned, because the client actually has access to the note content. So if you want this feature (I can definitely see the use of it), make a feature request for "Filter search results by exact phrase matches in the text, including stop words".
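    A client-side filter of that kind could look roughly like the Python sketch below. The result shape (dicts with `id` and `content` keys) and the function names are hypothetical, and punctuation handling is left out to keep the example short.

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively, in order, in `text`."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return n > 0 and any(words[i:i + n] == target
                         for i in range(len(words) - n + 1))

def filter_exact_phrase(results, phrase):
    """Keep only results whose content contains the phrase, stop words included."""
    return [r for r in results if contains_phrase(r["content"], phrase)]

# Hypothetical results as returned by the server-side search.
results = [
    {"id": 1, "content": "Put the money in the bank"},
    {"id": 2, "content": "money hello world bank"},
]
exact = filter_exact_phrase(results, "money in the bank")
print([r["id"] for r in exact])  # [1]
```

    Because this runs over the full note text rather than the index, stop words take part in the comparison like any other word.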
    Edit: Ah, meant to include our current stop word list:
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"