
rezecib

Evernote Staff
  • Posts

    91
  • Days Won

    2

Community Answers

  1. rezecib's post in Searching for Exact Phrase (General) was marked as the answer   
    So exact phrase searches do work in general (this is part of our integration test suite, but if you do find a case not covered by the explanation below, definitely let me know). However, the case described in the original post has a bit of a wrinkle to it: "stop words".
    TL;DR: common words don't get indexed, so you can't search for them.
    Long explanation:
    The way searching works is that when a document is created or updated, it's added to several "inverted indices". Each index maps a term to the documents that contain it and to the term's positions within each document. So there will be an inverted index for "object" and one for "oriented", and a search for "object oriented" will first check these indices, and then remove any documents where "object" and "oriented" aren't in adjacent positions in that order.
    However, some terms are extremely common, like "the" and "in". If you were to make an inverted index for them, it would contain basically every document; with 200 million users with lots of notes each, that's a LOT of documents. The standard approach to handling this is to just not index these terms; this is called "stop word filtering". This means that these terms can't be found by the search, even in an exact phrase match, because the search can't do the first step of finding the documents that contain the stop word. It can, however, do mostly-exact matching on phrases with stop words in the middle, like "money in the bank"; in this case it will still ignore "in the", but it knows that it's looking for an offset of 3 rather than 1 or 2 between "money" and "bank". So it would match "money in the bank", but also "money hello world bank", etc.
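To make the offset behaviour concrete, here is a minimal sketch of a positional inverted index with stop word filtering. It is not Evernote's actual implementation: the tokenizer, the function names (`tokenize`, `build_index`, `phrase_search`) and the sample notes are all assumptions made for illustration. The stop word list is the one quoted at the end of this post.

```python
import re
from collections import defaultdict

# Stop word list as quoted at the end of this post.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
}

def tokenize(text):
    """Lowercase and split on non-alphanumerics (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each non-stop term to {doc_id: [positions]}.

    Positions are assigned before stop words are dropped, so the gap
    left by a skipped stop word is preserved.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            if term not in STOP_WORDS:
                index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return ids of docs whose indexed terms match the phrase's offsets."""
    terms = [(pos, t) for pos, t in enumerate(tokenize(phrase))
             if t not in STOP_WORDS]
    if not terms:
        return set()  # the phrase is all stop words: nothing to look up
    first_pos, first_term = terms[0]
    matches = set()
    for doc_id, positions in index.get(first_term, {}).items():
        for start in positions:
            # Every later term must sit at the same relative offset.
            if all(start + (pos - first_pos) in index.get(term, {}).get(doc_id, [])
                   for pos, term in terms[1:]):
                matches.add(doc_id)
                break
    return matches

# Hypothetical notes, for illustration only.
docs = {
    1: "Put the money in the bank",
    2: "money hello world bank",
    3: "money bank",
    4: "object oriented design",
    5: "oriented object",
}
index = build_index(docs)
print(phrase_search(index, "money in the bank"))
print(phrase_search(index, "object oriented"))
print(phrase_search(index, "in the"))
```

In this sketch, notes 1 and 2 both match "money in the bank" because "money" and "bank" sit three positions apart in each, while note 3 does not. A phrase made up only of stop words returns nothing, since there is no indexed term to start from.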
    It would be possible to filter the list of results in the client after they have been returned, because the client actually has access to the note content. So if you want this feature (I can definitely see the use of it), make a feature request for "Filter search results by exact phrase matches in the text, including stop words".
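A rough sketch of what such a client-side filter could look like, assuming the client holds each result's plain-text content. The note structure (`title`, `content` keys) and the function names are hypothetical, not an existing Evernote API.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumerics (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_exact_phrase(content, phrase):
    """True if the phrase's words, stop words included, appear consecutively."""
    words = tokenize(content)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def filter_exact(notes, phrase):
    """Keep only the returned notes that really contain the full phrase."""
    return [note for note in notes
            if contains_exact_phrase(note["content"], phrase)]

# Hypothetical search results, as the server might return them for
# the query "money in the bank".
results = [
    {"title": "Savings", "content": "Money in the bank."},
    {"title": "Noise", "content": "money hello world bank"},
]
print([note["title"] for note in filter_exact(results, "money in the bank")])
```

Because this runs over the handful of notes already returned rather than over the whole index, checking stop words here does not carry the cost that indexing them would.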
    Edit: Ah, meant to include our current stop word list:
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
  2. rezecib's post in Highlighting of search terms within notes was marked as the answer   
    @Limehouse Looks like the highlighting works as expected in the Mac client (highlighting both occurrences), but not in the web client (and I guess not Windows and Android, which I don't have test setups for myself). So this is a client bug, not something you're doing wrong.
    To give some more insight into the mechanics of this, search retrieval and highlighting are done separately. In terms of search indexing/retrieval, both "to do list" occurrences will be tokenized into "to", "do", and "list"; searching "to do list" will match both because they both have the same tokens in the same order. Highlighting is implemented separately by each client, and is done on opening the note with the search context still active. And it looks like some clients are not using the correctly tokenized terms to do that highlighting.
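As an illustration of the difference, here is a minimal sketch of highlighting that compares tokens rather than raw substrings. The tokenizer, the function names and the sample text (which uses "to-do list" as a hypothetical variant) are assumptions; the actual clients and the note from the original thread may differ.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9]+")

def highlight_spans(text, query):
    """Return (start, end) character spans whose tokens match the query's tokens in order."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in TOKEN.finditer(text)]
    target = [t.lower() for t in TOKEN.findall(query)]
    n = len(target)
    spans = []
    if n == 0:
        return spans
    for i in range(len(tokens) - n + 1):
        if [tok[0] for tok in tokens[i:i + n]] == target:
            spans.append((tokens[i][1], tokens[i + n - 1][2]))
    return spans

def highlight(text, query, left="[", right="]"):
    """Wrap each matching span in markers, skipping overlapping spans."""
    out, last = [], 0
    for start, end in highlight_spans(text, query):
        if start < last:
            continue
        out.append(text[last:start])
        out.append(left + text[start:end] + right)
        last = end
    out.append(text[last:])
    return "".join(out)

# Hypothetical note text with two differently written occurrences.
note = "My to-do list is long; see the To Do List page."
print(highlight(note, "to do list"))
# A plain case-insensitive substring search finds only one of them:
print(note.lower().count("to do list"))
```

Matching on tokens highlights both "to-do list" and "To Do List", which is consistent with what retrieval matched, while the substring count only sees the second occurrence.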
    Edit: a workaround you can do is to add some redundant terms to the search, such as:
    "to do list" to do list
    This will obviously over-highlight, but at least it will catch the cases that are being missed here.