rezecib

Evernote Staff
Posts posted by rezecib

  1. On the server-side search, resource file names are included in the "all" field, so it's as if they are actually in the note text. You can also manually search for just them with resourceFileName:somefilename. The Mac and Windows clients don't use server-side search right now, though.

    Google Drive attachments, though, are really just links rather than true attachments, so they're not a "resource". So I don't think there's a good way to search for them, unfortunately. I'm noting it down as an improvement we could look into, though :)

    • Like 2
  2. @Alexak18 My understanding (I don't work on this part of the system) is that normally the client tries to do synchronization as efficiently as possible-- it keeps track of the version of the data it has from the server, and when it syncs it downloads only the stuff that happened after that version. In the case of big updates or local database corruption, an incremental sync like that doesn't work because the local data is broken. What this does is force it to start from scratch and download everything from the server.
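    As a rough illustration of the difference between an incremental sync and a full resync, here is a minimal Python sketch. The `Server` and `Client` classes, their method names, and the single increasing version counter are all assumptions made for this example; they are not Evernote's actual sync protocol or API.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical change log: (version, note_id, content), versions only increase.
    changes: list = field(default_factory=list)

    def write(self, note_id, content):
        version = len(self.changes) + 1
        self.changes.append((version, note_id, content))
        return version

    def changes_since(self, version):
        # Only the changes the client has not seen yet.
        return [c for c in self.changes if c[0] > version]


@dataclass
class Client:
    last_synced_version: int = 0
    notes: dict = field(default_factory=dict)

    def incremental_sync(self, server):
        # Download only what happened after the version we already have.
        updates = server.changes_since(self.last_synced_version)
        for version, note_id, content in updates:
            self.notes[note_id] = content
            self.last_synced_version = version
        return len(updates)

    def full_sync(self, server):
        # Local data can't be trusted: throw it away and start from version 0.
        self.notes.clear()
        self.last_synced_version = 0
        return self.incremental_sync(server)
```

    The incremental path is cheap but only correct if the local copy matches what the client believes it has; the full path ignores local state entirely, which is why it is the fallback when the local database is broken.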

    • Like 2
  3. 17 hours ago, jefito said:

    Can I take it as a given that each note is indexed by breaking it into tokens and storing them in a sorted list (presumably with location information for each instance); binary search doesn't make sense otherwise.

    I glossed over a little of the complexity there-- technically the terms dictionary is stored as a prefix-tree of blocks, where blocks are a fixed size. So it can navigate the prefix tree a little faster than a binary search, but terms that share that prefix will all be in the same block.

    But there are actually two separate structures at play there: the term dictionary, and an inverted index for each term. But yeah, a note is indexed by setting up fields, the main text ones being tokenized and then inserted into the dictionary and put into the inverted indexes for those terms.
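    A minimal Python sketch of those two structures, under some loud assumptions: a plain sorted list per field stands in for the prefix-tree of blocks, the tokenizer is a simple lowercase alphanumeric split, and copying every term into an "all" field is a simplification. The names `tokenize` and `build_index` are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(notes):
    """notes maps note_id -> {field name: text}. Returns (dictionary, postings)."""
    postings = defaultdict(set)  # inverted indexes: (field, term) -> note ids
    for note_id, fields in notes.items():
        for field_name, text in fields.items():
            for term in tokenize(text):
                postings[(field_name, term)].add(note_id)
                # Assumption: every field's terms are also searchable via "all".
                postings[("all", term)].add(note_id)

    # Term dictionary: one sorted list of distinct terms per field.
    terms_by_field = defaultdict(set)
    for field_name, term in postings:
        terms_by_field[field_name].add(term)
    dictionary = {name: sorted(terms) for name, terms in terms_by_field.items()}
    return dictionary, dict(postings)
```

    The dictionary answers "which terms exist in this field?", and the inverted index answers "which notes contain this term?"; a search needs both.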

    17 hours ago, Don Dz said:

    But couldn't infix search be offered as a separate command, with a warning like "this operation can take a long time, consider trying regular search", or something like that?

    That makes sense for a local search, but becomes a bit more dubious for server-side search (that's what I work on), because expensive queries don't just affect you. So the natural thing that occurs is "well, let's prevent you from doing it too much"-- this touches on rate limiting and fairness, which are surprisingly complicated in a distributed system (they're not unsolvable, but they're tricky enough that you can mess them up pretty easily). Local searches are a bit of a hairy beast as well, but that's because the different clients have separate local search implementations, much to my frustration. So... in both cases yes, it could be done, but it's not as easy as it should be.

    • Like 1
  4. @galaxywarrior search relevance is something we're actively working on. As pointed out by others, the Mac client has relevance, but it should be coming to the beta web client too. It will be a little fancier than just prioritizing titles, but that's definitely a part of it.

    2 hours ago, Don Dz said:

    Evernote does not search for strings of characters in random locations, they must always be at the beginning of a word, otherwise they will not be found (inside of notes at least, tags have no such limitation, weird).

    As far as I know, any case where you can do infix search (like "*term*") is handled client-side. So like the client has already acquired the list of tags you have and may do a local infix search through them to find the tag, and then actually search by the tag's guid.

    I would like to be able to support infix search, but when you're dealing with a large corpus of data (and searching across all note content in an account can definitely count there), infix search has pretty serious scalability issues. Like it's usually fast to run an arbitrary regex on a single document, but run it over 10,000 of them and it starts to be super slow.

    In a scalable search system (i.e. one with an index), the way a search works is:

    • Take the user input and break it up into tokens, joined by some operator (in our case, AND)
    • If a token isn't set to a particular field (like "title", "tag", etc), set it to the "all" field
    • If a token has wildcards, check the term dictionary for that field to find terms that match, and expand the token into a search for each of those joined by "OR"
    • Now find the list of documents matching each of those terms by looking in each term's index
    • Join the list of documents and rank them (such as by term frequency)
    • Return the ranked list of documents
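    The steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the server's implementation: `search` is a made-up name, `dictionary` maps a field to its list of terms, `postings` maps `(field, term)` to a `{note_id: term frequency}` dict, wildcard matching is delegated to `fnmatch` (only `*` is intended here), and ranking is a plain sum of term frequencies.

```python
import fnmatch
from collections import Counter


def search(query, dictionary, postings):
    """Return note ids matching every token in the query, best score first."""
    result = None          # running AND of the per-token matches
    scores = Counter()     # summed term frequency per note
    for token in query.lower().split():
        # A token without an explicit field goes to the "all" field.
        field, sep, term = token.partition(":")
        if not sep:
            field, term = "all", token

        # Expand a wildcard token into the dictionary terms it matches (an OR).
        if "*" in term:
            terms = [t for t in dictionary.get(field, [])
                     if fnmatch.fnmatchcase(t, term)]
        else:
            terms = [term]

        # Look up each term's postings and union them for this token.
        matched = set()
        for t in terms:
            for note_id, freq in postings.get((field, t), {}).items():
                matched.add(note_id)
                scores[note_id] += freq

        result = matched if result is None else result & matched

    # Rank by score, breaking ties by note id so the order is stable.
    return sorted(result or set(), key=lambda n: (-scores[n], n))
```

    Note that the wildcard expansion here scans the whole term list for simplicity; a real dictionary would avoid that for trailing wildcards.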

    For the wildcard-expansion step, a final wildcard is pretty efficient-- you can binary-search to efficiently find the region of terms that will match (or find that there are none), and then just scan from there until you've got the whole clump of them, because they will all be together. A leading wildcard, however, doesn't guarantee that they'll be together, so you have to scan the whole dictionary. (I have some crazy ideas about how to make this more efficient, but suffice it to say that it's annoying/expensive enough that you'd really need solid evidence that people would use it). If you have both wildcards, then... that's an even more complicated problem to solve.
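    To show why the two cases differ, here is a small Python sketch over a sorted list of terms. The function names are hypothetical, and a flat sorted list is an assumption standing in for the real term dictionary.

```python
from bisect import bisect_left


def expand_trailing_wildcard(sorted_terms, prefix):
    """Terms matching prefix*: binary search to the start, then scan the clump."""
    i = bisect_left(sorted_terms, prefix)
    matches = []
    # All terms sharing the prefix are adjacent in sorted order.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


def expand_leading_wildcard(sorted_terms, suffix):
    """Terms matching *suffix: sorting doesn't help, so check every term."""
    return [t for t in sorted_terms if t.endswith(suffix)]
```

    The first function touches only about log(n) terms plus the matches themselves; the second always touches all n terms, no matter how few match.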

    Because final wildcards are so easy, and it's pretty common to want that sort of expansion, we add those automatically. So searching "hello" will be automatically converted to "hello*".

    • Like 1
  5. I'm guessing it's the non-beta web client (which is using the older editor since the new one broke a while back). Edit: looks like it must be the Mac client, actually

    @Allison Jaynes You can get those extra formatting options back in the beta web client, which you can opt into from the settings page. (The beta is missing a few features itself at the moment, but those should get in eventually.) You could also try the checkbox lists (and then check completed items instead of doing strikethrough on bulleted/numbered lists).

  6. 1 hour ago, jefito said:

    That it's being displayed as a date rather than an integer value seems like a (small) bug, but since the number isn't meant to mean much of anything to a user, and is only used to save order state when using a reminders list (as in Snippet, Card, and Thumbnail views) with "Sort reminders by date" unchecked, displaying as a date is harmless, though confusing.

    You're right, this isn't supposed to really be human-readable. It looks like clients can set it, but they're not really supposed to (initially, anyway); when they don't, the server supplies the current timestamp (a 64-bit integer). But clients could overwrite that with any integer they choose (for example, when rearranging reminders in the list, in the Mac client at least). So viewing it as a date makes... a tiny bit of sense, but is mostly misleading.

    • Like 2
  7. 6 hours ago, jefito said:

    That being said, "My Notebook Name" doesn't have three prefixes; in terms of this discussion, it's composed of three words, "my", "notebook", and "name" (because of the space delimiters), each of which can be matched using a prefix search (term* as @rezecib puts it), if you have the option enabled. The point is that a prefix is any substring of another string (like a notebook name, or the individual words that comprise it) that begins from the string's first character. So you'd get a match on "my notebook name" for prefixes like "my" (an improper prefix, boo), "no", or "na" (proper prefixes). The spaces don't matter if the option is disabled; you get an infix search (*term*) instead.

    Yeah, I believe it's searching the tokens extracted from the string, not the original string. Which I understand is pretty confusing considering that notebook names are treated by the main search as a keyword (that is, not tokenized). But this behavior is local to the Windows client as far as I know, so I don't know exactly what it's doing.

    I would consider this use of "prefix" to be search-specific jargon with some loose relation to the normal word's meaning.
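    To make the two behaviors in the quote concrete, here is a small Python sketch. It is only a guess at the behavior described (splitting on spaces, case-insensitive matching), not the Windows client's actual code, and both function names are made up.

```python
def matches_token_prefix(name, query):
    """Prefix search (term*): the query must start one of the name's words."""
    q = query.lower()
    return any(word.startswith(q) for word in name.lower().split())


def matches_infix(name, query):
    """Infix search (*term*): the query may appear anywhere in the name."""
    return query.lower() in name.lower()
```

    With the name "My Notebook Name", the queries "my", "no", and "na" pass the token-prefix check, while "ote" only passes the infix check.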

    • Like 1
    • Thanks 1
    • Haha 2
  8. As far as I know Evernote Business does not include a Productivity Planner (I certainly don't get one on my own EB account), so this must be getting populated by some integration that's gotten linked to your account somehow. I do see an IFTTT applet that is set up to produce a Productivity Planner note daily.

    IFTTT has to be granted access to your account (if someone had your login/session somehow they could've done it for you?), but you can revoke it from the Applications section of the settings in the web interface. Here's what it looks like for me:

    [Screenshot: EvernoteRevokeIFTTT.png]

  9. 15 minutes ago, Mr Jumbo Guy said:

    What's "ocr'd"?

    OCR = Optical Character Recognition. An image of text needs to be OCR'd to be searchable, so Evernote automatically extracts the text from images and indexes it so you can search for it.

    As for when/why notes would be accessed, the only case I know of where an employee would actually look directly at a note is during the tech support process, if there's some problem with a note and you give them permission to look at the problem note(s).

    It was stressed a lot during my orientation that we want to protect your data. Access to it should only be done by machines, like those that are doing OCR, or indexing for search, etc.

    But as s2sailor recommended, the best way to secure them is to attach an encrypted file. That way it is basically impossible for us to be able to access it (note that this also excludes its content from search, because our machines can't make any sense of it either).

    • Like 5