
(Archived) Text recognition question



This isn't possible because we aren't actually doing a simple "Optical Character Recognition" (OCR) on your images to produce a single text document. Instead, we produce a complicated index of possibilities that we use for searching purposes. This means that if you took a blurry picture of a sign that has the word "dove" in it, our index may contain possibilities for the word "dove" and the word "clove", which could look similar in a blurry photo.

This approach makes us much better at helping find your notes, but it means that we're not tuned to producing a single text output for your images.
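As a rough illustration of the idea above (not the actual implementation), an index of possibilities can be sketched as a map from every candidate word to the images it might appear in. The names `build_index` and `search`, the confidence values, and the file name `sign.jpg` are all hypothetical.

```python
from collections import defaultdict


def build_index(images):
    """Map each candidate word to the images it may appear in.

    `images` maps an image id to a list of text regions; each region is a
    list of (candidate_word, confidence) pairs produced by the recognizer.
    """
    index = defaultdict(dict)
    for image_id, regions in images.items():
        for candidates in regions:
            for word, confidence in candidates:
                key = word.lower()
                # Keep the best confidence seen for this word in this image.
                previous = index[key].get(image_id, 0.0)
                index[key][image_id] = max(previous, confidence)
    return index


def search(index, term):
    """Return (image_id, confidence) pairs for a term, best match first."""
    hits = index.get(term.lower(), {})
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# A blurry sign where the recognizer cannot decide between two readings.
# The confidence numbers are made up for the example.
images = {"sign.jpg": [[("dove", 0.6), ("clove", 0.4)]]}
index = build_index(images)

print(search(index, "dove"))
print(search(index, "clove"))
```

Both searches find the same image, which is why this kind of index helps with search but has no single "correct" transcript to export.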

Link to comment

Would it be possible to include additional dictionaries or word databases? For instance, I've noticed that medical terminology in images is not always recognized correctly. Given the method that Evernote uses, which I think is reasonable for the vast majority of users, this problem is to be expected.

I wonder if it would be possible for a user to "opt in" to certain niche databases of words. That way, it would not affect the computing time for users who do not require this feature.

Link to comment

The indexing technology does find words based on a character-by-character match, not just English dictionary words, which allows you to find proper nouns (names, etc.), but dictionary matches are typically scored/weighted higher than this character-by-character recognition. This is primarily intended to reduce the number of "false positive" matches, where you find images that don't really contain your target search words. This is still an area of active research and tuning for us, however.
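A minimal sketch of how that weighting might look, assuming a simple multiplicative boost for dictionary words. The boost value, the word lists, and the function names are hypothetical and only illustrate the principle; the real scoring is not described in this thread.

```python
# Hypothetical base dictionary and boost factor, chosen for illustration only.
BASE_DICTIONARY = {"dove", "clove"}
DICTIONARY_BOOST = 1.5


def score(word, raw_confidence, dictionary=BASE_DICTIONARY):
    """Weight a character-by-character match higher if it is a dictionary word."""
    boost = DICTIONARY_BOOST if word.lower() in dictionary else 1.0
    return raw_confidence * boost


def rank(candidates, dictionary=BASE_DICTIONARY):
    """Sort (word, raw_confidence) candidates by weighted score, best first."""
    scored = [(word, score(word, conf, dictionary)) for word, conf in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The raw recognizer slightly prefers the non-word "dovc", but the
# dictionary boost moves the real word "dove" to the top.
candidates = [("dovc", 0.5), ("dove", 0.4)]
ranked = rank(candidates)
print(ranked[0][0])

# An opt-in niche dictionary would simply extend the word set for one user.
medical = BASE_DICTIONARY | {"tachycardia"}
ranked_medical = rank([("tachycardla", 0.5), ("tachycardia", 0.4)], medical)
print(ranked_medical[0][0])
```

Non-dictionary strings still get a score, so proper nouns remain searchable; they just rank below dictionary matches of similar raw confidence.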

We're planning to add support for additional dictionaries to improve recognition of other languages, but we don't currently plan to introduce user-defined dictionaries. This sounds like a potentially good idea, but realistically it would be a while before we get to it.

Thanks

Link to comment

Archived

This topic is now archived and is closed to further replies.
