
(Archived) View OCRed pic as text



Our search technology isn't the same as simple Optical Character Recognition (OCR), since we're not just generating a single match for every word. Instead, we analyze the image to generate a weighted set of possibilities for each region, so that we can match both "clue" and "due" against the same word when the letter shapes are ambiguous (a cramped "cl" can read as a "d").

As a result, there isn't a simple text representation.
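To make that concrete, here's a toy sketch (not our actual data structures, which aren't public; just the shape of the idea): each region carries a weighted list of candidate words, and a search hit means any candidate clears a confidence threshold.

```python
# Toy illustration only: Evernote's real data structures aren't
# public, this is just the shape of the idea. Each image region gets
# a weighted list of candidate words instead of a single OCR answer.
regions = {
    "region_1": [("clue", 80), ("due", 60)],  # a cramped "cl" can read as "d"
    "region_2": [("magazine", 95)],
}

def matches(query, regions, threshold=50):
    """A search hit: any candidate in any region equals the query
    and clears the confidence threshold."""
    return any(
        word == query and weight >= threshold
        for candidates in regions.values()
        for word, weight in candidates
    )

print(matches("clue", regions))  # True
print(matches("due", regions))   # True: the same region matches both
print(matches("glue", regions))  # False
```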

The key (from our perspective) is that we keep the original image, and we make it easy for you to find that image. Then you can just get what you want by looking at the image. This is a little different from "OCR" approaches that take a perfect quality scan, extract the text, and then throw away the image. We're assuming that not all of your pictures are perfect, so the most useful thing we can do is help you find the image itself.

2 months later...

For the "clue" vs "due" explanation I can see the point: this is no "general purpose" OCR, but a rather fuzzy/smart take on the subject. Good! It fits Evernote's purpose quite right. This is one of those "I wish I had thought about that myself" moments...

STILL, let me insist. Sometimes I take a picture of a magazine or something to quickly grab the text (big-typeface sections, not full articles in small type), and I would be perfectly happy if Evernote gave me a quick "export to text" of the pic, even if the "best bet" still needed some retouching. Even if I got a "the results from this operation might not be what you expect" disclaimer before going on.

I can guess the answer: "users will get frustrated and curse us for the 'lousy' results, because our OCR is meant for something else". But I promise I won't curse you ;)

Let me go a little too far. Now that there's an API, I see a new way to let users extend Evernote themselves:


  • Add a "tools" option to the menu (Win/Mac clients)
  • Let each "tool" be an external program
  • Define some standard parameters to communicate with these external programs, for example:
    • a parameter to send the full XML of the note as a string
    • a parameter to send the path of a temporary XML file holding the note
    • the same pair, but for the text only... and you get the idea
    • variants to send ALL the selected notes as a single file/string
    • something analogous for the output (STDOUT into a new note, a file path into a new note...)
  • Repeat for the web, where instead of an external program you would get a URI for a SOAP/REST service.

That was far-fetched, wasn't it? :P Well, this way I could add a command-line OCR tool that does exactly what I want, and you could finally answer everyone asking about OCR export by pointing to this option: people could do their OCR any way they wanted.
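Just to show what I mean, here's a toy sketch of such an external "tool" under the proposal above. Everything in it is hypothetical: the stdin/stdout contract is the one I'm proposing, not a real Evernote interface, and it leans on the third-party pytesseract and Pillow libraries for the actual OCR.

```python
# Hypothetical external "tool" for the proposed menu hook: Evernote
# would pipe the selected note's XML to stdin and turn whatever we
# print on stdout into a new note. None of this is a real Evernote
# interface; it only illustrates the proposed plumbing.
import base64
import io
import sys
import xml.etree.ElementTree as ET

import pytesseract     # third-party OCR wrapper (assumed installed)
from PIL import Image  # Pillow, for decoding the image bytes

def main():
    note = ET.fromstring(sys.stdin.read())
    # Assume image resources arrive as base64 text inside <data>
    # elements, as they do in the .enex export format.
    for data in note.iter("data"):
        image = Image.open(io.BytesIO(base64.b64decode(data.text)))
        # Under the proposal, STDOUT becomes the body of a new note.
        sys.stdout.write(pytesseract.image_to_string(image))

if __name__ == "__main__":
    main()
```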

Just thinking... and wishing.


You're right that, since our focus is on searching bad images rather than OCR'ing good scans, any simple text rendering of the results is going to be unsatisfying. While I believe that YOU wouldn't complain about those results, I'm positive that quite a few other people would write snarky blog posts about our amusingly bad mistakes. ("Evernote OCR: EPIC FAIL!")

However, if you really are interested in playing with the raw results via XML, the recognition information is included in the new export file format (.enex) along with the note. The documentation on this data format is a bit light, but you can see some of it in the API Overview, Appendix B:

http://www.evernote.com/about/developer/api/

You could write a script on either Mac or Windows that exports notes to this file format and then extracts the recognition data for your images. This would take a little work to process the two-level XML (since the recognition XML is stored as a CDATA string within the export format), but it wouldn't be rocket science.
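A minimal sketch of that two-level parse in Python, assuming (per Appendix B) that each image's recognition data sits in a <recognition> element whose CDATA payload is itself an XML document of <item> regions holding weighted <t> candidates; the "notes.enex" filename is just a placeholder:

```python
# Minimal sketch: pull the recognition candidates out of an .enex
# export. Assumes the Appendix B layout: <recognition> elements whose
# CDATA text is a recognition document of <item> regions, each
# holding <t> candidates with a weight attribute "w".
import xml.etree.ElementTree as ET

def recognition_candidates(enex_path):
    tree = ET.parse(enex_path)
    for reco in tree.iter("recognition"):
        # Level one: the export format. ElementTree hands us the CDATA
        # payload as ordinary text, so we simply parse it again.
        inner = ET.fromstring(reco.text)
        # Level two: the recognition document itself.
        for item in inner.iter("item"):
            yield [(t.text, int(t.get("w", 0))) for t in item.findall("t")]

for candidates in recognition_candidates("notes.enex"):
    print(candidates)  # e.g. [('clue', 80), ('due', 60)]
```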

1 month later...

Thanks Dave! I've been wanting to dive into the API. Maybe these holidays...

I'll peek into the OCR results; thanks for letting me know what to expect in terms of the XML-inside-XML encoding. I'll also try feeding the raw image data somewhere else and see what happens.


Archived

This topic is now archived and is closed to further replies.
