
(Archived) View OCRed pic as text



Our search technology isn't the same as simple Optical Character Recognition (OCR), since we're not just generating a single match for every word. Instead, we analyze the image to generate a weighted set of possibilities for each region, so that we can match both "clue" and "due" against the same word when the letter shapes are ambiguous (a cramped "cl" can read as a "d").

As a result, there isn't a simple text representation.
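To make that concrete, here's a toy sketch (not our actual data structures, which aren't public; just the shape of the idea): each region carries a weighted list of candidate words, and a search hit means any candidate clears a confidence threshold.

```python
# Toy illustration only: Evernote's real data structures aren't
# public, this is just the shape of the idea. Each image region gets
# a weighted list of candidate words instead of a single OCR answer.
regions = {
    "region_1": [("clue", 80), ("due", 60)],  # a cramped "cl" can read as "d"
    "region_2": [("magazine", 95)],
}

def matches(query, regions, threshold=50):
    """A search hit: any candidate in any region equals the query
    and clears the confidence threshold."""
    return any(
        word == query and weight >= threshold
        for candidates in regions.values()
        for word, weight in candidates
    )

print(matches("clue", regions))  # True
print(matches("due", regions))   # True: the same region matches both
print(matches("glue", regions))  # False
```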

The key (from our perspective) is that we keep the original image, and we make it easy for you to find that image. Then you can just get what you want by looking at the image. This is a little different from "OCR" approaches that take a perfect quality scan, extract the text, and then throw away the image. We're assuming that not all of your pictures are perfect, so the most useful thing we can do is help you find the image itself.

2 months later...

For the "clue" vs "due" explanation I can see the point: this is no "general purpose" OCR, but a rather fuzzy/smart take on the subject. Good! It fits Evernote's purpose quite right. This is one of those "I wish I had thought about that myself" moments...

STILL, let me insist. Sometimes I take a picture of a magazine or something to quickly grab the text (big-typeface sections, not full articles in small type), and I would be perfectly happy if Evernote gave me a quick "export to text" of the pic, even if the "best bet" still needed some retouching. Even if I got a "the results from this operation might not be what you expect" disclaimer before going on.

I can guess the answer: "users will get frustrated and curse us for the 'lousy' results, because our OCR is meant for something else". But I promise I won't curse you ;)

Let me go a little too far. Now that there's an API, I see a new way to let users extend Evernote themselves:


  • Add a "tools" option to the menu (Win/Mac clients)
  • Let each "tool" be an external program
  • Define some standard parameters to communicate with these external programs, for example:
    • a parameter to send the full XML of the note as a string
    • a parameter to send the path of a temporary XML file holding the note
    • the same pair, but for the text only... and you get the idea
    • variants to send ALL the selected notes as a single file/string
    • something analogous for the output (STDOUT into a new note, a file path into a new note...)
  • Repeat for the web, where instead of an external program you would get a URI for a SOAP/REST service.

That was far-fetched, wasn't it? :P Well, this way I could add a command-line OCR tool that does exactly what I want, and you could finally answer everyone asking about OCR export by pointing to this option: people could do their OCR any way they wanted.
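Just to show what I mean, here's a toy sketch of such an external "tool" under the proposal above. Everything in it is hypothetical: the stdin/stdout contract is the one I'm proposing, not a real Evernote interface, and it leans on the third-party pytesseract and Pillow libraries for the actual OCR.

```python
# Hypothetical external "tool" for the proposed menu hook: Evernote
# would pipe the selected note's XML to stdin and turn whatever we
# print on stdout into a new note. None of this is a real Evernote
# interface; it only illustrates the proposed plumbing.
import base64
import io
import sys
import xml.etree.ElementTree as ET

import pytesseract     # third-party OCR wrapper (assumed installed)
from PIL import Image  # Pillow, for decoding the image bytes

def main():
    note = ET.fromstring(sys.stdin.read())
    # Assume image resources arrive as base64 text inside <data>
    # elements, as they do in the .enex export format.
    for data in note.iter("data"):
        image = Image.open(io.BytesIO(base64.b64decode(data.text)))
        # Under the proposal, STDOUT becomes the body of a new note.
        sys.stdout.write(pytesseract.image_to_string(image))

if __name__ == "__main__":
    main()
```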

Just thinking... and wishing.


You're right that, since our focus is on searching bad images rather than OCR'ing good scans, any simple text rendering of the results is going to be unsatisfying. While I believe that YOU wouldn't complain about those results, I'm positive that quite a few other people would write snarky blog posts about our amusingly bad mistakes. ("Evernote OCR: EPIC FAIL!")

However, if you really are interested in playing with the raw results via XML, the recognition information is included in the new export file format (.enex) along with the note. The documentation on this data format is a bit light, but you can see some of it in the API Overview, Appendix B:

http://www.evernote.com/about/developer/api/

You could write a script on either Mac or Windows that exports notes to this file format and then extracts the recognition data for your images. This would take a little work to process the two-level XML (since the recognition XML is stored as a CDATA string within the export format), but it wouldn't be rocket science.
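A minimal sketch of that two-level parse in Python, assuming (per Appendix B) that each image's recognition data sits in a <recognition> element whose CDATA payload is itself an XML document of <item> regions holding weighted <t> candidates; the "notes.enex" filename is just a placeholder:

```python
# Minimal sketch: pull the recognition candidates out of an .enex
# export. Assumes the Appendix B layout: <recognition> elements whose
# CDATA text is a recognition document of <item> regions, each
# holding <t> candidates with a weight attribute "w".
import xml.etree.ElementTree as ET

def recognition_candidates(enex_path):
    tree = ET.parse(enex_path)
    for reco in tree.iter("recognition"):
        # Level one: the export format. ElementTree hands us the CDATA
        # payload as ordinary text, so we simply parse it again.
        inner = ET.fromstring(reco.text)
        # Level two: the recognition document itself.
        for item in inner.iter("item"):
            yield [(t.text, int(t.get("w", 0))) for t in item.findall("t")]

for candidates in recognition_candidates("notes.enex"):
    print(candidates)  # e.g. [('clue', 80), ('due', 60)]
```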

1 month later...

Thanks Dave! I've been wanting to dive into the API. Maybe these holidays...

I'll peek into the OCR results; thanks for letting me know what to expect in terms of the XML-inside-XML encoding. I'll also try feeding the raw image data somewhere else and see what happens.


Archived

This topic is now archived and is closed to further replies.
