(Archived) Text recognition in images - known words only?


I was doing some testing to see what to expect from the text recognition within images. I tried uploading an ID card and was able to search for the name on it, but when I searched for the ID number (e.g. "M60302010"), it came up with nothing. Is that because it's not a word, so Evernote does not list it as searchable?

Also, any chance you will support Danish characters like æ / Æ, ø / Ø, å / Å sometime in the future?

Keep up the great work :(

I'm wondering about this as well. I have a note which is an image file (a phone-camera snapshot) of a book cover. The title of the book is "The Art of Talking to Anyone". EN would find the note if I searched for "talk" but not "talking". Strange.

Another funny thing: I took a picture (with the "documents" setting) of a pizza place menu card with my Nokia N82 5 MP camera phone. When I search for "ham" it finds most of the menu items that contain ham, but it also highlights "Mozzarella" for some weird reason :D

I guess it might be because the font is a bit small, so maybe it can't "separate" the words properly.

Our image processing is pretty complicated. It uses two different methods to determine what words are in an image, and then combines the results.

The first method tries to "read" the word one letter/digit at a time, and then translates the sequence of letters into a word. This would find "M60302010", but only if it got every digit correct. If it thought the '1' was an 'I', it might come up with "M603020I0" instead.

The second method tries to match the word against an English dictionary. This gives a lot more weight to any interpretation that is in the dictionary.

As a result, the system is capable of matching random sequences like "M60302010", but it will do a lot better with words that are in our dictionary.
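To make the combination of the two methods more concrete, here is a minimal sketch in Python. It is an illustration only, not Evernote's actual implementation: the per-character guesses, the confidence values, the tiny `DICTIONARY` set, and the `dictionary_boost` factor are all assumptions made up for the example.

```python
from itertools import product

# Hypothetical stand-in for the English dictionary used by the second method.
DICTIONARY = {"ham", "talk", "talking"}


def candidate_strings(glyph_guesses):
    """Method 1 (sketch): read one character at a time.

    glyph_guesses is a list with one entry per character position; each
    entry is a list of (character, confidence) guesses. Yields every
    possible string together with the product of its confidences.
    """
    for combo in product(*glyph_guesses):
        text = "".join(ch for ch, _ in combo)
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        yield text, score


def rank_candidates(glyph_guesses, dictionary=DICTIONARY, dictionary_boost=5.0):
    """Method 2 (sketch): give extra weight to readings found in the dictionary."""
    ranked = []
    for text, score in candidate_strings(glyph_guesses):
        if text.lower() in dictionary:
            score *= dictionary_boost  # assumed weighting, chosen arbitrarily
        ranked.append((text, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# An ID number where the recognizer slightly prefers 'I' over '1'.
id_guesses = (
    [[(ch, 1.0)] for ch in "M603020"]
    + [[("I", 0.55), ("1", 0.45)]]
    + [[("0", 1.0)]]
)

# A word where the recognizer slightly prefers 'o' over 'a'.
word_guesses = [[("h", 1.0)], [("o", 0.6), ("a", 0.4)], [("m", 1.0)]]

best_id = rank_candidates(id_guesses)[0][0]
best_word = rank_candidates(word_guesses)[0][0]
print(best_id, best_word)
```

In this sketch the dictionary rescues the word (the weaker reading "ham" is boosted above "hom"), while the ID number gets no such help, so a single misread character leaves the wrong string on top.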

We plan to expand this to non-English locales and languages in the next year, but this requires a lot of specialized work in the image processing libraries to recognize the shapes of non-Latin letters and to build an appropriate dictionary and testing suite.

Thanks

Thank you very much for the explanation :D It would be cool though if it was possible to see all the words that it has "found" in an image, but I suppose it might get too complex then.

Just for the record, I tried shooting a better pic of the menu card and uploading it. Now it doesn't find "mozzarella" when searching for "ham". So of course, the quality of the image matters as well. Too bad Apple didn't put iSight HDs in their MacBook Pros starting a year ago; that would have made putting images into Evernote much easier than with the current "low-quality" iSight.

Archived

This topic is now archived and is closed to further replies.
