
OCR words in photos taken



Hi all, I'm a premium member now if that makes a difference.

 

Before I start scanning my whole life, I'm doing some testing, and I found this a bit troubling. I scanned the following wine bottle (good wine):

https://www.evernote.com/shard/s3/sh/01e06fd5-84f2-432d-bfb6-27602551054b/f56883009dbe3c21b806e79bbc27317d

 

And when I search in Evernote for "Merlot", nothing comes up in the search.

 

But if I search for anything else on the label like "laundry" or "cabernet" or "Okanagan" this label does come up.  Why won't it find Merlot?

 

 

Link to comment

Puzzling.
I loaded that image into a note on my own premium evernote account. 

 

The desktop client on my Mac says the image has yet to be indexed (it's been about 10 minutes and multiple syncs) and, as a result, it is not searchable.
 

Vexingly, if I log into the web interface and search for, say, "laundry", sure enough, it finds the text in the image! So it IS, in fact, indexed!

 

What version of the Evernote client are you using?

Link to comment

Ok, I am using 5.6.0 Public Beta 1. Yet we are both experiencing issues....

 

When you log into the web interface and search for a word like "laundry" or "dirty" (limiting the search in a sensible way so that you don't get piles of extraneous results), does it detect those words in the image, as it did for me?

 

The version we are running on our desktop should have no bearing on what we see in the web interface. 

Link to comment

Scott, my guess is a photograph may OCR better in this case than a flat scanner, considering the curvature of the bottle.

Hmmmm but OCR seems to be successful given that it turns up in search results just fine in the web client. It just seems like the desktop client isn't getting the message that the attachment is in fact indexed! 

 

Also, even this comparatively worse photograph of a wine bottle got OCR'd just fine!

https://www.evernote.com/shard/s25/sh/eb5fc921-79f2-41d1-bb27-e3cb8000114c/8dff5e8e44b14fffa8bfb0909c15068d

Link to comment

Guys, FYI: I took the photo with my iPhone, and I think I had it on "document" mode. Scott, yes: when I search for "dirty" or "laundry", the matching text on the label is highlighted, yet it won't find "merlot". As you can see, "merlot" wraps around the bottle a bit, but the picture is very clear, so to me it shouldn't be an issue.

 

This said,  how long should I expect Evernote to take to crunch the data when I upload a photo/document?

Link to comment

I think the curvature of the bottle and the spacing of the letters are combining forces to confuse the OCR.

The space between the E and the R is a particular problem.

If you search for 'me riot' (without quotes), you'll see that merlot is highlighted.

Link to comment


1) I, too, am unable to get "2012" or "merlot" to show up in a search. Odd, since the characters that make up those words are fairly clear.

2) Well, it should be a matter of minutes. If we go by the results in the web client I think this was OCR'd within 5 minutes (at least, it was about 5 minutes between me adding it to my own Evernote account and me trying the web client). 

 

The time to OCR depends on overall server load and on whether you are a free or premium user. Premium users are pushed ahead in the OCR queue and so should see OCR results faster than free users. That said, when things are working, it's a matter of minutes (sync up, wait a few minutes, sync the OCR data back down) to get results.

 

The trouble is, it seems like the desktop client isn't recognizing that the image has been OCR'd. This is unrelated to how quickly the image gets OCR'd.

Link to comment


Adjusting, any insight into why the desktop client might persist in saying the image has not been indexed? 

Link to comment

You can find out for sure (at least for image OCRs) by exporting to Evernote format and searching for the desired text among the 'recoText' items. If it's not there, the OCR didn't recognize it, for whatever reason (fuzzy image, spacing, and other factors).
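
For anyone who wants to script that check, here is a rough Python sketch. It rests on assumptions not confirmed in this thread: that the exported .enex file stores each image's OCR data in a `recognition` element whose text is a small embedded XML document, and that each guess appears there as a `t` element with a confidence weight in its `w` attribute. The function names and the sample data are made up for illustration; the real export may differ.

```python
import xml.etree.ElementTree as ET


def find_reco_candidates(recognition_xml, term):
    """Return (text, weight) pairs for every OCR guess containing `term`.

    Assumes (unverified) that each guess is a <t w="..."> element.
    """
    root = ET.fromstring(recognition_xml)
    term = term.lower()
    hits = []
    for t in root.iter("t"):
        text = (t.text or "").strip()
        if term in text.lower():
            hits.append((text, int(t.get("w", "0"))))
    return hits


def search_enex(path, term):
    """Scan every <recognition> block of an exported .enex file (assumed layout)."""
    hits = []
    for rec in ET.parse(path).getroot().iter("recognition"):
        if rec.text and rec.text.strip():
            hits.extend(find_reco_candidates(rec.text.strip(), term))
    return hits


# Hypothetical sample data, loosely modelled on what is reported in this thread.
SAMPLE = """<recoIndex objType="image">
  <item x="10" y="20" w="80" h="30"><t w="85">laundry</t></item>
  <item x="10" y="60" w="40" h="30"><t w="60">me</t></item>
  <item x="55" y="60" w="60" h="30"><t w="45">riot</t><t w="30">rlot</t></item>
</recoIndex>"""

print(find_reco_candidates(SAMPLE, "laundry"))  # [('laundry', 85)]
print(find_reco_candidates(SAMPLE, "merlot"))   # []
```

With the sample above, "laundry" is found but "merlot" is not, because the made-up OCR data split the word into "me" and "riot". That is the kind of result the export would let you confirm on a real note.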

Link to comment

but the strangeness is that it is OCR'd, because searching using the web client returns (limited) results. It just seems like the desktop client isn't recognizing that the image has been OCR'd. 

Link to comment

Oh, I'm sure it's OCR'd, but that's just a fancy word for 'guessing' -- it doesn't mean that it always gets every word just exactly perfect. This is based on what Adjusting is reporting. And to see what the guesses are, you can examine the recoText items in a .ENEX file. I'll clip the image to my account to check it out.

Link to comment

Adjusting, any insight into why the desktop client might persist in saying the image has not been indexed? 

 

No idea. I've reported the issue internally, so someone will look into it. I'll let you know if we find anything.

Link to comment

 


 

Cheers!

Link to comment

As Jefito said, OCR'ing images is "guessing".  OCR'ing text is more accurate, but still not dead on, IME.  When OCR'ing images, a tree of possibilities is created to allow for low res camera phones and poor handwriting.  So the word 'house' may show up when looking for 'horse'.  IMO, if you want to be able to find the notes using certain search terms, it's best to add those terms (as keywords) to the note. 
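
To make the "tree of possibilities" idea concrete, here is a toy Python sketch. It is not Evernote's actual search algorithm; `note_matches`, the candidate lists, and the keyword handling are all hypothetical. It only assumes that each word region in an image carries several OCR guesses and that a search term matches if it equals any guess or any keyword typed into the note.

```python
def note_matches(term, regions, keywords=()):
    """Return True if `term` equals a typed keyword or any OCR guess.

    `regions` is a list of word regions, each a list of alternative guesses.
    """
    term = term.lower()
    if any(term == k.lower() for k in keywords):
        return True
    return any(term == cand.lower() for region in regions for cand in region)


# Hypothetical guesses: the image really says "horse" and "merlot".
regions = [["horse", "house"], ["me"], ["riot", "rlot"]]

print(note_matches("house", regions))                        # True (false positive)
print(note_matches("merlot", regions))                       # False (OCR split the word)
print(note_matches("merlot", regions, keywords=["merlot"]))  # True
```

The last call shows why adding your own keywords helps: the note becomes findable even when none of the OCR guesses match.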

Link to comment

