(Archived) Very bad OCR quality


Hi everybody,

 

Since there is a lot of talk about Evernote's good OCR quality, I bought a Doxie Go and gave it a try.

I scanned a huge stack of old bills with Doxie's standard settings and moved them to Evernote (Premium).

 

But my first impression is depressing: of 179 bills addressed to me, only 13 are found when I search for my name. When I export the searchable PDFs from Evernote, they are a total mess. Nothing even resembles the text that is really on the document.

 

What might I have done wrong? I scanned at 300 dpi, and the scans are rather good, I believe. Please take a look at the attached bill (zipped) and what Evernote made of it.

 

How may I improve this?

 

Regards

Ralf

 

 

Rechnung bis 20130003-searchable.pdf

Rechnung bis 20130003.zip

Link to comment
My OCR test results indicate nearly 100% accuracy. It takes a few seconds longer, but I always let my ScanSnap do the OCR for me.
 
Why?

1.) Exported PDFs:
ScanSnap: The PDF document remains OCR'd if I export it from Evernote.
Evernote: The PDF document loses its OCR if I export it from Evernote. 
 
2.) Consistency:
ScanSnap: The search results are consistent in Evernote, whether I view them from my desktop client or the Evernote web.
Evernote: The search results are not consistent because Evernote uses different OCR software depending on the platform. 
 
3.) 100% OCR:
ScanSnap: Works on notes that are stored in my local non-sync'd Evernote notebooks.
Evernote: Evernote cannot see the notes in my local non-sync'd notebooks, so those PDFs cannot be OCR'd.
 
4.) No complex rules:
ScanSnap: OCRs all my PDFs. There are no rules, and I know it is done.
Evernote: Evernote has 5 technical rules to follow and gives no warning if a document fails to meet all of them.
 
Link to comment

Something is definitely wrong if Evernote can only find 13 hits out of 179.

Evernote's OCR is certainly better than that. Back in 2010, the Evernote CTO, Dave Engberg, said "Our text processing for PDF is basically using best-of-breed OCR software to produce a second PDF document that we internally index."

 

I looked at the scanned document (zipped file) and the scan looks very clear.

The OCR should have no trouble with it.

But the Evernote OCR'd version (pdf file) looks just awful.

Perhaps the German words could be confusing the OCR. Sorry, but I don't know.

 

I don't think that Evernote will respond here, but another user might have some input.

My suggestion would be to submit a support request to Evernote for assistance.

FYI: Monday is a national holiday in the USA.

Good luck.

Link to comment

Did you sync to the Evernote servers, wait a while, and then sync again?

http://discussion.evernote.com/topic/31659-ocr-for-imported-pictures/?p=170832

Link to comment

Since Doc-Ralf included the Evernote-indexed PDF attachment in his original post, it is obvious that he had already synced his scanned documents up to Evernote and synced again to get the indexed OCR results.

Link to comment

I've done informal tests comparing Evernote's OCR to my own with Adobe Acrobat Pro X, because I basically follow the same philosophy as jbenson2 and scan my own stuff. Evernote did pretty much as well, and in some cases better. I don't think Evernote's OCR itself is the problem here. Obviously, though, something is wrong.

http://www.christopher-mayo.com/?p=98

 

I'll put it into my account and see what I can find, but I wonder if it has to do with your recognition language settings. Please check and make sure they are set to Deutsch or Deutsch + English, and not just the English default.

http://www.christopher-mayo.com/?p=85

Link to comment

Thanks for the link, Grumpy.

 

Doc-Ralf and I did some private messaging last night and tested different scan results.

He did "the OCR directly in Doxie software, and that looks much, much better."

Link to comment

BetzHack

 

I've run several scans today with my ScanSnap.

There has been no change in either the speed or quality compared with previous scans of similar documents.

Link to comment

Archived

This topic is now archived and is closed to further replies.
