
(Archived) Will images within PDFs be OCR-ed?



I can see that a text document that has been converted to PDF is not run through text recognition in Evernote. Is it possible that this will change? Many of my scanned documents are already saved as PDF for many good reasons, and downgrading all of them to JPG seems like a difficult and time-consuming step into the past.


I'm curious about this assurance, because I upgraded to Premium last week and precisely zero of my PDFs are searchable. I also submitted a support request about this last week and have yet to hear back (and there's nothing from you in my spam filter). I'd be very grateful for some help. Thanks so much.


mgarry -

We will perform OCR on scanned PDFs that are in your account if you are Premium. If you add PDFs that are already "searchable" (because they already contain text that you can search, select, and copy), we won't run any OCR, and we won't produce this alternate "searchable" version. Do you have PDFs in your account that contain text which you can't search for? Or are you just seeing that the "Save Searchable..." menu item is disabled in the client?

I checked our support tracking database and did not see any inquiries containing 'mgarry'. How did you submit your inquiry? In the future, you can send a support request from the bottom of this page: http://www.evernote.com/about/contact/support/

If you use this, we can work with you more closely to identify what's happening with your PDFs.

Thanks


I have a few questions regarding OCR/ICR functionality.

I am using version 3.5.0.1258 and have a Premium account.

I have been experimenting with OCR/ICR and trying to understand its requirements.

Is there a specific PDF version you must scan to?

Does OCR/ICR work with documents that you scan and import, or only with documents you email to the EN server? If it does work with scan and import, does it require a sync?

The little yellow boxes that appear when you have a "hit" on a scanned document do not seem to appear on PDFs, only on JPGs. Is that normal?

I have tested email with both PDF and JPG. JPG seems to work better with ICR. When I use PDF, I do not get any search results. When I use the "PDF" option to save a searchable PDF and then open that PDF, there is only garbage inside, just a few funny characters. When I scan that same handwritten note as a JPG, it works (I can search on it). I have tried typed documents in PDF, and they seem to work OK. Is there an ICR limitation with PDF?

Lastly, does this functionality really require the Premium account?


If you have a Premium account, we will process any scanned PDF that is added to your account and synchronized to the service. The processing happens on beefy servers with specialized software, not on your PC.

There's no particular version of the PDF spec required, although password-based encryption in your PDF will prevent us from reading the file.
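If you want to check for the encryption caveat before uploading, a crude local test is possible. This is a heuristic sketch, not an Evernote tool: it assumes that an encrypted PDF points to its encryption dictionary through an `/Encrypt` key in the trailer (or cross-reference stream) dictionary, which is normally stored uncompressed, so a raw byte scan usually finds it. It is not a real PDF parser and can misfire on unusual files; a proper PDF library would be more reliable.

```python
import sys


def looks_encrypted(pdf_bytes: bytes) -> bool:
    # Heuristic: encrypted PDFs carry an /Encrypt entry in the trailer or
    # cross-reference stream dictionary. We only scan raw bytes for that key.
    return b"/Encrypt" in pdf_bytes


if __name__ == "__main__":
    # Usage: python check_pdf.py scan1.pdf scan2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            encrypted = looks_encrypted(f.read())
        status = "has /Encrypt key (likely unreadable by the service)" if encrypted else "no /Encrypt key found"
        print(f"{path}: {status}")
```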

The Windows client doesn't yet support search result highlighting with PDFs, but we'd like to add this in the future.

PDF processing is designed like traditional OCR: it works well with scanned printed text, but not with handwriting. We process JPEG images differently, by assuming lower input quality and producing more alternative readings for every word.
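To illustrate why keeping several readings per word helps with handwriting, here is a toy model. It is an assumption for illustration only, not Evernote's actual index format: each recognized word position stores a list of candidate readings, and a search term matches if it equals any candidate. A single-best OCR result misses a word whenever its one guess is wrong, while the multi-candidate index can still find it.

```python
from typing import List

# Hypothetical index: one list of candidate readings per recognized word.
RecoIndex = List[List[str]]


def matches(index: RecoIndex, query: str) -> bool:
    """True if any candidate at any word position equals the query (case-insensitive)."""
    q = query.casefold()
    return any(q == cand.casefold() for alternatives in index for cand in alternatives)


# A handwritten "meeting notes" read with several alternatives per word (JPEG-style):
jpeg_index: RecoIndex = [["meeting", "rneeting", "meting"], ["notes", "nates"]]
# Traditional OCR keeps only its single best guess per word (PDF-style):
pdf_index: RecoIndex = [["rneeting"], ["notes"]]

print(matches(jpeg_index, "Meeting"))  # True
print(matches(pdf_index, "Meeting"))   # False
```

The trade-off is index size and some false positives in exchange for recall on noisy input, which matches the reasoning given above for treating JPEGs differently.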


Can you clarify what you lose in this respect if you do NOT have a Premium account?


If you add a PDF to a Free account, we'll store that PDF, and you can always access it again later.

If the PDF already contains "searchable" text that you can find from within Acrobat or Preview, then that same text will be indexed and searchable within Evernote.

If your PDF is a scanned document, which is not searchable from Acrobat/Preview, then we'll just store that PDF without doing any OCR on it in Evernote until you upgrade to Premium. This means you can still read your scans, but you won't be able to find those notes by searching for words in the scanned documents.
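The rules described in this thread can be summarized as a small decision function. This is a hypothetical sketch of the behavior as stated in the replies above (the names `PdfNote` and `search_source` are invented for illustration), not Evernote's code.

```python
from dataclasses import dataclass


@dataclass
class PdfNote:
    has_text_layer: bool  # text is already searchable/selectable in Acrobat or Preview
    encrypted: bool       # password-based encryption
    synced: bool          # synchronized to the service


def search_source(pdf: PdfNote, premium: bool) -> str:
    """Where searchable text for this PDF would come from, per the thread's rules."""
    if pdf.encrypted:
        return "none"           # encryption prevents the service from reading the file
    if pdf.has_text_layer:
        return "embedded text"  # indexed on Free and Premium alike; no OCR is run
    if premium and pdf.synced:
        return "server OCR"     # scanned PDF, processed on the servers
    return "none"               # stored and readable, but scanned text is not searchable


if __name__ == "__main__":
    scanned = PdfNote(has_text_layer=False, encrypted=False, synced=True)
    print(search_source(scanned, premium=False))  # none
    print(search_source(scanned, premium=True))   # server OCR
```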



This topic is now archived and is closed to further replies.
