
Searching inside scanned PDFs



I've scanned in some documents using Acrobat and inserted them into a note.

I've searched inside these documents with Adobe Reader and it finds the words and phrases, but the same searches in EN find nothing. These notes were uploaded to EN weeks ago. Am I expecting too much of EN?

 

Doc

Edited by doctorkeo
Extra info
  • Level 5

How was the search index for these documents created? When a PDF already has an embedded text layer, EN will not run its own OCR on it again.

Regardless of where the OCR came from, a search index is created in both cases. But if you export a PDF, only the embedded text goes with it. OCR done by EN only works inside the app.

13 hours ago, gazumped said:

In addition to any scanning delays, Evernote has a set of OCR limitations which might be affecting you - see if any of these apply... https://help.evernote.com/hc/en-us/articles/208313388

Thanks, but I really can't understand point 2. The PDF qualifies on all the other points.

Acrobat creates a searchable PDF, which I assume has a text layer, and I would assume that Adobe has the best OCR system. Is point 2 saying that Evernote does not index the text embedded within the PDF? Surely that can't be correct: a great OCR result is simply ignored? Am I just looking at this incorrectly?

Doc

I do have a premium subscription.

  • Scanned PDFs with clear, typed text (handwriting is not searchable inside PDFs)
  • PDFs that do not already contain text that you can select or copy
  • PDFs with at least one page with a small image (1025 pixels of image data)
  • PDFs less than 100 pages long
  • PDFs less than 25 MB in size
  • PDFs that are not password-protected
  • PDFs that are not corrupted or unreadable
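
For anyone checking a file against the list above, here is a minimal sketch of the criteria as code. It is not Evernote's implementation or API: `PdfInfo` and `ocr_blockers` are hypothetical names, the properties have to be filled in by hand or by whatever PDF tool you use, and the thresholds are assumptions read off the list (25 MB taken as 25 * 1024 * 1024 bytes, the image rule taken as "at least 1025 pixels").

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds as read from the list above; the exact interpretation is an assumption.
MAX_PAGES = 100                  # "less than 100 pages long"
MAX_BYTES = 25 * 1024 * 1024     # "less than 25 MB in size"
MIN_IMAGE_PIXELS = 1025          # "1025 pixels of image data"


@dataclass
class PdfInfo:
    """Hypothetical description of a PDF; fill these in from your own tooling."""
    has_clear_typed_text: bool
    has_selectable_text: bool    # True if the file already has a text layer
    largest_image_pixels: int
    page_count: int
    size_bytes: int
    password_protected: bool
    corrupted: bool


def ocr_blockers(pdf: PdfInfo) -> list[str]:
    """Return the listed criteria this PDF fails; an empty list means it qualifies."""
    reasons = []
    if not pdf.has_clear_typed_text:
        reasons.append("no clear, typed text")
    if pdf.has_selectable_text:
        reasons.append("already contains selectable text")
    if pdf.largest_image_pixels < MIN_IMAGE_PIXELS:
        reasons.append("no page image with enough pixel data")
    if pdf.page_count >= MAX_PAGES:
        reasons.append("100 pages or more")
    if pdf.size_bytes >= MAX_BYTES:
        reasons.append("25 MB or larger")
    if pdf.password_protected:
        reasons.append("password-protected")
    if pdf.corrupted:
        reasons.append("corrupted or unreadable")
    return reasons
```

A PDF saved as searchable by Acrobat would fail only the second check under this reading. That only says Evernote skips its own OCR for the file; it says nothing about whether the embedded text gets indexed.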
Edited by doctorkeo
Update
  • Evernote Expert

I think it means exactly what it says. If the PDF has already been processed by Adobe or whatever then Evernote will not/cannot perform further character recognition.

  • Level 5

My scans done with my iX500 are OCRed by ABBYY FineReader, so they have a text layer. After importing, the notes show up in searches, and the PDFs are searchable with the EN tools, including highlighting of the hits.

If a PDF doesn't show up in search, I would try a different client. Maybe the problem is with the local database. You can also contact support.

  • 2 weeks later...

The problem was escalated two days ago (20 March 2023). Using the data I supplied, tech support was able to reproduce the error (multiple words inside PDFs not being found) and told me they are working on it. They suggest using the Legacy version while this is being investigated.

I did think it could have been the way I produced my PDFs, which would be surprising as I use Adobe Acrobat. But since tech support says the Legacy version doesn't have the same problem, I don't think that can be the cause.

Just waiting for a solution.

 

Doc
