Searching inside scanned PDF's

doctorkeo · March 8, 2023

I've scanned in some documents using Acrobat and inserted them into a note.

I've searched inside these documents with Adobe's reader and it finds the search words and phrases, but the same searches in EN find nothing. These notes were uploaded to EN more than weeks before. Am I expecting too much of EN?

Doc

Edited March 8, 2023 by doctorkeo
Extra info

gazumped · March 8, 2023

In addition to any scanning delays, Evernote has a set of OCR limitations which might be affecting you - see if any of these apply... https://help.evernote.com/hc/en-us/articles/208313388

agsteele · March 8, 2023

Just to note that the references to Premium and Business in the page @gazumped mentions now mean Personal, Professional and Teams. Also Local Notebooks are no longer a feature of Evernote v10.

PinkElephant · March 8, 2023

How was the search index for these documents created? When there is an embedded text layer, EN will not OCR the document again.

Regardless of where the OCR came from, a search index is created in both cases. But if you export a PDF, only the embedded text goes with it. OCR done by EN only works inside the app.

doctorkeo · March 9, 2023

13 hours ago, gazumped said:

In addition to any scanning delays, Evernote has a set of OCR limitations which might be affecting you - see if any of these apply... https://help.evernote.com/hc/en-us/articles/208313388

Thanks, but I really can't understand point 2. The PDF qualifies on all the other points.

Acrobat creates a searchable PDF - where I assume there is a text layer, and I would assume that Adobe has the best OCR system. Is that saying that Evernote does not index the text embedded within the PDF? Surely that can't be correct - a great version of OCR is ignored - Am I just looking at this incorrectly?

Doc

I do have a premium subscription.

Scanned PDFs with clear, typed text (handwriting is not searchable inside PDFs)
PDFs that do not already contain text that you can select or copy
PDFs with at least one page with a small image (1025 pixels of image data)
PDFs less than 100 pages long
PDFs less than 25 MB in size
PDFs that are not password-protected
PDFs that are not corrupted or unreadable
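
To make point 2 easier to read next to the other criteria, here is a minimal sketch of the list above as a checklist. It is not Evernote's code: `PdfFacts` and `ocr_blockers` are hypothetical names, the facts have to be filled in by hand, 25 MB is assumed to mean 25 × 1024 × 1024 bytes, and the handwriting and image-size criteria are left out because they can't be reduced to a simple flag.

```python
from dataclasses import dataclass

MAX_PAGES = 100             # "PDFs less than 100 pages long"
MAX_BYTES = 25 * 1024 ** 2  # "PDFs less than 25 MB in size" (assumed binary MB)


@dataclass
class PdfFacts:
    """Hypothetical, hand-filled summary of one PDF."""
    pages: int
    size_bytes: int
    has_text_layer: bool      # text can already be selected or copied
    password_protected: bool
    readable: bool = True     # False if the file is corrupted


def ocr_blockers(pdf: PdfFacts) -> list[str]:
    """Return the listed criteria this PDF fails; an empty list means it qualifies."""
    blockers = []
    if pdf.has_text_layer:
        blockers.append("already contains selectable text")
    if pdf.pages >= MAX_PAGES:
        blockers.append("100 pages or more")
    if pdf.size_bytes >= MAX_BYTES:
        blockers.append("25 MB or larger")
    if pdf.password_protected:
        blockers.append("password-protected")
    if not pdf.readable:
        blockers.append("corrupted or unreadable")
    return blockers


# A searchable PDF produced by Acrobat: small, unprotected, but with a text layer.
acrobat_scan = PdfFacts(pages=12, size_bytes=3_000_000,
                        has_text_layer=True, password_protected=False)
print(ocr_blockers(acrobat_scan))  # ['already contains selectable text']
```

Read this way, a searchable Acrobat PDF fails only point 2, which would mean the list describes which PDFs Evernote runs its own OCR on rather than which PDFs can be searched.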

Edited March 9, 2023 by doctorkeo
Update

agsteele · March 9, 2023

I think it means exactly what it says. If the PDF has already been processed by Adobe or whatever then Evernote will not/cannot perform further character recognition.

PinkElephant · March 9, 2023

My scans done with my ix500 are OCRed by ABBYY FineReader. They have a text layer. After importing, the notes show up in searches, and the PDFs are searchable with the EN tools inside the PDF, including highlighting of the hits.

If a PDF doesn’t show up in search, I would try a different client. Maybe the problem is with the local database. You can also contact support.

doctorkeo · March 23, 2023

The problem got escalated two days ago (20 March 2023). Tech support was able to reproduce the error (multiple words inside PDFs not being found) from the data I supplied, and informed me that they are working on it. They suggest using the Legacy version while this is being investigated.

I did think it could have been the way I produced my PDFs, which would be surprising as I use Adobe's Acrobat, but as tech support says the Legacy version doesn't have the same problem, I think that can't be the case.

Just waiting for a solution

Doc
