
A problem with PDF indexing - how to make Evernote rescan my PDFs?


Kishi

Idea

I started using Evernote around 10 years ago, drawn by the promise of its robust search functionality, particularly the ability to create searchable PDFs out of multilingual scanned files. At the time I was using OCR software, but it dealt poorly with documents containing multiple languages, never mind things like relatively old Japanese printed materials (pre- or post-war, we are not talking ancient history here). Evernote was just perfect in what it could do - to the point that I disabled all the PDF indexing options my scanning software had and deferred it all to Evernote. The results were much, much better, without putting any workload on me whatsoever.

 

Since the pandemic started, I’ve continued to put new research material in my Evernote, but I had far fewer occasions to actually pull that data for the interpretation/translation work I do… so I didn’t notice until very recently that my PDFs were not being processed by the OCR software. I have some free time in August and I have to fix the issue somehow. I know that a lot of my PDFs going as far back as 2019 have not been processed.
 

It is a weird situation: they do pop up in search results, suggesting that Evernote has indexed them, but at the same time the search terms are not highlighted within the PDFs themselves, so the OCR data has not been put back into the PDFs. Some older PDFs seem to be processed as expected (they both appear in the results and show the highlights).
 

I now need to find all my notes containing PDFs (a relatively simple search, although the client seems to limit the number of items it shows in the results) and either force Evernote to re-scan and OCR them properly, or batch-process them with OCR software on my desktop. I know Evernote lets an external app edit attached files, but I’m not sure how to automate the process with the current version.
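
For the desktop batch-processing route, a first step could be sorting out which PDFs actually lack a text layer. Below is a minimal Python sketch, assuming the PDFs have already been exported from Evernote into a local folder. The function names are hypothetical, and the check is only a rough heuristic: it treats a PDF as image-only when its raw bytes contain no `/Font` entry, which can misfire on PDFs that store their objects in compressed streams. A proper PDF library or OCR tool would be more reliable.

```python
from pathlib import Path

# Assumption: a PDF with a text layer references at least one font,
# so "/Font" appears somewhere in the file. Image-only scans usually
# contain only image objects. This is a heuristic, not a guarantee.
FONT_MARKER = b"/Font"


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Return True when the raw PDF bytes contain no /Font entry."""
    return FONT_MARKER not in pdf_bytes


def find_unsearchable(folder: str) -> list[Path]:
    """Return PDFs under `folder` that look image-only, sorted by path."""
    return sorted(
        path
        for path in Path(folder).rglob("*.pdf")
        if looks_image_only(path.read_bytes())
    )
```

Calling `find_unsearchable("exported_notes")` (the folder name is a placeholder) would return the list of files worth feeding to desktop OCR software, leaving the already searchable ones alone.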

 

I’m using macOS 11.5 and the newest version of the Evernote client. I’ve been a Premium user for years, which I believe puts me on the Personal plan right now.

 

My assumption that all the PDFs should be OCRed and made searchable by Evernote is mostly based on this article:

https://help.evernote.com/hc/en-us/articles/208313388-Tips-for-searching-scanned-PDFs

and my own experience from the past, when all the PDFs I’d put in Evernote would come up in searches with the relevant terms highlighted. I’m sure that the great majority of the PDFs I’m having problems with do not exceed the limits described in the above-linked article.
 

Is there anything I can do to fix my library of PDFs? Or should I start looking for a new home for them?


4 replies to this idea

  • Level 5

If you are on v10, highlighting may not work as before. If the note containing the pdf shows in search results, it was OCRed and is searchable.

How to highlight the search hits inside a PDF depends on the EN client you are using. The desktop client does it differently from mobile.

The reason I use external OCR in most cases is that I can extract the text, not only search it. Results may be better today, and many software solutions let you preselect the language to improve results. It even works with the scanner app on my iPhone.

5 minutes ago, PinkElephant said:

If you are on v10, highlighting may not work as before. If the note containing the pdf shows in search results, it was OCRed and is searchable.

This is where I am the most confused. I have plenty of old PDFs I scanned directly to Evernote which display properly, with the search results highlighted. When I open them in another PDF viewer, it too is capable of finding the text within these documents. Therefore they must be fully searchable.

This is not the case with the recently scanned PDFs - they pop up in the search results, but the highlights do not show. I get no results if I try to search within the note itself and if I open such a PDF with external software, it isn't capable of finding search terms either (i.e. they behave as if they are not searchable, but they still show up on the list of notes containing the query in the general search).

Interestingly enough, images containing text do not have that problem - they still show all the highlights properly, just as they always have, and are fully searchable.

 

There's also another possible bug with the Mac desktop client. The iOS client shows the highlights properly (where the PDF or the image is searchable, of course, so the bug with suddenly unsearchable PDFs is still there), but the Mac desktop client fails to properly highlight some PDFs - even if these are the "old" searchable PDFs. The vertical right-to-left Japanese prints are affected by this. I have to report it, but that is an unrelated bug, I think.

  • Level 5

Hmmmm - I am not really familiar with the specifics of the Japanese writing. 

What I did when I had a search problem (not related to what you describe): I prepared some examples of each case and opened a support ticket.

With a problem like this you will need technical support, not only the first-level clerks. If you can’t reach them, send me a DM through the forum with the ticket number, and I will flag it for you.
