
OCR on Uploaded PDF - How To Verify If Indexed/Searchable



I use Evernote for remembering various things, but mostly for scanning my mail with a ScanSnap. I usually have no problem finding past mail, but today I was searching and found that, for a sample of documents, very few turned up when I used some common keywords.

I then spotted the icon with the tooltip "not all resources on this note are indexed for searching". It seems that the majority of my notes from the ScanSnap have this icon; in some of them a few keywords are still found, while in others searching finds nothing.

So my questions are:

1. What does this message actually mean?

2. Is there a way to force reindexing? (I'm sure I saw this in a previous client, but I can't find it now in the Mac version 2.0 (116546).)

3. Is this something to do with using a ScanSnap? (Most PDFs downloaded from the web show the same icon, though.)

4. Should I have OCR switched on or off in the ScanSnap settings, and what are the optimum settings for the ScanSnap? (I am scanning directly to Evernote from an S1300.)

Thanks,

Richard

Link to comment
  • 3 weeks later...

I have had the exact same issue.

I just realized that many PDFs over the last 2 years had not been indexed. I also use scansnap. It's random - some got indexed, many did not.

I filed a support ticket (two actually) and they could not give me an explanation as to why this happened, nor what I can do to correct it. I've been a premium member and ALL of my PDFs should have been indexed. The PDFs are all well under the max size, and I've never gone over my bandwidth. Now I've got 2300 notes, and who knows how many of them haven't been indexed properly.

There's no way to search my notes to see which PDFs haven't been indexed. There's no option to reindex. I'm at a loss.

VERY FRUSTRATED!

Michael

Link to comment
  • Level 5

I also have run into inconsistencies with the Evernote OCR process.

Because it is important that I can find my information at a later date, I set my scanner to perform OCR on all of my documents before sending them to Evernote. It takes 30 seconds longer, but I am confident that the OCR process has been done correctly.

It would be interesting to find out what types of documents or images cause Evernote to state "not all resources on this note are indexed for searching".

The word indexed does not appear in the Knowledgebase search. The word index shows up, but does not address this issue.

Link to comment

While looking at the S300 specifications, I noticed this little tidbit in the fine print at the bottom - it might hold a clue to your OCR issues...

"Operates most effectively with Windows Vista™ compatible versions of Adobe Acrobat. Use with non-Windows Vista™ compatible versions of Adobe Acrobat might result in the generation of PDF files that cannot be searched."

Link to comment
I just realized that many PDFs over the last 2 years had not been indexed. [...] It's random - some got indexed, many did not. [...] I've been a premium member and ALL of my PDFs should have been indexed [...] and who knows how many of them haven't been indexed properly.

There's no way to search my notes to see which PDFs haven't been indexed. There's no option to reindex. I'm at a loss.

Same here.

If you have images AND PDFs in the same note, you don't know which ones are indexed and which are not, unless you spend (waste) your time testing it.

While looking at the S300 specifications, I noticed this little tidbit in the fine print at the bottom - it might hold a clue to your OCR issues...

That I did not understand.

What does it tell me? Or better: what should it tell me?

Link to comment
  • 2 months later...

Same problem here. BIG issue - how would I find old notes? I have not tagged or named all my notes, assuming I can always find them through Search.

It appears to me that this affects only notes that I have sent to my evernote email address. Can anybody confirm this?

Evernote people - any statement from your end?

Cheers

Wolf

Link to comment
  • Level 5*

Some of the PDF software that I use offers the option to save in compatibility with various versions of the PDF standard, starting from 1.4 and running up to (I think) 1.7.

Where there's been a choice I've used the oldest option (1.4) on the basis that this should be compatible with everything, but maybe some scanning software does save in a more recent whizzy 1.x version that isn't reliably OCR'd.

Some of my scanning software gives the option to scan to a searchable (ie pre-OCR'd) PDF file, which I assume means that Evernote has only to index the contents which may be more reliable.

Having said all of which I haven't seen any of my 3000+ notes remaining unsearchable as yet.

If you're in doubt, why not make up 25 password-style "words" that you can list securely and scan the sheet every so often. Keep a note of when and how you scan it. If you search for "myzzlpick" and only get 6 hits when you know you scanned it 8 times, you can go back to the exact dates and times to find out what went wrong. This would be a reassurance to you if it works 100%, and give the support team something to work with if it doesn't.
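The canary-word check above could be tracked with something as small as the sketch below. It is only an illustration: the `ScanRecord` structure, the scan methods, and the dates are made up, and the `found_in_search` flag has to be filled in by hand after searching Evernote for the word, since nothing here talks to Evernote itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScanRecord:
    """One line of the hand-kept scan log (hypothetical structure)."""
    scanned_on: date
    method: str             # how the sheet reached Evernote, e.g. "ScanSnap direct"
    found_in_search: bool   # set by hand after searching for the canary word


def audit(canary: str, log: list[ScanRecord]) -> list[ScanRecord]:
    """Report hits versus scans and return the scans that search missed."""
    missing = [r for r in log if not r.found_in_search]
    print(f"{canary!r}: {len(log) - len(missing)} hits out of {len(log)} scans")
    return missing


# Made-up log: the sheet was scanned 8 times, and 2 of those never showed up.
log = [
    ScanRecord(date(2024, 1, day), "ScanSnap direct", day not in (3, 7))
    for day in range(1, 9)
]

missing = audit("myzzlpick", log)  # prints: 'myzzlpick': 6 hits out of 8 scans
for record in missing:
    print("not found:", record.scanned_on.isoformat(), record.method)
```

The dates and methods of the missing scans are exactly what a support ticket would need.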

Link to comment
  • Level 5*
Doesn't scanning your own PDFs make a mockery of using Evernote?

In my case it is JPEG files that I have sent which have not been scanned.

Sorry - 2nd post here mentions PDFs and I misconcluded from there. Anyway: I don't see that I'm stealing the bread from anyone's mouth by doing a little work up front - but I'm just feeling my way to a perfect system, so only just found out that it is possible to make a searchable PDF as opposed to an image-only one.

As regards JPEGs I believe they should be OCR'd pretty quickly after syncing with the server (seen it here, somewhere) but the quality of the image is a factor - are yours 96dpi or less (display standard), 150-ish (my scanner default) or 300+ (print quality)? If they're not documents, is the text in focus and more or less level?
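If you don't know what resolution an image was saved at, it can be estimated from its pixel width and the physical width of the page. The sketch below is plain arithmetic; the function names are my own, and the bands simply mirror the three rough ranges mentioned above rather than any documented Evernote threshold.

```python
def effective_dpi(pixels_wide: int, page_width_inches: float) -> float:
    """Dots per inch implied by an image's pixel width and the paper width."""
    return pixels_wide / page_width_inches


def resolution_band(dpi: float) -> str:
    """Sort a DPI value into the rough bands from the post (not an Evernote rule)."""
    if dpi <= 96:
        return "display standard"
    if dpi < 300:
        return "typical scanner default"
    return "print quality"


# A US Letter page is 8.5 inches wide; the pixel widths are example values.
for pixels in (816, 1275, 2550):
    dpi = effective_dpi(pixels, 8.5)
    print(pixels, "px ->", round(dpi), "dpi,", resolution_band(dpi))
```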

If you can find one that hasn't been OCR'd, why not upload it again to see what happens?

Link to comment
  • 2 years later...

Hi.

 

I'm an Evernote newbie (and Premium User).

 

I uploaded a few PDFs to Evernote.  2 of them were uploaded using the Evernote desktop software for Windows.  The 3rd one was uploaded using the Evernote website.  I created the PDFs myself using a Canon MF3010 scanner using the "Scan to PDF" option.

 

On the Evernote desktop client for Windows, when I select the note and click the "Info" button, the attachment status on all 3 of the notes says "1 PDF has not been indexed". I take that to mean that OCR has not been performed and the PDF has not been made searchable.

 

The PDFs appear to meet all of these requirements for Evernote to perform OCR...

 

"You must be a Premium subscriber, the raw PDF must be 50 megabytes or less, the scan must contain no more than 100 pages, the raw PDF cannot already contain “searchable” text that you can select and copy, the PDF can’t be encrypted or protected with a passphrase, and the PDF cannot be a handwritten document."

 

The only requirement above that I'm not sure how to verify is that the "raw PDF cannot already contain "searchable" text that you can select and copy." When I open these in a PDF viewer (e.g., Adobe) I'm not able to select and copy any text in these PDFs.
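For anyone wanting to run through the quoted requirements systematically, here is a minimal sketch of the checklist. It assumes you fill in the facts about the PDF by hand (file size from the file manager, page count and text selectability from a PDF viewer); the `PdfFacts` class and `blockers` function are hypothetical names, nothing here calls Evernote, and reading "50 megabytes" as 50 × 1024 × 1024 bytes is my assumption.

```python
from dataclasses import dataclass

MAX_BYTES = 50 * 1024 * 1024  # "50 megabytes or less" (assumed binary megabytes)
MAX_PAGES = 100               # "no more than 100 pages"


@dataclass
class PdfFacts:
    """Facts about one PDF, entered by hand (hypothetical structure)."""
    is_premium: bool
    size_bytes: int
    pages: int
    has_selectable_text: bool
    is_encrypted: bool
    is_handwritten: bool


def blockers(pdf: PdfFacts) -> list[str]:
    """List which of the quoted requirements this PDF fails, if any."""
    checks = [
        (not pdf.is_premium, "account is not Premium"),
        (pdf.size_bytes > MAX_BYTES, "file is larger than 50 MB"),
        (pdf.pages > MAX_PAGES, "more than 100 pages"),
        (pdf.has_selectable_text, "already contains selectable text"),
        (pdf.is_encrypted, "encrypted or passphrase-protected"),
        (pdf.is_handwritten, "handwritten document"),
    ]
    return [reason for failed, reason in checks if failed]


# Example values only: a 3-page, 2 MB image-only scan on a Premium account.
scan = PdfFacts(True, 2_000_000, 3, False, False, False)
print(blockers(scan) or "meets every quoted requirement")
```

An empty list only means the PDF passes the quoted checklist; it says nothing about whether Evernote has actually indexed it.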

 

I have two questions...

 

1) How do I determine if a PDF has been processed through Evernote's OCR and is now fully searchable?

2) Why do the 3 PDFs submitted still say "1 PDF has not been indexed" when they have been uploaded for several days now?

 

 

Thanks,

 

G

Link to comment
  • Level 5*

Hi. If your PDFs have been indexed, that should show up in the attachment status. Note that the 'has not been indexed' message also appears when you have already OCR'd the PDF yourself, which caused me a mild panic when I first saw it. I now OCR all my PDFs before they're uploaded - well, mostly all - because it avoids uncertainties like this one, and the PDF remains searchable if you need to download or copy it, which is not the case when Evernote do it.

 

If your PDF meets all the other requirements, I can only assume that the resolution may not be sufficient to allow the process to initiate - do you scan at 300dpi?

Link to comment

Thanks for the reply!

 

I scan at 300dpi. In the PDF settings of my scanning software, I scanned 2 of the 3 PDFs I uploaded with the "Create Searchable PDF" option selected. I removed that option for the 3rd PDF. Given your explanation above, I would have expected 2 of the 3 PDFs to show as not indexed, but the 3rd should have been indexed.

 

The only other setting I have control over in the scanner software is the PDF Compression level.  It is currently defaulted to "High".  My other option is "standard".  I'm not sure if that makes a difference or not.

 

- G

Link to comment

I created a new PDF w/ the lower compression setting and uploaded it to Evernote last night.  I checked on it this evening and Evernote's desktop app still says that it has not been indexed.  ???

 

I guess I'll just need to open a support ticket w/ Evernote and ask them to help me troubleshoot this...  I'm not sure what else to try...

 

- G

Link to comment
  • Level 5

Please let us know what Evernote Support's response is. I often have wondered why the Evernote OCR status is not made more obvious to the customer.

 

 

I take a different perspective and perform the OCR while I am scanning the document before handing it over to Evernote.

 

Here are some reasons why I prefer to do the OCR with my ScanSnap scanner instead of letting Evernote do it.

http://discussion.evernote.com/topic/20080-evernote-pdf-ocr-vs-snapscan-ocr/#entry101003

Link to comment

Archived

This topic is now archived and is closed to further replies.
