
(Archived) ScanSnap generated PDFs not searchable in Evernote



I just set up a Fujitsu ScanSnap S1300. I set up Evernote as a profile and can upload PDFs without problems. Quality is adequate and should be OCR'able, but they will not index. I have sent up other PDFs that weren't scanned on the ScanSnap and they index just fine. If I use the OCR function within the ScanSnap software and upload that version (same quality settings), it is searchable in Evernote. Shouldn't Evernote be able to OCR a non-searchable PDF? Since running the OCR function in the scanner software triples the scan time, I would like to eliminate that step from the workflow. Any insights would be appreciated.

Thank you.

Link to comment
  • Level 5

I have an older model - ScanSnap S300.

What criteria are you using to determine that Evernote is not OCR'ing the PDF?

If you are a Premium user the OCR process usually takes a couple minutes - tops.

Link to comment

Any PDF I upload that I haven't scanned with the ScanSnap is searchable within 5 minutes or less. I've uploaded 6 ScanSnap-scanned PDFs and none of them are searchable (it's been 2+ hours since I uploaded the last one).

As another test, I uploaded a PDF that wasn't OCR'ed by the ScanSnap software, and none of the words are searchable. I then uploaded a version 'pre-OCRed' by ScanSnap with the same image quality, and it was searchable almost immediately. Links to the two files in Dropbox:

http://dl.dropbox.com/u/123657/Scansnap%20test%20file%20Pre%20OCRed.pdf

http://dl.dropbox.com/u/123657/scansnap%20test%20file%20No%20OCR.pdf

Link to comment
  • Level 5

Evernote servers ran into a problem on Sunday and they are still working on recovery.

Probably not a reliable test of the OCR process if any of the documents were uploaded between Sunday and now.

Link to comment
  • Level 5

It appears the problem with the Evernote servers has been fixed. I loaded the non-OCR file into my Evernote account and waited for the OCR process to finish.

I have always let ScanSnap do the OCR and it looks wonderful.

I cannot say the same for Evernote's OCR.

Here is the original (non-searchable) version (just a partial screen capture)

post-16734-131906069389_thumb.png

Link to comment

The delay is still 6+ hours for my docs to get scanned. FYI, here's a response from Evernote support that lists some very specific rules about PDF scanning:

Sorry for any confusion. We will only try to OCR a PDF document if all of the following are true:

1) The raw PDF is 25MB or less

2) The scan contains no more than 100 pages

3) The raw PDF doesn't already contain "searchable" text that you can select and copy

4) The PDF isn't "encrypted" with a passphrase

5) When we try to analyze the PDF to determine #2 and #3, our software doesn't find a fault with the PDFs, non-standard markings etc.

6) PDF indexing does not work on handwriting, only clear typed words with high quality scans.
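For anyone who wants to sanity-check a scan before uploading, here is a rough Python sketch of rules 1 through 5 above. It is only my reading of the support reply, not Evernote's actual code. The `PdfInfo` fields and the `ocr_blockers` function are hypothetical names, you would have to fill in the fields yourself (for example from your PDF viewer's document properties), and treating "25MB" as 25 * 1024 * 1024 bytes is an assumption. Rule 6 (scan quality and handwriting) is not something a simple check like this can judge, so it is left out.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "25MB" is read as 25 * 1024 * 1024 bytes; the support
# reply does not say which definition of a megabyte is meant.
MAX_BYTES = 25 * 1024 * 1024
MAX_PAGES = 100


@dataclass
class PdfInfo:
    """Hypothetical summary of a PDF, filled in by hand or by another tool."""
    size_bytes: int
    page_count: int
    has_text_layer: bool      # text you can already select and copy
    is_encrypted: bool        # protected with a passphrase
    parses_cleanly: bool = True  # no faults found while analyzing the file


def ocr_blockers(info: PdfInfo) -> List[str]:
    """Return the reasons (per rules 1-5) that server-side OCR would be skipped."""
    reasons = []
    if info.size_bytes > MAX_BYTES:
        reasons.append("file is larger than 25MB")
    if info.page_count > MAX_PAGES:
        reasons.append("more than 100 pages")
    if info.has_text_layer:
        reasons.append("already contains selectable text")
    if info.is_encrypted:
        reasons.append("encrypted with a passphrase")
    if not info.parses_cleanly:
        reasons.append("PDF could not be analyzed cleanly")
    return reasons


# Example: a small, unencrypted, image-only scan passes rules 1-5.
scan = PdfInfo(size_bytes=2_000_000, page_count=3,
               has_text_layer=False, is_encrypted=False)
print(ocr_blockers(scan) or "eligible under rules 1-5")
```

An empty list only means the file passes the listed rules. Note that by rule 3 a file already OCR'd by the ScanSnap software would be skipped by Evernote's OCR, which fits it being searchable right away from its own text layer.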

Link to comment
  • Level 5
And here is what I got from the Evernote OCR version - Yecchhh!

Maybe I'm doing something wrong to make this look so very bad.

o.0 How did you get the searchable OCR text-only view in Evernote? :o

I am not 100% sure this is the way Evernote expects users to get the searchable OCR - what I did was:

  1) Right-clicked on the thumbnail image
  2) Selected Save Searchable PDF

I guess the reason for this 2nd version is to add searchability, but not to be used as a normal searchable PDF.

For the past year, all my ScanSnap PDFs have been OCR'd before going into Evernote. The additional 30 seconds to run the OCR is no big deal to me.

Link to comment

Archived

This topic is now archived and is closed to further replies.
