Jpetrovski 8 Posted January 5, 2011
I just set up a Fujitsu ScanSnap S1300. I set up Evernote as a profile and can upload PDFs without problems. The quality is adequate and should be OCR'able, but the PDFs will not index. I have uploaded other PDFs that weren't scanned on the ScanSnap and they index just fine. If I use the OCR function within the ScanSnap software and upload that version (same quality settings), it is searchable in Evernote. Shouldn't Evernote be able to OCR a non-searchable PDF? Since running the OCR function in the scanner software triples the scan time, I would like to eliminate that from the workflow. Any insights would be appreciated. Thank you.
Level 5 jbenson2 2,149 Posted January 5, 2011
I have an older model - ScanSnap S300. What criteria are you using to determine that Evernote is not OCR'ing the PDF? If you are a Premium user, the OCR process usually takes a couple of minutes, tops.
Jpetrovski 8 Author Posted January 5, 2011
Any PDF I upload that I haven't scanned with the ScanSnap is searchable within 5 minutes or less. I've uploaded 6 ScanSnap-scanned PDFs and none of them are searchable (it's been 2+ hours since I uploaded the last one). As another test, I uploaded a PDF that wasn't OCR'ed by the ScanSnap software, and none of its words are searchable. I then uploaded a version 'pre-OCRed' by ScanSnap with the same image quality, and it was searchable almost immediately. Links to the two files in Dropbox:
http://dl.dropbox.com/u/123657/Scansnap%20test%20file%20Pre%20OCRed.pdf
http://dl.dropbox.com/u/123657/scansnap%20test%20file%20No%20OCR.pdf
Level 5 jbenson2 2,149 Posted January 5, 2011
Evernote servers ran into a problem on Sunday and they are still working on recovery. It is probably not a reliable test of the OCR process if any of the documents were uploaded between Sunday and now.
Jpetrovski 8 Author Posted January 5, 2011
That's a possibility, and that is why I uploaded 'control' PDFs to test. They indexed almost immediately.
Level 5 jbenson2 2,149 Posted January 6, 2011
It appears the problem with the Evernote servers has been fixed. I loaded the non-OCR file into my Evernote account and waited for the OCR process to finish. I have always let ScanSnap do the OCR and it looks wonderful. I cannot say the same for Evernote's OCR. Here is the original (non-searchable) version (just a partial screen capture).
Level 5 jbenson2 2,149 Posted January 6, 2011
And here is what I got from the Evernote OCR version - Yecchhh! Maybe I'm doing something wrong to make this look so very bad.
Jpetrovski 8 Author Posted January 6, 2011
Well, after returning home I see that the PDFs in question have indexed. Mr. Benson, I think you are right about the queues being backed up from this week's problems. Emails from yesterday afternoon are currently missing as well.
Jpetrovski 8 Author Posted January 6, 2011
The delay is still 6+ hours for my docs to get scanned. FYI, here's a response from Evernote support that lists some very specific rules about PDF scanning:

Sorry for any confusion. We will only try to OCR a PDF document if all of the following are true:
1) The raw PDF is 25MB or less
2) The scan contains no more than 100 pages
3) The raw PDF doesn't already contain "searchable" text that you can select and copy
4) The PDF isn't "encrypted" with a passphrase
5) When we try to analyze the PDF to determine #2 and #3, our software doesn't find a fault with the PDFs, non-standard markings etc.
6) PDF indexing does not work on handwriting, only clear typed words with high quality scans.
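The rules in the support reply above can be read as a checklist to run before uploading. Below is a minimal Python sketch of that checklist, not Evernote's actual implementation. The `PdfFacts` record and the `ocr_skip_reasons` function are hypothetical names made up for illustration, and treating "25MB" as 25 * 1024 * 1024 bytes is an assumption, since the reply does not define the unit. Rule 6 (handwriting and scan quality) is left out because it cannot be checked from file metadata.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the support reply. Reading "25MB" as 25 * 1024 * 1024
# bytes is an assumption; the reply does not say which definition is used.
MAX_SIZE_BYTES = 25 * 1024 * 1024
MAX_PAGES = 100


@dataclass
class PdfFacts:
    """Hypothetical summary of one PDF; gathering these facts is out of scope."""
    size_bytes: int
    page_count: int
    has_text_layer: bool   # already contains text you can select and copy
    encrypted: bool        # protected with a passphrase
    parses_cleanly: bool   # no faults or non-standard markings were found


def ocr_skip_reasons(pdf: PdfFacts) -> List[str]:
    """Return why a PDF would be skipped; an empty list means it is eligible."""
    reasons = []
    if pdf.size_bytes > MAX_SIZE_BYTES:
        reasons.append("larger than 25MB")
    if pdf.page_count > MAX_PAGES:
        reasons.append("more than 100 pages")
    if pdf.has_text_layer:
        reasons.append("already contains searchable text")
    if pdf.encrypted:
        reasons.append("encrypted with a passphrase")
    if not pdf.parses_cleanly:
        reasons.append("could not be analyzed cleanly")
    return reasons


# A plain 12-page scan with no text layer passes every rule.
plain_scan = PdfFacts(3_000_000, 12, False, False, True)
print(ocr_skip_reasons(plain_scan))  # prints []

# The same scan after ScanSnap's own OCR already has a text layer (rule 3).
pre_ocred = PdfFacts(3_000_000, 12, True, False, True)
print(ocr_skip_reasons(pre_ocred))  # prints ['already contains searchable text']
```

Returning a list of reasons instead of a single yes/no makes it easier to see which rule a given scan trips. Under rule 3, a file that ScanSnap has already OCR'ed would not be OCR'ed again by Evernote.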
jfwarrior 6 Posted January 6, 2011
Quoting jbenson2: "And here is what I got from the Evernote OCR version - Yecchhh! Maybe I'm doing something wrong to make this look so very bad."
o.0 How did you get the searchable OCR text-only view in Evernote?
Level 5 jbenson2 2,149 Posted January 6, 2011
Quoting jfwarrior: "o.0 How did you get the searchable OCR text-only view in Evernote?"
I am not 100% sure this is the way Evernote expects users to get the searchable OCR. What I did was:
1.) Right-clicked on the thumbnail image
2.) Selected Save Searchable PDF
I guess this second version exists to add searchability, not to be used as a normal searchable PDF. For the past year, all my ScanSnap PDFs have been OCR'd before going into Evernote. The additional 30 seconds to run the OCR is no big deal to me.
Archived
This topic is now archived and is closed to further replies.