
(Archived) OCR and pdf files

https://support.ever...?questionID=591

Can you please add the following information:

1. If a PDF that has already been OCR'd is uploaded to EN, does EN still run OCR on it, or does it just use the embedded text added by whatever application OCR'd it?

2. When a PDF with no OCR is uploaded to Evernote, how does its searchable text then become available in the desktop application? For example, does EN for Windows/Mac need to sync again to retrieve it? How long does that take? And how can you tell whether a PDF in EN has been OCR'd?

3. Once a PDF has been OCR'd, if you then copy or export it from EN, does the OCR information stay with the PDF file?


Hi - I sort of replied to this in your other post, but to address the specifics:

  1. Pre-OCR'd files don't get processed by Evernote.
  2. When OCR is carried out, the text is saved as a separate file - right-click the image in the note and you should have the option to download the OCR version. There's no option (that I'm aware of) to export a note or series of notes with the text. How quickly OCR happens depends on your membership type and the load on EN's servers - think around 24 hours.
  3. See above. You can export the image file and the text file separately and manually. Not sure if there's an effective difference.

Can't say when or if Evernote will get around to updating the KB articles, but all of the above information has been discussed (some at length!) in the Forums.


Thanks for your answers to those questions, gazumped. Appreciated.

In your experience, how have you found the OCR results Evernote produces compared with the OCR built into scanning software? (I used ScanSnap, which I believe uses the ABBYY FineReader engine.)


I don't have any inside information about ABBYY. A new folder turned up mysteriously on my hard drive after I installed ScanSnap, so I assumed the connection, but I also installed Adobe 9 as part of the bundle, which confused me for a while. There's another thread around here where I asked why my file sizes went down after being OCR'd by Adobe (despite the presumed additional OCR content), and it seemed the likely answer was that Adobe was simply better at optimising PDF files than the native ScanSnap software.

I've not run any comparative tests between the two packages for text recognition; if I find any failings there, I'll tweak Adobe or get some software that I can teach specialised IT, legal, or insurance terms if necessary. For the moment, given the fairly high volumes I've been converting, using Adobe-OCR'd files rather than anything else does my monthly upload limit good: I get a 10-25% reduction in synced file size.


I emailed a non-OCR'd PDF file to my Evernote account, and when I brought it up in EN, I saw that it had run OCR on it.

I copied a non-OCR'd PDF file to my EN watched folder, and it did not run OCR on it.

Is there a lag time, or is OCR not supported for watched folders?

I'd appreciate your input. Thanks.


Well, that blew the cobwebs off an old thread. It's always good to tell us which client you're using, though since (AFAIK) Macs don't have "watched folders", I'm guessing this is Windows Desktop.

So: if you add a PDF file to Evernote (within some size limitations), Evernote will do its best to OCR the content. If you're a Premium user, you get priority; the rest of the world gets processed in a queue. If you add a file in the desktop app, the file will take its place in the queue as soon as your computer syncs to the net. If you place a file in an import folder, you have to wait a little longer for the file to be added, then synced, then OCR'd, but it should still get done. If you put a stopwatch on it, you're probably going to be disappointed. If you need the OCR that badly, my recommendation is to do it yourself; that's my routine practice anyway.



So are you saying non-Premium members now get PDFs and images OCR'd, just later than Premium users? If so, that would be very nice, but my understanding was that Premium was necessary for this benefit.


Try attaching an image or a PDF file and see what happens?


I did that last night with a scanned PDF, and it is not searchable. I was looking into complicated solutions, but if free (non-Premium) Evernote could search it, that would be ideal.


Oops, sorry; memory fade. My subconscious must have known I was wrong, hence the weasel "try it and see". Anyway, you get a huge amount more than you pay for with free Evernote, and the cost of an upgrade to Premium is pretty minimal...

This topic is now closed to further replies.
