
(Archived) Scansnap OCR to pdf or let Evernote OCR it?


merk850

Idea

I have the new ScanSnap S1300 and I am using it with the ScanSnap Manager software on the Mac. My question is about the OCR software supplied with the Fujitsu. Unlike the earlier S300M model, this one comes with ABBYY FineReader OCR. I am scanning with ScanSnap Manager and the scans go directly to Evernote. (cool!)

1) Should I run this OCR every time with each scan? It takes longer though because it is processing the OCR and then sending the OCR file to Evernote.

2) OR, should I just scan as PDF and send to Evernote and let Evernote process the OCR? (This is quicker)

3) What is the difference technically, do you think? Is Evernote running superior OCR? Will Evernote return that OCR'd PDF to my local client?

I do not mind paying for the premium membership if that is somehow a better workflow.

I am just beginning my home scanning and I want to start with the right workflow.

Thanks for any help!

Wayne

Link to comment

13 replies to this idea


If you don't mind waiting for your local computer to do the OCR, then you may find it more useful to let your scanner software do the OCR. Then you have local files that you can search from your hard drive if you want to use them elsewhere, and you can select/copy/paste text directly from those PDFs.

Otherwise, you can upload them to Evernote, and if you have a Premium account, we'll process them within a few minutes. Then you can sync the results down and search them within Evernote.

Link to comment

Dave,

Thanks for the reply. A couple of follow ups:

Your recommendation to OCR locally says it will give me a copy on my local drive to search and edit at will. However, my workflow is that I am using the ScanSnap to scan "directly to Evernote". I guess that bypasses that step. What would be a recommended workflow? Do I first scan into a local folder structure and then upload to Evernote? That seems like double the work.

OR, can I scan directly into Evernote and "retrieve" my PDFs easily from the client back to my desktop? Please speak specifically about how easy it is to get my original PDF data back to my drive. Can I drag and drop 50 or so documents at once? If not, is this a feature that is important to others, and will it be implemented sometime?

It would seem a shame to have to keep a manual database of all the different folders on my HD, drag them individually to EN, and have EN recreate that database online, AND AGAIN on my HD just within the EN application.

My intention is to *ease* my document usage not make it a side job.

Please advise and educate me on these issues above.

Also, could you address my original question #3: if a document is OCR'd in EN and then downloaded locally, does it keep its OCR text just as if I had scanned it with OCR locally? And is EN's OCR as good as or better than Fujitsu's bundled ABBYY FineReader?

Thanks for the info, as I want to go about this workflow the best way possible from the beginning.

Link to comment

Use local OCR. If you ever want to search the document from within Acrobat Reader or pass the file to someone else who doesn't have Evernote, they can still search it. The utility of that file is much greater if you have OCR within the local file.

The unique part of Evernote OCR is that it will track different possibilities for the same words. That's a terrific feature, but having the OCR in the file is still much better if you can take the time up front.

Link to comment

engberg - how fast does your local OCR work per page? what do you use?

i found this topic was double-posted somewhere else and i am reposting my response here for anyone who does not see the other thread:

i too have a scansnap, although it's the s510 white (for mac), and i don't really use the bundled OCR since i think it was on Adobe Acrobat 7 and i find it to be very slow.

however, on your question about the differences between evernote and local OCR, it seems that evernote uses a fuzzy OCR algorithm that is technically different from conventional OCR in that it can only be used for searching your data and not for extracting actual text. to use an example, the word 'cat' in handwriting can be interpreted as 'car', 'cap' and more, depending on how unclear the text is when Evernote goes through it, whereas local/conventional OCR such as adobe would treat it as one specific word even though it may be wrong.

to me this makes evernote superior but only because my primary use of OCR is for later search and retrieval and not for text extraction that i rarely need.

how long does the ABBYY OCR take to process a page or document? i'm guessing it's faster than the adobe acrobat OCR process since it's quite popular.
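
As a rough illustration of the "fuzzy" recognition described in this post, here is a minimal Python sketch. It is not Evernote's actual algorithm or data format; the candidate words, the weights, and the function names are all made up for the example. It only shows why keeping several candidate readings per word helps search, while a conventional OCR that commits to one reading can miss.

```python
from collections import defaultdict

# Hypothetical recognition output: each word position in an image carries
# several candidate readings with confidence weights (made-up values).
fuzzy_note = [
    [("cat", 0.6), ("car", 0.3), ("cap", 0.1)],
    [("food", 0.9), ("fool", 0.1)],
]

# Conventional OCR keeps only one reading per position, right or wrong.
conventional_note = ["car", "food"]


def build_fuzzy_index(positions):
    """Map every candidate word to the best weight seen for it."""
    index = defaultdict(float)
    for candidates in positions:
        for word, weight in candidates:
            index[word] = max(index[word], weight)
    return dict(index)


def fuzzy_search(index, term):
    """Return the match weight for term, or 0.0 if no candidate matches."""
    return index.get(term.lower(), 0.0)


def conventional_search(words, term):
    """Exact match against the single committed reading per position."""
    return term.lower() in words


index = build_fuzzy_index(fuzzy_note)
print(fuzzy_search(index, "cat"))                     # 0.6
print(conventional_search(conventional_note, "cat"))  # False
```

The trade-off matches what is said above: the candidate list is good for finding a note, but it is not a single clean text you could copy out of the document.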

Link to comment

I have an older HP all-in-one that bundled some software for OCR, which is relatively slow. The ScanSnap models tend to be a bit quicker, however.

The "fuzzy matching" is mostly used for JPEG images, rather than PDFs. We assume a PDF scan is more likely to be a clean document from a scanner, and JPEGs are more likely to come from cameras.

If you're working with handwriting, this is why JPEGs frequently produce better results in Evernote than PDFs.

Link to comment

Maybe I'm missing something because in terms of time and effort, having the EN servers do the OCR is MUCH better...

I'm a premium member and for me there just isn't any comparison. I've tried scanning my PDFs and having ABBYY do the OCR and it takes a while. I found it difficult to batch and the computer wasn't as snappy while processing an OCR. Not to mention, the export to EN just wasn't as clean.

Now I just open my scanner, punch the blue button and rip through 100 pages in just a few minutes (several different documents, natch) and watch EN upload to the servers. A few minutes later, I get a Growl notification that my PDFs have been updated and I can now enjoy my searchable PDFs without having to go through the pain and time of doing it myself.

Subscribe to the Premium for a month and give it a shot.

Kevin

Link to comment

Dave: well, now that i know that, it's going to add another decision to my note taking process.

as an example: i make extensive use of jotnot for snapping documents, text on walls, screens etc and it does a good job of turning it into black and white with clear text although there isn't any OCR at this level. the next step involves:

(1) either emailing the note to evernote and this jotnot feature gives me the format option (PDF, JPEG, others, etc) and i always go for PDF since i find it to be the most useful format for me.

or

(2) using the jotnot evernote integration. this has become my favorite since the bugs were ironed out a few months ago and i use it almost all the time vs. the first option. i am assuming the images get sent to EN as jpegs since it does not provide me with an option before creating the note.

also what you seem to suggest is that a handwritten document that i pass through the scansnap and gets sent as a PDF to evernote may not be as accurately searchable as if i sent it as a jpeg.

quite interesting and useful to know i guess.

thanks

Link to comment

Evernote's image/text processing capabilities are intended to allow you to find the original document (image or PDF) within Evernote itself. We're not really providing a generalized OCR service for converting scans to text documents, although if that works for you in some cases, that's great.

So if you are putting anything with handwriting into Evernote, and you want to search for that handwriting later, you'll get best results using the JPEG file format.

If you are putting a multi-page document into Evernote, with good quality scanned printed text, then a PDF may be more convenient. (Plus, it may allow you to do some text extraction later, although that's more of a side effect...)
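
The rule of thumb in this post can be written down as a tiny Python helper. This is only a paraphrase of the advice above, not an Evernote API; the function name and parameters are hypothetical, and the single-page printed case is an assumption because the post does not cover it.

```python
def suggest_format(has_handwriting: bool, page_count: int) -> str:
    """Suggest a file format for a scan, following the advice in this thread."""
    if has_handwriting:
        # Handwriting you want to find later: JPEG gets the fuzzy matching.
        return "JPEG"
    if page_count > 1:
        # Multi-page, good quality printed text: PDF is more convenient.
        return "PDF"
    # Single printed page: not covered in the post, so either is assumed fine.
    return "PDF or JPEG"


print(suggest_format(has_handwriting=True, page_count=3))   # JPEG
print(suggest_format(has_handwriting=False, page_count=12)) # PDF
```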

Link to comment

OK, I am understanding more but am still a little confused. My primary purpose as well is document storage and retrieval. I have been trying to tag EVERYTHING and put things in notebooks as well. I do not mind doing local OCR with my ScanSnap but, as Kevin mentioned, is it really necessary? Could I just scan and upload and let EN perform OCR for my retrieval, and then in the future, if I moved a PDF elsewhere and needed local OCR, download the PDF and OCR it myself? Would I be losing quality of some sort?

I guess I was looking for a PRO/CON list of local OCR vs. EN OCR.

Thanks for the input.

PS If the word "pizza" is in a document and I search in EN will that find that document differently in EN ocr and LOCAL ocr? Does EN search Titles, Tags, and ALL text? It would seem to me that after a few hundred uploads searches could be overcrowded with common words. Still trying to understand what exactly it does. maybe I am over thinking this!

Wayne

Link to comment


I think the EN OCR will work very well for your purposes as you are describing EXACTLY how I use the service. You are correct, I can and do get some pretty significant lists of files that match common words. Using tags helps (I usually tag each of my notes with 3-5 tags) as does typing in multiple search terms. Additionally, I use about five different notebooks and generally know what notebook I'll store a file. Using selected notebooks and tags, I can usually narrow down my search results to less than 5 notes.

Don't bother doing the OCR on your computer unless you just like killing time. Give the EN OCR a shot.

Kevin
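
To make the narrowing approach in this post concrete, here is a small Python sketch. It assumes a simplified note model and plain substring matching; the `Note` class, the `search` function, and the sample notes are hypothetical and do not reflect how Evernote's search is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    notebook: str
    tags: set = field(default_factory=set)
    text: str = ""  # stands in for recognized (OCR) text


def search(notes, terms, notebook=None, tags=()):
    """Return notes containing ALL terms, optionally limited by notebook and tags."""
    wanted = [t.lower() for t in terms]
    results = []
    for note in notes:
        if notebook is not None and note.notebook != notebook:
            continue
        if not set(tags) <= note.tags:
            continue
        haystack = f"{note.title} {note.text}".lower()
        if all(t in haystack for t in wanted):
            results.append(note)
    return results


notes = [
    Note("Pizza receipt", "Receipts", {"food", "2010"}, "Large pizza, paid by card"),
    Note("Party plan", "Personal", {"family"}, "Order pizza and cake"),
    Note("Card statement", "Receipts", {"bank"}, "Statement for card ending 1234"),
]

# A common word alone matches several notes...
print([n.title for n in search(notes, ["pizza"])])
# ...while a notebook, a tag, or a second term narrows the list.
print([n.title for n in search(notes, ["pizza"], notebook="Receipts", tags={"food"})])
print([n.title for n in search(notes, ["pizza", "card"])])
```

With these sample notes, the first search returns two notes and the narrowed searches return only the receipt.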

Link to comment

I am confused because sometimes I am getting searchable docs and sometimes not...I don't think I am doing anything different...Using ScanSnap 1500 for Mac...I've let the docs sync to web, too..I scan into an Evernote notebook. Would love for someone to figure out what I'm doing wrong.

Thanks.

barb

Link to comment

As far as I know, whenever you upload a PDF to the Evernote servers, they will create an OCR'd PDF for you as long as you are a premium member. When exporting from EN to your Mac, you have the option of exporting a PDF or an OCR'd PDF.

That may be your problem.

Kevin

Link to comment

Archived

This topic is now archived and is closed to further replies.
