Help with OCR solution

haakonf · January 20, 2012

I got my Fujitsu ScanSnap S1300 today. My plan was to use it for scanning my mail and documents to Evernote. I was very excited about going paperless until I realised that the ScanSnap software does not support OCR in my language. Norwegian, that is. Right now I am wondering whether I should just return the scanner, or whether there might be some way around the problem.

My first idea was to let Evernote do the OCR. However, Evernote does not support Norwegian text recognition at this point. So...

Say I upload my PDFs now and let Evernote process them using a dictionary for a language similar to Norwegian (such as Danish). And say Evernote gets Norwegian support one day. Will I then be able to reprocess all my PDFs on the Evernote servers? (In one operation?)

According to Evernote's public language availability chart, Norwegian OCR support is "waiting for web client". What does that mean? And what can desperate Norwegian Evernote users do to help speed up the process?

I notice that I can right-click a PDF in Evernote, open it in Acrobat, and OCR it from there. After saving, the (Norwegian) PDF in Evernote is perfectly searchable. It works great. Is there any way I could batch process all the PDFs in my Evernote account using Acrobat, ABBYY or some other OCR software? (That way I could run the process every once in a while.)

Do you know of any other Mac OCR software that can receive a file, process it and automatically transfer it to Evernote? (To achieve one-click scanning.)
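The batch idea in the third question can be sketched roughly like this. It is a minimal sketch under assumptions: the PDFs have first been exported from Evernote into a local folder (the export itself is not covered), and `ocr_pdf` is a hypothetical hook you supply that writes a searchable copy of a file, for example a wrapper around Acrobat, ABBYY or another OCR tool. Nothing here talks to Evernote or to any real OCR API.

```python
from pathlib import Path
from typing import Callable, List


def batch_ocr(folder: Path, ocr_pdf: Callable[[Path, Path], None],
              suffix: str = "_ocr") -> List[Path]:
    """Run ocr_pdf on every PDF in folder that has no OCR'ed copy yet.

    ocr_pdf(src, dst) is a hypothetical, caller-supplied hook that is
    expected to write a searchable copy of src to dst.
    Returns the output files written during this run.
    """
    written: List[Path] = []
    # Snapshot the directory listing first, so files written below
    # are not picked up again in the same run.
    for src in sorted(folder.iterdir()):
        if not src.is_file() or src.suffix.lower() != ".pdf":
            continue  # only look at PDF files
        if src.stem.endswith(suffix):
            continue  # this file is itself an OCR'ed output
        dst = src.with_name(f"{src.stem}{suffix}.pdf")
        if dst.exists():
            continue  # already processed in an earlier run
        ocr_pdf(src, dst)
        written.append(dst)
    return written
```

For a dry run, passing `shutil.copyfile` as `ocr_pdf` simply copies each file, which is enough to see which files would be picked up. Because outputs get a suffix and are skipped on later runs, the function can be rerun every once in a while without redoing earlier work.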

Any help and suggestions would be much appreciated!

JMichaelTX · January 20, 2012

Have you checked PDF apps like Adobe Acrobat to determine if they support your language?

I have a ScanSnap but I use Acrobat to do the OCR before uploading to EN.

EDIT: Sorry, I didn't read far enough into your post.

I'm pretty sure Adobe has something for batch processing PDFs (in Acrobat X Pro it's called the Action Wizard; "Distiller" is mainly for converting PostScript files to PDF).

Do a Google search and/or check the Adobe website.

haakonf · January 20, 2012

Thanks for your reply, JMichael. Sure, Acrobat handles Norwegian very well. Have you managed to set it up so that you can scan to Evernote via Acrobat with just one click on the scanner?

Perhaps using an AppleScript to create a watched folder would do the trick. I'm not certain about the reliability, though.
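The watched-folder idea can also be sketched outside AppleScript, as a simple polling loop in Python. This is a minimal sketch under assumptions: the scanner software saves PDFs into an inbox folder you choose, `handle_pdf` is a hypothetical hook you supply (for example, run OCR and then pass the file on to Evernote), and a file counts as finished once its size is unchanged between two polls. That last rule is only a heuristic, which is exactly where the reliability question comes in.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Set


class FolderWatcher:
    """Polls a folder and calls handle_pdf once per new, fully written PDF."""

    def __init__(self, inbox: Path, handle_pdf: Callable[[Path], None]):
        self.inbox = inbox
        self.handle_pdf = handle_pdf            # hypothetical OCR/upload hook
        self.seen: Set[Path] = set()            # PDFs already handed off
        self.last_size: Dict[Path, int] = {}    # size seen on the previous poll

    def poll_once(self) -> int:
        """Check the folder once; return how many PDFs were handed off."""
        handled = 0
        for pdf in sorted(self.inbox.iterdir()):
            if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
                continue
            if pdf in self.seen:
                continue
            try:
                size = pdf.stat().st_size
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            # Heuristic: treat the file as complete only when its size is
            # unchanged since the last poll (the scanner may still be writing).
            if self.last_size.get(pdf) == size:
                self.handle_pdf(pdf)
                self.seen.add(pdf)
                self.last_size.pop(pdf, None)
                handled += 1
            else:
                self.last_size[pdf] = size
        return handled

    def run(self, interval: float = 5.0) -> None:
        """Poll forever, sleeping `interval` seconds between checks."""
        while True:
            self.poll_once()
            time.sleep(interval)
```

Calling `run()` blocks forever; `poll_once()` is the part to test, or to call from some other scheduler. Polling is cruder than a folder action, but its behaviour is easy to reason about: a new PDF is handed off only after it has been seen at the same size twice.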

JMichaelTX · January 20, 2012

No, I don't have a "1-click" solution because I don't want one. :-)

After the doc is scanned, I always rename the PDF using a standard naming convention, which can't be automated with any accuracy.

anjoschu · January 20, 2012

My experience with the ScanSnap's OCR is that it does a pretty decent job even with the "wrong" language selected. I scanned a bunch of German documents recently after scanning some Russian documents, and had forgotten to change the language back to German. It still recognized the text fine.

My guess is that the language-specific information is only used to disambiguate in cases when the OCR process finds it hard to recognize a word. For me, it's certainly good enough to make the texts searchable.

I suggest you simply try scanning a couple of Norwegian documents (e.g. with OCR for a not-too-similar language selected), then open the OCR'ed PDFs in Preview, copy-paste the text into TextEdit and look at how much the OCR got wrong. I think you'll find that the results are still quite good.
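To put a rough number on "how much the OCR got wrong", you could compare the pasted text word by word against a hand-typed transcript of the same page. A minimal sketch using Python's standard `difflib`; `word_accuracy` is an illustrative name, and the score is a simple matched-words fraction, not a formal word error rate.

```python
import difflib


def word_accuracy(reference: str, ocr_text: str) -> float:
    """Fraction of reference words that the OCR output reproduces in order.

    Both texts are lowercased and split on whitespace; difflib's
    SequenceMatcher finds the runs of words the two texts share.
    """
    ref_words = reference.lower().split()
    ocr_words = ocr_text.lower().split()
    if not ref_words:
        return 1.0 if not ocr_words else 0.0
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)
```

For example, `word_accuracy("Vi sees på fredag", "Vi sees pa fredag")` returns 0.75, because the OCR'ed "pa" does not match "på". A word-level score is strict: one wrong letter costs the whole word, which matters less for search than for reading.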

Andrewbermudez · November 20, 2016

@haakonf Great question. I've done a lot of work on this, as OCR is something I've become an expert in during my productivity quests.

Are you a Mac OS user?

If you are, I've created a simple Automator script that I can share with you. It batch-processes all of your PDFs with Norwegian OCR, and then you can re-import them into Evernote by dragging and dropping.

Ping me if I can help.
