idoc

paperless ScanSnap's OCR vs. Acrobat's OCR


I know this topic has been hammered to death in various places on the forum, but it may still have some relevance. I've recently been scanning fairly large amounts of data into EN and have been playing a bit with the settings. I carried out the following experiment: I scanned 3 different 20-page documents using the same settings in every case (automatic resolution, B&W, duplex). The only thing I varied was whether ScanSnap did the OCR or Acrobat did. In every case I optimized the final document with Acrobat to reduce file size. Not surprisingly, there was no significant difference in file size across the scans, i.e. OCR'ing with ScanSnap or with Acrobat produces files of about the same size. However, there was a noticeable difference in the quality of the scans. For some reason, the ScanSnap OCR'd material looked much better than the Acrobat OCR'd material. Once again, I should mention that all of the documents were "optimized" with Acrobat. As expected, before the optimization was done all documents looked identical regardless of whether they were OCR'd by ScanSnap or Acrobat.

The conclusion: in terms of how long it takes, output quality before optimization, or ultimate file size, there is no difference between OCR'ing with ScanSnap and OCR'ing with Acrobat. However, if you plan on optimizing your files with Acrobat (to reduce size), you will get better quality output if the OCR was done with ScanSnap. I can't explain these results, but I'm pretty confident that the conclusion is correct.


Thank you for doing the tests and reporting on them. I have not done any such test, but my understanding is that ScanSnap uses Acrobat to create PDF documents and do OCR on them. So I would say you tested two versions of Acrobat (this depends on whether you are using Mac or Windows; Fujitsu uses different versions on each). The difference you see is likely a result of your optimization settings in Acrobat. I could be wrong about this, of course, but this would be my guess.


From what I understand, Adobe Acrobat's OCR is handled internally (and can also be tweaked to your liking), while ScanSnap uses ABBYY's (http://www.abbyy.com/) engine, via the "Scan to..." shortcuts.

For optimal scanning (regardless of the scanner), you should bump the scan resolution up to 600 dpi. That is the sweet spot: enough detail for the best character recognition by the OCR engine, with file sizes that aren't too large.

I hope that helps!
