
ScanSnap's OCR vs. Acrobat's OCR


idoc


I know this topic has been hammered to death in various places on the forum, but it may still have some relevance. I've recently been scanning fairly large amounts of data into Evernote (EN) and have been playing a bit with the settings. I carried out the following experiment: I scanned 3 different 20-page documents using the same settings in every case (automatic resolution, B&W, duplex). The only thing I varied was whether ScanSnap did the OCR or Acrobat did. In every case I optimized the final document with Acrobat to reduce file size. Not surprisingly, there was no significant difference in file size across the scans, i.e., OCRing with ScanSnap and OCRing with Acrobat produce files of about the same size. However, there was a noticeable difference in the quality of the scans. For some reason, the ScanSnap-OCR'd material looked much better than the Acrobat-OCR'd material. Once again, I should mention that all of the documents were "optimized" with Acrobat. As expected, before the optimization was done, all documents looked identical regardless of whether they were OCR'd by ScanSnap or Acrobat.

The conclusion: In terms of how long it takes or the ultimate file size, there is no difference between OCRing with ScanSnap and OCRing with Acrobat, and before optimization the output quality is the same too. However, if you plan on optimizing your files with Acrobat (to reduce size), you will get better-quality output if the OCR was done with ScanSnap. I can't explain these results, but I'm pretty confident that the conclusion is correct.


Thank you for doing the tests and reporting on them. I have not done any such test, but my understanding is that ScanSnap uses Acrobat to create PDF documents and do OCR on them. So I would say you tested two versions of Acrobat (this depends on whether you are using Mac or Windows; Fujitsu uses different versions on each). The difference you see is likely a result of your optimization settings in Acrobat. I could be wrong about this, of course, but this would be my guess.

  • 1 month later...

From what I understand, Adobe Acrobat's OCR is handled internally (and can also be tweaked to your liking), while ScanSnap uses ABBYY's (http://www.abbyy.com/) engine, via the "Scan to..." shortcuts.

For optimal scanning (regardless of the scanner), you should be scanning at 600 dpi. In my experience that is the sweet spot: enough detail for the best character recognition by the OCR engine, with file sizes that aren't too large.
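To put a rough number on the file-size side of that trade-off, here is a minimal sketch of the raw pixel arithmetic. It assumes a US Letter page (8.5 x 11 in) scanned in B&W at 1 bit per pixel, and the function name is just something made up for illustration. Real PDFs are compressed, so actual files will be much smaller; only the ratio between resolutions is the point.

```python
def uncompressed_bitonal_bytes(width_in, height_in, dpi):
    """Raw size in bytes of a 1-bit-per-pixel page image, before any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px  # B&W scan: one bit per pixel
    return (total_bits + 7) // 8       # round up to whole bytes


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (300, 600):
    size = uncompressed_bitonal_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes uncompressed")
```

Doubling the resolution quadruples the pixel count, so a 600 dpi page carries four times the raw data of a 300 dpi page before compression.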

I hope that helps!


