tijsterman

paperless Fujitsu Scansnap: possible to add OCR later? How?

Hello all

As there are quite a lot of Fujitsu ScanSnap users on this forum, I hope somebody can answer my question.

I use my ScanSnap S1300 to scan documents to PDF and into Evernote, both with and without OCR. The advantages of adding OCR are evident, but for large piles of paper it takes quite some extra time, and often I doubt whether I really need OCR for a particular document. It would be very helpful if there were a way to add OCR at a later date, when it turns out that I do want to search the text or copy and paste fragments of it. Does anybody know whether this is possible, and how? Many thanks.

So given where we are, I'd be remiss if I didn't mention that Evernote Premium will OCR your PDFs for you when you upload them, but I assume what you want is to have the actual PDF OCRed independently of Evernote.

If you are on a Mac, you can drag the PDF onto the ABBYY FineReader icon in the Finder. I am on my iPad so this is from memory, but I believe it is under /Applications/ScanSnap/Scan to Searchable PDF or something like that.

If you are on Windows, I don't think that works, but what you can do is throw your ScanSnap-scanned PDFs into the ScanSnap Organizer software that came with your scanner. It can then OCR them for you.

There are of course a zillion other applications that will do this for you, but I am just pointing out some ways to do it with the software that came with your scanner.
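As one illustration of the "other applications" route, here is a minimal sketch that adds a text layer to an existing PDF after the fact. It assumes the open-source OCRmyPDF command-line tool is installed, which is not something mentioned in this thread and does not come with the scanner; the function names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path) -> list[str]:
    """Build the OCRmyPDF command line for adding OCR to an existing PDF."""
    # --skip-text leaves pages that already contain text untouched,
    # so running this on an already-searchable PDF does no harm.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def add_ocr_later(src: Path, dst: Path) -> bool:
    """Run OCR on src and write the result to dst.

    Returns False without doing anything if OCRmyPDF is not installed.
    """
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(build_ocr_command(src, dst), check=True)
    return True
```

Writing to a separate output file rather than over the original keeps the un-OCRed scan around until you have checked the result.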

Thanks for your competent answer. This was helpful to me.

If you are on Windows, I don't think that works, but what you can do is throw your ScanSnap-scanned PDFs into the ScanSnap Organizer software that came with your scanner. It can then OCR them for you.

This was the magic nugget... thanks.

When I bought my ScanSnap it came with an Adobe product that will also OCR, but I've not yet tried it out fully. I know it can work, but I haven't tried saving the results to see whether it embeds the OCR'd text in the PDF.

So, I agree that scanning with the ScanSnap and letting it OCR at the same time takes a while. However, after scanning your PDF into a note, you can right-click on the note and open it with Adobe Acrobat. Run the OCR command, then close the note, and it will automatically save the OCR'ed version over the regular version in your note.

That's what I do.

Just so I understand your process: the PDF is scanned into an Evernote note first, and then you right-click on the PDF inside the Evernote program and choose "open with Adobe Acrobat"?

My workflow includes the following tools:

  • Fujitsu ScanSnap S1500M
  • Adobe Acrobat 9 Pro for Mac (included with scanner)
  • Evernote (free version)

1. Scan to high quality PDF

2. OCR in Acrobat Pro

3. Optimize in Acrobat Pro

4. Save in a single folder on local hard drive titled something like "OCR & Evernote"

5. Copy to Evernote and sync

File sizes are 1/10 of the original size or smaller, and in my trials the Acrobat OCR is superior to ABBYY FineReader for Mac and PDF OCR X. Also, Acrobat 9 Pro for Mac works great on my Mac running Lion (10.7.3). I like to see the archived PDFs in their original color schemes, so I scan using "auto color detection".
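If you want to check the size reduction on your own files, a rough sketch like the following compares an original scan with its optimized copy. The function name and the example file names are made up for illustration; it only reads file sizes and knows nothing about Acrobat.

```python
from pathlib import Path


def size_ratio(original: Path, optimized: Path) -> float:
    """Return optimized size divided by original size (0.1 means 1/10)."""
    original_bytes = original.stat().st_size
    if original_bytes == 0:
        raise ValueError(f"{original} is empty")
    return optimized.stat().st_size / original_bytes


if __name__ == "__main__":
    # Hypothetical file names; substitute a scan and its optimized version.
    before = Path("bill-original.pdf")
    after = Path("bill-optimized.pdf")
    if before.exists() and after.exists():
        print(f"optimized file is {size_ratio(before, after):.0%} of the original")
```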

Currently I use the free version of Evernote. Despite my initial reservations, Evernote functions great, so I'll likely upgrade to the "yearly" option. I initially hesitated to install Acrobat 9 Pro on my Mac because of the large amount of negative net chatter that I read. Instead I spent hours trying out various other OCR options. In the end, I installed Acrobat 9 Pro and found it works better than the others that I tried. It's a robust program and comes free with the ScanSnap S1500M.

One of the ways Acrobat 9 Pro is better than the others I've tried is its ability to OCR documents with multiple and inconsistent formatting. For example, some of my utility bills have headers with typical "to" and "from" info, usage charts that run the width of the document, narrow columns on one side of the document, additional tables of varying columns and rows, and then paragraphs. This is all on one standard letter-size page. Acrobat 9 Pro OCR'd 99% of the text correctly, including "$", "#", "@", and " " (spaces). I cannot say the same for the other programs. The odd thing is I don't really want to like Adobe Acrobat 9 Pro. I want to prefer a program developed by a smaller, leaner competitor.

It should be noted that if you run the PDF optimizer in Acrobat, it will automatically OCR for you at the same time that it optimizes. Furthermore, I've discovered that this combined process is a lot faster than having the ScanSnap software do the OCR and then separately letting Acrobat optimize, and the resulting file is much smaller. I should mention that I do all this in a folder that I call "PDF holding", which I have set as the default folder in all of my ScanSnap profiles. After I have done all my tweaking, optimizing, page shuffling, etc. in this folder, I save the file as a "reduced size PDF" directly into my EN import folder. This reduces the size by a further 10% or more and places the file in the import folder, from which it magically appears in EN. So it is my EN import folder that contains the final version of the PDF, not the "PDF holding" folder, which is simply a transitional station. I generally delete most of the files in the holding folder, but I back up the files in the import folder.

You do something similar to me.

You're right about the OCR/optimization being run in the same process.

I first scan all my documents to a folder I call "ScanSnap Temp". I manually rename every file using an intuitive naming scheme that looks something like this:

date - tag - tag - tag...

The "tag" is really a key word or key words. The date is in this format "yyyy.mm.dd".
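For anyone who would rather generate these names than type them, here is a small sketch of the naming scheme. The function name and the sample tags are invented for illustration, and the ".pdf" extension is an assumption.

```python
from datetime import date


def scan_filename(scan_date: date, tags: list[str], ext: str = ".pdf") -> str:
    """Build a name of the form 'yyyy.mm.dd - tag - tag...' plus extension."""
    # Drop empty tags and stray whitespace so the separators stay tidy.
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    parts = [scan_date.strftime("%Y.%m.%d")] + cleaned
    return " - ".join(parts) + ext


print(scan_filename(date(2012, 3, 5), ["electric bill", "utilities"]))
# prints: 2012.03.05 - electric bill - utilities.pdf
```

Putting the date first in year.month.day order means an ordinary alphabetical sort of the folder is also a chronological sort.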

After I rename all the files, I then choose the option in Adobe Acrobat 9 Pro to OCR all documents in the folder "ScanSnap Temp". My OCR is automatically set to do the following:

1. OCR each PDF

2. Optimize each PDF, including reduce file size

3. Place completed file in a folder called "Optimized"

Right now I manually drag my files from "Optimized" into EN. I then move the files from the "Optimized" folder into a folder called "OCR & IN EVERNOTE". That's right, I currently keep a separate copy of the PDF in this folder on my hard drive. At some point, I plan to delete this folder and to keep only my data in EN. For now, this is a safety measure until I resolve all issues, etc.
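The last manual step, moving files out of "Optimized" once they are in EN, could be scripted along these lines. This is only a sketch: the function name is hypothetical, the folder paths in the usage note are placeholders for wherever your folders live, and it does nothing to confirm that a file actually reached Evernote first.

```python
import shutil
from pathlib import Path


def archive_imported(optimized: Path, archive: Path) -> list[str]:
    """Move every PDF from the optimized folder into the archive folder.

    Returns the names of the files that were moved. Files whose name
    already exists in the archive are left where they are.
    """
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(optimized.glob("*.pdf")):
        target = archive / pdf.name
        if target.exists():
            continue  # never overwrite an archived copy
        shutil.move(str(pdf), str(target))
        moved.append(pdf.name)
    return moved
```

A call might look like `archive_imported(Path("Optimized"), Path("OCR & IN EVERNOTE"))`, run only after the files have been dragged into EN.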

I'm also a very new EN user, as I started about seven weeks ago, though I did go Premium after my first month.

Using a ScanSnap S1500M, I have been incorporating the bundled Adobe Acrobat 9 Pro into my workflow. I have been scanning documents and then choosing 'Optimize Scanned PDF'.

Things were travelling along happily until today when I discovered that my documents are being scrambled in the 'Optimize' process.

A post on another forum describes the problem: it is as if pages in a fax have overlapped. Another example is that any blank areas at the bottom of a page are filled in by content from the top of the page.

Unlike others on this forum, it seems that 'Optimize Scanned PDF' is not doing OCR for me. Using the OCR text recognition tool causes the same issue.

Without close examination it is difficult to pick up the bizarre damage caused to the digital document.

I now have the difficult task of going back and determining when this damage to my digital archive commenced.
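One way to make that review less painful is to go through the archive in the order the files were last written, so the first damaged file gives an approximate start date. The sketch below only lists PDFs oldest first; it cannot detect the scrambling itself, the function name is hypothetical, and it assumes the modification times still reflect when Acrobat last saved each file.

```python
from datetime import datetime
from pathlib import Path


def pdfs_oldest_first(folder: Path) -> list[Path]:
    """Return all PDFs under folder, sorted by last-modified time."""
    return sorted(folder.rglob("*.pdf"), key=lambda p: p.stat().st_mtime)


def print_review_list(folder: Path) -> None:
    """Print one line per PDF with its modification date, oldest first."""
    for pdf in pdfs_oldest_first(folder):
        stamp = datetime.fromtimestamp(pdf.stat().st_mtime)
        print(f"{stamp:%Y-%m-%d %H:%M}  {pdf}")
```

If the damage turns out to be continuous from some date onward, opening a file from the middle of the list and halving the range each time is quicker than checking every document. If it is intermittent, each file still needs a look.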

I have used Adobe's Acrobat Uninstaller tool and reinstalled the program.

I am running OS X 10.8.2 Mountain Lion.

Has anybody had similar issues? Any help would be appreciated.

On 08 October 2012 at 0:53 AM, baef47 said:

Things were travelling along happily until today when I discovered that my documents are being scrambled in the 'Optimize' process.

 

I have observed the same problem, using a ScanSnap directly into EN. Sometimes the PDF text is at an angle, or has no OCR, so we run Optimize via Adobe Acrobat Pro. Frequently this resolves the problem, but on other occasions the PDF becomes very degraded (scrambled) by this process. I don't know why, but would appreciate a solution.
