My scanned PDFs don't get OCR'd by EN

OCR Scan

9 replies to this topic

#1 ChrisToad

  • Title: Member
  • Group: Members
  • 4 posts

Posted 08 February 2012 - 11:57 PM

Team!

I'm new to EN. I also did a ton of digging before I decided to create a thread. If this has been covered, apologies.


I have a small conundrum. In short, my scanned documents (PDFs) do not get OCR processing by EN, and I don't understand why. I'd really like my scanned docs to be searchable.

I'm using a Xerox WC7655. I double-checked this knowledge base entry to make sure my documents meet the OCR processing criteria; they do.
https://support.ever...#38;docID=12656


My doc is a full page meeting agenda with text, and some sloppy handwriting courtesy of yours truly. I can't attach it here because of sensitive business data on the page.

Here's my test to help troubleshoot the issue:
  • Photograph document using iPhone, add to EN.
    • Result: EN performs OCR on text and sloppy handwriting. High-five, go team.
  • Scan same document with Xerox 7655, add to EN.
    • Result: Flat 'image' PDF in Adobe Reader and in EN. Not searchable; no EN OCR processing occurs at all. Unsuccessful result.
  • Scan same document with Xerox 7655, but this time enabling OCR on the Xerox.
    • Result: The Xerox OCR converts the typed text, but not the handwriting. So it 'works', but not as well as EN's OCR processing. The EN support doc says that if a PDF already contains OCR text, EN doesn't process it.

So I really want process #2 to work correctly and have EN perform the OCR work. Any suggestions on what I might be doing wrong here?

Just to add, I'm not a Premium user (yet); I want to see that this works first. I have allowed a few hours to pass so the OCR queue can catch up.

Edited by ChrisToad, 09 February 2012 - 12:37 AM.


#2 gazumped

  • Title: Operative
  • Group: Members
  • 790 posts

Posted 09 February 2012 - 12:52 AM

Hi Chris

If you're not a prem user, your OCR will have to wait until we privileged paid-for types get service. I suspect that may be your root problem. However, PDF OCR does not include hand-scrawled text, so if you're looking for a solution that covers the handwriting, JPGs may be the way to go. You mention that if you OCR this stuff yourself you get a result (subject to the minor omission mentioned above), so what's wrong with continuing to OCR the stuff yourself so you don't have to wait in future?

- and bear in mind that you can submit two or more items per note, so you could add an OCR'd PDF file for the typed text, and pics (or parts of pics) for the related handwritten comments.
EN user - premium | Desktop client 4.5.6.6884 (249072) public | HP DV6000 laptop Vista SP1 via FF12+clipper 5.0.0.236819 | Galaxy S2 Android 4.0.3 | Mobile client 250139 v4.0 Beta 4(prerelease) on 23415 Vodafone UK

#3 JamesCE

  • Title: Alliance Lackey
  • Group: Members
  • 58 posts

Posted 09 February 2012 - 01:01 AM

I was going to say the same... you may just have to wait.
Many of us (me, at least) always run OCR before adding files to EN.
There are lots of discussions about that, but personally I don't rely on EN for it. Not that it's not good, but if I ever export my files, the OCR text will always be there ;)

#4 ChrisToad

  • Title: Member
  • Group: Members
  • 4 posts

Posted 09 February 2012 - 01:07 AM

gazumped, on 09 February 2012 - 12:52 AM, said:

Hi Chris

If you're not a prem user, your OCR will have to wait until we privileged paid-for types get service. I suspect that may be your root problem. However, PDF OCR does not include hand-scrawled text, so if you're looking for a solution that covers the handwriting, JPGs may be the way to go. You mention that if you OCR this stuff yourself you get a result (subject to the minor omission mentioned above), so what's wrong with continuing to OCR the stuff yourself so you don't have to wait in future?

- and bear in mind that you can submit two or more items per note, so you could add an OCR'd PDF file for the typed text, and pics (or parts of pics) for the related handwritten comments.

Thanks for the reply!

I thought the OCR queue might be the cause. What confused me was that the JPG photo taken with my iPhone completed EN OCR processing within minutes, while the scanned PDF has been synced to the cloud for about 2 days now. I may have been wrong to assume that the OCR queue is agnostic to file type; perhaps the JPG queue is separate from the PDF queue (with a longer wait?). Not sure.

Regarding the 'handwriting'... I exaggerated slightly. My handwriting isn't completely mangled, just 'guy' handwriting. ;) I was shocked and impressed at how EN's OCR was able to detect everything I had written on the page and make it searchable. Most of the notes I'd like to scan are handwritten, which is why I got excited when I saw EN handle them so well. I agree that the Xerox OCR will work for typed docs.
My ideal solution would be to bulk scan the masses of paper on my desk and let EN search through everything: typed and, within reason, handwritten. I realize 100% recognition of handwritten notes isn't realistic.

Thank you again for the reply!

#5 gazumped

  • Title: Operative
  • Group: Members
  • 790 posts

Posted 09 February 2012 - 01:40 AM

No problem - if it helps, I have around 7,000 notes with mixed JPG / DOC / PDF / Webclip and other content, some of the PDF files being well over the page limit for Evernote OCR. I OCR everything I can before uploading, with suitably titled and tagged notes. All of this is now electronic, but it used to be a six-foot-high by roughly ten-foot-long set of shelving groaning with folders of various types. Getting stuff filed into that setup was daunting, and finding anything was... unreliable. I now have an external hard drive sitting on my desk, and I do all my filing and finding more reliably, and sitting down. A wheelbarrow now occupies the old document storage area. I'd say you can pretty much rely on Evernote to tidy up your working area - all you have to do is get started!

- and I don't even work for these guys ;)

#6 ChrisToad

  • Title: Member
  • Group: Members
  • 4 posts

Posted 09 February 2012 - 05:48 PM

gazumped, on 09 February 2012 - 01:40 AM, said:

- and I don't even work for these guys ;)

Somewhat shocking! Thanks for the replies!

#7 GrumpyMonkey

  • Title: 不機嫌な猿 ("Grumpy Monkey")
  • Group: Evernote Evangelist
  • 2,353 posts

Posted 09 February 2012 - 05:52 PM

gazumped, on 09 February 2012 - 01:40 AM, said:

No problem - if it helps, I have around 7,000 notes with mixed JPG / DOC / PDF / Webclip and other content, some of the PDF files being well over the page limit for Evernote OCR. I OCR everything I can before uploading, with suitably titled and tagged notes. All of this is now electronic, but it used to be a six-foot-high by roughly ten-foot-long set of shelving groaning with folders of various types. Getting stuff filed into that setup was daunting, and finding anything was... unreliable. I now have an external hard drive sitting on my desk, and I do all my filing and finding more reliably, and sitting down. A wheelbarrow now occupies the old document storage area. I'd say you can pretty much rely on Evernote to tidy up your working area - all you have to do is get started!

- and I don't even work for these guys ;)

Similar experience here, primarily PDF and text. It took a really long time, but I have digitized several bookcases' worth of books and notes. My file cabinets and file boxes have all been digitized as well. Going paperless was definitely not easy, but it was well worth it. I also OCR before uploading.

EVERNOTE: Getting Started | Support Page | Knowledge Base | Support Requests

For tips about using Evernote, see my shared notebook at https://www.evernote...istopher/public


#8 Stephen Towler

  • Title: Member
  • Group: Members
  • 1 post

Posted 18 February 2012 - 03:43 AM

Ditto. I converted my entire library (> 500 books) to OCR'd PDFs. A local copy shop uses a hydraulic blade to chop off each spine ($1 each), and then I feed the pages through my ScanSnap S1500M for OCR and storage. Any texts I'm actively working with get attached to an Evernote note (or, if > 20 MB, put in Dropbox) for easy cross-platform access. I scan at 300 dpi and store at 150 dpi after text-under-image OCR. Searching and marking up the resulting PDFs is plenty fast on the iPad 2 and on Mac, Linux, and Windows desktops.

#9 GrumpyMonkey

  • Title: 不機嫌な猿 ("Grumpy Monkey")
  • Group: Evernote Evangelist
  • 2,353 posts

Posted 18 February 2012 - 12:44 PM

Stephen Towler, on 18 February 2012 - 03:43 AM, said:

Ditto. I converted my entire library (> 500 books) to OCR'd PDFs. A local copy shop uses a hydraulic blade to chop off each spine ($1 each), and then I feed the pages through my ScanSnap S1500M for OCR and storage. Any texts I'm actively working with get attached to an Evernote note (or, if > 20 MB, put in Dropbox) for easy cross-platform access. I scan at 300 dpi and store at 150 dpi after text-under-image OCR. Searching and marking up the resulting PDFs is plenty fast on the iPad 2 and on Mac, Linux, and Windows desktops.

That's great!

I have slowly been converting my library to PDFs, and I have done a few thousand books and articles now. It is actually pretty easy to tear apart a book yourself (in manageable sections of a few dozen pages at a time), trim it (the process goes a bit more smoothly if you have a paper guillotine), feed it through the ScanSnap (ideally, an office-quality scanner with 600 dpi is the way to go), and OCR it. The iPad 2 is OK, but I wouldn't call it fast. I have high expectations for the iPad 3!



#10 BurgersNFries

  • Title: Don't make me come over there...
  • Group: Evernote Evangelist
  • 7,279 posts

Posted 18 February 2012 - 02:09 PM

Indexing of PDFs is a premium feature.

If you want handwriting OCR'd, it's best to scan it as an image, since the image OCR produces a tree of possibilities. The PDF OCR doesn't work well with handwriting.
I'm not affiliated with Evernote. However, my Evernote sings & is an integral part of my life.

Submit support requests toward the bottom of the help/support page here. If you do not receive an auto reply email with a case #, it did NOT get submitted. Premium users will receive a reply within one business day, California time. Free users receive a reply as time permits.




