
Difference in OCR on paid versus free



Hello, I am using the free Evernote on Mac OS X 10.10.2 Yosemite, along with the web clipper (when it decides to work), the web app and the desktop app.

 

I notice it does a pretty good job of OCR on text in images, PDF files, etc. What do you gain from going premium with regard to OCR?

Link to comment
  • Level 5*

Hi. AFAIK Evernote doesn't offer better image OCR to premium users, although they do go to the front of any queues. With PDFs I'm less sure: I thought that a searchable PDF would be indexed for a free user, but that an image PDF would not. The advantage of premium was then that image PDFs, plus Office files like DOCX and XLSX, would also be indexed. Again, premium users get front-of-queue service.

Link to comment
I think a big consideration, given Evernote's brilliant OCR capabilities, is that once you figure out some new use cases for OCRed images and PDFs, you tend to start stuffing more and more images and documents into Evernote... which can burn through your 60 MB monthly upload limit in a matter of hours, depending on your workflow on any given day.

 

In other words, if you're planning on really putting the OCR through its paces, you're likely going to want/need premium upload limits (4 GB/month).

 

Golden Age public domain comic book scans are only the tip of the iceberg for me. One comic book (about 68 .jpg images) might set you back between 60 and 150 MB. Just the original set of Daredevil Comics publications, comprising 130+ editions, will eat up no less than 8 GB.
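As a rough back-of-the-envelope check of those numbers, here is a small Python sketch. The per-issue sizes, the issue count and the two monthly limits are the figures quoted in this thread; the function name and the 1 GB = 1024 MB convention are assumptions made for illustration.

```python
import math

FREE_LIMIT_MB = 60            # free monthly upload limit quoted in this thread
PREMIUM_LIMIT_MB = 4 * 1024   # 4 GB premium limit, assuming 1 GB = 1024 MB


def months_to_upload(total_mb: float, monthly_limit_mb: float) -> int:
    """Whole months needed to upload total_mb under a monthly cap."""
    return math.ceil(total_mb / monthly_limit_mb)


issues = 130                   # "130+ editions"
low_total_mb = issues * 60     # every issue at the small end of the range
high_total_mb = issues * 150   # every issue at the large end of the range

for label, total_mb in (("low", low_total_mb), ("high", high_total_mb)):
    print(
        label,
        f"{total_mb} MB",
        f"free: {months_to_upload(total_mb, FREE_LIMIT_MB)} months",
        f"premium: {months_to_upload(total_mb, PREMIUM_LIMIT_MB)} months",
    )
```

Even the low estimate (7800 MB, roughly in line with the 8 GB figure) would take over a decade of free-tier uploads, which is the practical argument for premium here.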

 

It's phenomenal how accurate a tool Evernote's OCR search is when searching my database of 23,000+ comic strips and now comic books. It's quite something to know that one possibly has the only database that can search the dialogue of specific comic books/comic strips. I'll be making some of the public domain material databases I'm working on available by way of published notebooks soon. You would not need to join such notebooks... simply keep the link and search online.

 

Anyways... depending on your use case for OCR, you may need *lots* of (premium) upload limit ;)

 

So, as @Gazumped says... Evernote doesn't offer a better quality service. It's more related to speed, and in many cases, upload limit.

Link to comment
  • Level 5*

...and you can always buy more upload limit if you need it in a single month.  Matter of interest @Frank - how do you digitise something like a set of Daredevil Comics - into PDF?  Another electronic document format?

Link to comment

I'm looking for a workflow for handwriting in a PDF to become indexed. So far, as a free user, I think my only option is to turn my PDF into images and upload all pages that way. Does the premium account mean I can upload PDFs of my handwriting and have it indexed directly? I've read lots of conflicting information about this online.

Link to comment
  • Level 5*

Hi.  Handwriting Recognition is as much art as science.  If you write in black ink on a light background without lines or patterns,  the software will make the best guess it can at the word your (no offense) drunken spider crawl was meant to convey.

 
"House", however, could perhaps be read as "horse" or "homes", so the HR software will record all three as possible choices. The same applies to the rest of your document. Unlike OCR, however, this translation is not available to download, and you can't copy and paste text from it.
 
A search for 'horse' may therefore return a note with 'house' highlighted by mistake.
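To make the "all three possible choices" idea concrete, here is a minimal Python sketch of an index that keeps every candidate reading of a handwritten word. It is not Evernote's actual format or algorithm: the data layout, the confidence scores, the note name and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical recognition output: each handwritten word region keeps
# every plausible reading with a confidence score, not a single answer.
recognized_regions = [
    {"note": "lecture-notes", "candidates": {"house": 0.62, "horse": 0.25, "homes": 0.13}},
    {"note": "lecture-notes", "candidates": {"eye": 0.90, "aye": 0.10}},
]


def build_index(regions):
    """Map every candidate word to the set of notes that might contain it."""
    index = defaultdict(set)
    for region in regions:
        for word in region["candidates"]:
            index[word.lower()].add(region["note"])
    return index


def search(index, term):
    """Return the sorted note names where any candidate reading matches term."""
    return sorted(index.get(term.lower(), set()))


index = build_index(recognized_regions)
print(search(index, "horse"))  # matches, although the writer meant "house"
print(search(index, "hound"))  # no candidate reading, so no match
```

Keeping the low-confidence readings is what makes messy handwriting findable at all; the price is the occasional false hit described above.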
 
To get the best results - obviously - write as clearly as you can;  black ink / light background and avoid strong lines or patterns.  Photograph or scan the note in good light and on a contrasting background with Evernote's document camera.
 
Evernote will do the best it can with whatever you can give it - and in many cases it does a scarily good job.  But if it occasionally goes sadly wrong,  there's not much you can do.  There's no way to correct it,  other than by typing the correct keywords into the title or an opening comment.
 
Tip:  I have scanned page after page of my old spiral-bound scribble-covered small notebooks into Evernote - I'll put an executive summary at the start of the note,  and I may split the pages at various points so I can add more keywords or summaries.  They are all searchable,  and provided I know that this note includes the lecture on "the structure of the eye" I can usually find the section I need far faster by reading and paging through my old notes than I could by searching for terms which might or might not be in there.
Link to comment

Thanks for your reply, Gazumped. I'm not sure my question was clear. Basically, I can get a PDF of my handwriting out of another app (Notability) and I tried uploading it to Evernote. It was not indexed and searches within the note just said the note had no content.

Changing the PDF to an image on my Mac and uploading that to Evernote, the handwriting was analysed and I was able to search against it.

I could set up a workflow to turn PDFs into images so they will be indexed, but that involves my desktop machine.

I'm just wondering whether a premium account would allow the original PDF to be analysed, or is it just that Evernote never indexes handwriting in a PDF?

Link to comment
  • Level 5*

AFAIK Evernote won't index handwriting in a PDF. Handwriting Rec and OCR aren't (I gather) the same thing, and PDFs get handed off to a process that (subject to a few rules about size and pages) will only OCR the content. I have Adobe Acrobat, which I think makes a fair fist of recognising handwritten content, but as I noted earlier I tend to annotate my handwriting and rely on Mk1 eyeballs to find context based on typewritten cues. Local OCR/HR might help you out, but that's also a desktop process. I think if you're storing PDFs, you'll have to find a way to convert them somehow.

Link to comment

...and you can always buy more upload limit if you need it in a single month. Matter of interest @Frank - how do you digitise something like a set of Daredevil Comics - into PDF? Another electronic document format?

 

First off, the scanned pages of comic books can be found on a number of sites. I've been pulling stuff off the "Digital Comic Museum" (strictly public domain material). You download either a .rar, .cbr or .cbz file. The latter two are just renamed archives (.cbr is RAR, .cbz is ZIP), so they can be renamed and extracted. Basically you'll then have about 60-68 .jpg images. I then bulk edit the .jpg file names and dump them into an Evernote import folder on desktop, which will create a note for each image file. You can then flip through them one by one in the note list or in presentation mode.

So one doesn't need to convert to PDF. Just the plain images will do :-)
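The extract, rename and dump steps above can be sketched in Python for the .cbz case, which is a ZIP archive and so needs only the standard library. This is an illustration under assumptions, not the poster's actual script: the function name, the "title - 001.jpg" naming scheme and the paths in the usage note are all hypothetical.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def unpack_comic(archive_path, import_dir, title):
    """Extract page images from a .cbz (ZIP) archive into import_dir.

    Pages are renamed "<title> - 001.jpg", "<title> - 002.jpg", ... in the
    sorted order of their names inside the archive. Returns the new paths.
    """
    import_dir = Path(import_dir)
    import_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive_path) as archive:
        # Keep only image entries; folders and text files are skipped.
        pages = sorted(
            name for name in archive.namelist()
            if Path(name).suffix.lower() in IMAGE_SUFFIXES
        )
        for number, name in enumerate(pages, start=1):
            suffix = Path(name).suffix.lower()
            target = import_dir / f"{title} - {number:03d}{suffix}"
            target.write_bytes(archive.read(name))
            written.append(target)
    return written
```

A call might look like `unpack_comic("daredevil_001.cbz", "EvernoteImport", "Daredevil 001")`, where both paths are placeholders and the second is whatever folder you have set up as an import folder. Zero-padded page numbers keep the resulting notes in reading order when sorted by title. A .cbr is a RAR archive, which Python's standard library cannot open, so those would still need a separate extraction tool.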

I will share the particular notebook I am working on within about 5 weeks or so. Maybe a lot sooner. Daredevil vs. Hitler (The very first publication) is quite an interesting one - an interesting mix of history and fiction... and "The Claw", haha!

Link to comment
  • Level 5*

Thanks! And darn. Just joined the DCM... suddenly I feel any spare time this year just evaporated... I'll look forward to you sharing that notebook!

Link to comment

Archived

This topic is now archived and is closed to further replies.
