salgud

mac (Archived) OCR totally broken?

For the first time in quite a while, I tried to search for text in EN 1.1 (OSX SL). Much to my chagrin, EN can no longer search for text in imported images at all! The text is in car ads copied directly from Craigslist into EN using the EN tool in Chrome and FF. I've tried a number of different words, all of which I know occur many times in these ads, and EN can find none of them.

Others seeing this? When did this break? Or is this another feature removed for compatibility or something?

Major bummer for me - one of the main reasons I've kept EN for years.

This certainly works on the Windows client (at least it did on Friday), so if it's broken, it's something to do with the Mac client. But OCR is done in the cloud and so should be the same for everyone, so I'm guessing the problem lies elsewhere. How long did you wait between clipping the image and searching? There is a lag while the OCR takes place on Evernote's servers (it's shorter for Premium users).

~Jeff

Still working fine on the Mac, here ... though the OCR turn-around times were definitely a little slower for a couple of days.

Note that on the Mac client, there's an icon just to the right of the tag area that tells you whether your image has been OCR-ed or not.

The queue caught up on Saturday. Images typically take a few minutes to be recognized on the server for Premium accounts, and a bit longer for Free accounts (maybe 10-15 minutes). After that, you need to sync your client to retrieve the OCR data before you can search for the text you see in your images.

PDF files, however, are only processed for Premium users.

Is there any info on how OCR processing differs between images and PDFs? I'm planning to scan a 50-page document into a PDF, and it'd be terrible to organise all the files if they were imported separately as images.

If you're performing a clean scan of a document, and you have a Premium account, then I'd recommend using PDF unless the document has a lot of handwriting that you need to recognize.

(We do plan to improve handwriting recognition within PDFs, but this requires a lot of work to replace/extend the "off the shelf", "best of breed" OCR software that we've licensed to handle PDFs.)

Ahh... Alright. :| If a new PDF scanning system was implemented, would older PDFs that were OCRed before the change be rescanned, or would this new change only apply to the new PDFs added in afterwards?

It depends on the nature of the change. We've seen PDFs that take over an hour on a fast server to OCR, even with our limits of 100 pages and 25MB for a PDF to OCR, so reprocessing every single PDF within Evernote isn't something we do for small changes. (Our 5.5 million users are currently storing more than 200 million notes in Evernote!)
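To make those limits concrete, here is a minimal Python sketch of the eligibility check they imply. It is only an illustration under stated assumptions: `pdf_ocr_eligible` is a hypothetical helper, not part of any Evernote API; "25MB" is read as 25 MiB; both limits are treated as inclusive; and the Premium-only rule for PDFs comes from the earlier post in this thread.

```python
# Hypothetical sketch of the PDF OCR limits quoted in this thread.
# Not an Evernote API; the exact byte interpretation and inclusivity are assumptions.
MAX_PDF_PAGES = 100                # page limit quoted above
MAX_PDF_BYTES = 25 * 1024 * 1024   # assumption: "25MB" read as 25 MiB


def pdf_ocr_eligible(size_bytes: int, page_count: int, is_premium: bool) -> bool:
    """Return True if a PDF falls within the quoted OCR limits (hypothetical helper)."""
    if not is_premium:
        # Per the earlier post, PDFs are only processed for Premium users.
        return False
    # Assumption: both limits are inclusive.
    return page_count <= MAX_PDF_PAGES and size_bytes <= MAX_PDF_BYTES


print(pdf_ocr_eligible(10 * 1024 * 1024, 50, True))   # 50-page, 10 MiB scan
print(pdf_ocr_eligible(26 * 1024 * 1024, 50, True))   # just over the size limit
print(pdf_ocr_eligible(10 * 1024 * 1024, 50, False))  # Free account
```

With these assumptions the three calls print `True`, `False`, `False`: a 50-page scan like the one asked about earlier would qualify as long as it stays under the size limit.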

Holy smokes!

(Wonder how many are recipes for cumin waffles...)

But that's only 36.36 notes per person. C'mon, people! Get clipping!!!

~Jeff

This is serious topic drift, but it makes me wonder how large the biggest Evernote account is -- how many notes it contains.

Accounts are currently limited to 100,000 notes, and there are a few accounts that are at that limit.

(Since that's more than 100 notes per day since Evernote's service was launched, it doesn't actually correspond to real users saving specific things that they want to remember, but rather people bulk forwarding garbage into Evernote in an automated way.)
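The "100 notes per day" figure follows directly from the cap:

$$\frac{100{,}000\ \text{notes}}{100\ \text{notes/day}} = 1{,}000\ \text{days} \approx 2.7\ \text{years}$$

So any account that reached the cap in less time than that averaged more than 100 notes a day.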

Wow, I'd better start typing faster!

Edit: [Note]

Just set up your temp directory as an Evernote import folder. You get to keep all logs, compiler temporaries, installer detritus etc., fill up your Evernote account, *and* it'll keep your temp directory nice and clean!! Don't forget to enable the Subfolders setting. :)

~Jeff

Sorry Dave. I was so well-behaved earlier in the day, but it's late and my keenly honed judgement is slipping a bit.

~Jeff

~ backing away slowly from the thread... ~

Oh my. 25MB PDF limitation for OCR? :) ... That's really limiting. No difference for Premium users?

Our OCR software (best-of-breed commercial licensed) can occasionally blow up a 25MB file into over 2GB of intermediate files as it unpacks the compressed bitmaps, so we're working to smooth that out before making the situation worse.
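For a sense of scale, the sketch below estimates how large a scanned document gets once its compressed bitmaps are unpacked into raw pixels. The page size (US Letter), resolution (300 dpi), and color depth (24-bit RGB) are assumptions chosen for illustration, not figures from Evernote or its OCR vendor; the real intermediate files depend on the OCR software.

```python
# Rough estimate of raw bitmap size for a scanned document.
# Assumptions (not from the thread): US Letter pages, 300 dpi, 24-bit RGB.

def raw_page_bytes(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int) -> int:
    """Uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


page = raw_page_bytes(8.5, 11, 300, 3)  # 2550 x 3300 pixels x 3 bytes per pixel
total = page * 100                      # a document at the 100-page limit quoted earlier

print(f"per page: {page / 2**20:.1f} MiB")
print(f"100 pages: {total / 2**30:.2f} GiB")
```

Under these assumptions each page unpacks to about 24.1 MiB and 100 pages to about 2.35 GiB, so a well-compressed PDF under 25MB expanding into more than 2GB of intermediate data is plausible.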

It wasn't about timing - most of the images I'm talking about were saved days or weeks ago.

As for the icon to the right of the tag area that indicates whether an image has been OCR-ed, no such icon is visible, unless it's an indistinguishable gray blob way over by the sync icon. In any case, the OCR isn't working.

Ahh... OK. Suppose we import a 26MB or 45MB PDF and, because of the limit, it doesn't get OCRed. Then, at some point in the future, Evernote raises the limit from 25MB to 50MB for Premium users.

What would happen then? Would the previously 'missed out' PDFs be OCRed, or would they be left out in the transition?

This topic is now closed to further replies.