
(Archived) OCR totally broken?


salgud

Idea

For the first time in quite a while, I tried to search for text in EN 1.1 (OSX SL). Much to my chagrin, EN can no longer search for text in imported images at all! This is text in car ads copied directly from Craigslist into EN using the EN tool in Chrome and FF. I've tried a number of different words, all of which I knew occurred many times in these ads - EN can find none of them.

Others seeing this? When did this break? Or is this another feature removed for compatibility or something?

Major bummer for me - one of the main reasons I've kept EN for years.

Link to comment

22 replies to this idea


Our OCR software (best-of-breed commercial licensed) can occasionally blow up a 25MB file into over 2GB of intermediate files as it unpacks the compressed bitmaps, so we're working to smooth that out before making the situation worse.

Ahh... Ok. ... What if we import a 26MB or a 45MB PDF and, because of the limit, it doesn't get OCRed? Then suppose that Evernote, at some point in the future, raises the limit from 25MB to 50MB for Premium users.

What would happen then? Would the previously 'missed out' PDFs be OCRed, or would they be left out in the transition?

Link to comment

It wasn't about timing - most of the images I'm talking about were saved days or weeks ago.

As for an icon to the right of the tag area indicating they've been scanned, no such icon is visible, unless it's an indistinguishable gray blob way over by the sync icon. In any case, the scan isn't working.

Link to comment

Our OCR software (best-of-breed commercial licensed) can occasionally blow up a 25MB file into over 2GB of intermediate files as it unpacks the compressed bitmaps, so we're working to smooth that out before making the situation worse.

Link to comment
It depends on the nature of the change. We've seen PDFs that take over an hour on a fast server to OCR, even with our limits of 100 pages and 25MB for a PDF to OCR, so reprocessing every single PDF within Evernote isn't something we do for small changes. (Our 5.5 million users are currently storing more than 200 million notes in Evernote!)

Oh my. 25MB PDF limitation for OCR? :) ... That's really limiting. No difference for Premium users?

Link to comment
  • Level 5*
Wow, I'd better start typing faster!

Edit: [Note]

Just set up your temp directory as an Evernote import folder. You get to keep all logs, compiler temporaries, installer detritus etc., fill up your Evernote account, *and* it'll keep your temp directory nice and clean!! Don't forget to enable the Subfolders setting. :)

~Jeff

Link to comment

Accounts are currently limited to 100,000 notes, and there are a few accounts that are at that limit.

(That works out to more than 100 notes per day since Evernote's service was launched, so it doesn't actually correspond to real users saving specific things that they want to remember, but rather to people bulk forwarding garbage into Evernote in an automated way.)
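
As a rough check on that arithmetic, here is a small sketch. The function name `days_to_reach` is made up for illustration; the only figures taken from the post are the 100,000-note limit and the rate of 100 notes per day.

```python
# Figures quoted in the post above.
NOTE_LIMIT = 100_000
NOTES_PER_DAY = 100


def days_to_reach(limit: int, per_day: int) -> float:
    """Days needed to accumulate `limit` notes at a steady `per_day` rate."""
    return limit / per_day


days_needed = days_to_reach(NOTE_LIMIT, NOTES_PER_DAY)
print(f"{days_needed:.0f} days, about {days_needed / 365:.1f} years")
```

So an account would need roughly a thousand days of saving 100 notes every single day to hit the limit, which is the point being made about automated bulk forwarding.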

Link to comment

It depends on the nature of the change. We've seen PDFs that take over an hour on a fast server to OCR, even with our limits of 100 pages and 25MB for a PDF to OCR, so reprocessing every single PDF within Evernote isn't something we do for small changes. (Our 5.5 million users are currently storing more than 200 million notes in Evernote!)
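
For anyone wanting to check a PDF before importing it, the limits mentioned in this thread (100 pages, 25MB, Premium accounts only) could be sketched roughly as below. This is not Evernote's actual logic: `PdfInfo` and `pdf_ocr_eligible` are hypothetical names, and treating "25MB" as 25 * 1024 * 1024 bytes and the limits as inclusive are both assumptions.

```python
from dataclasses import dataclass

# Limits quoted in this thread. The exact byte definition of "MB" and
# whether the limits are inclusive are assumptions, not confirmed behavior.
MAX_PDF_BYTES = 25 * 1024 * 1024
MAX_PDF_PAGES = 100


@dataclass
class PdfInfo:
    """Hypothetical container for the two properties the limits depend on."""
    size_bytes: int
    pages: int


def pdf_ocr_eligible(pdf: PdfInfo, premium: bool) -> bool:
    """Return True if the PDF falls within the OCR limits described in the thread."""
    if not premium:
        # Per the thread, PDF files are only processed for Premium users.
        return False
    return pdf.size_bytes <= MAX_PDF_BYTES and pdf.pages <= MAX_PDF_PAGES


if __name__ == "__main__":
    small = PdfInfo(size_bytes=10 * 1024 * 1024, pages=50)
    large = PdfInfo(size_bytes=26 * 1024 * 1024, pages=50)
    print(pdf_ocr_eligible(small, premium=True))
    print(pdf_ocr_eligible(large, premium=True))
```

Under these assumptions, a 26MB or 45MB PDF like the ones asked about earlier would be skipped even on a Premium account.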

Link to comment

If you're performing a clean scan of a document, and you have a Premium account, then I'd recommend using PDF unless the document has a lot of handwriting that you need to recognize.

(We do plan to improve handwriting recognition within PDFs, but this requires a lot of work to replace/extend the "off the shelf", "best of breed" OCR software that we've licensed to handle PDFs.)

Link to comment
The queue caught up on Saturday. Images will typically take a few minutes to be recognized on the server for Premium accounts, and a bit longer for Free accounts (maybe 10-15 minutes). You need to sync your client later to retrieve this OCR data and perform searches for the text that you see in your images.

PDF files, however, are only processed for Premium users.

Is there any info on the difference in OCR processing between images and PDFs? I'm planning to scan a 50-page document into a PDF, and it'd be terrible to organise all the files if the pages were imported separately as images.

Link to comment

The queue caught up on Saturday. Images will typically take a few minutes to be recognized on the server for Premium accounts, and a bit longer for Free accounts (maybe 10-15 minutes). You need to sync your client later to retrieve this OCR data and perform searches for the text that you see in your images.

PDF files, however, are only processed for Premium users.

Link to comment

Still working fine on the Mac, here ... though the OCR turn-around times were definitely a little slower for a couple of days.

Note that on the Mac client, there's an icon just to the right of the tag area that tells you whether your image has been OCR-ed or not.

Link to comment
  • Level 5*

This certainly works on the Windows client (at least it did on Friday), so if it's broken, then it's something to do with the Mac client. But OCR is done in the cloud, and so should be the same for everyone, so I am guessing the problem lies elsewhere. How long did you wait between clipping the image and searching? There is a lag time for the OCR to take place on Evernote's servers (it's less for Premium users).

~Jeff

Link to comment

Archived

This topic is now archived and is closed to further replies.
