(Archived) OCR totally broken?

salgud · December 5, 2010

For the first time in quite a while, I tried to search for text in EN 1.1 (OSX SL). Much to my chagrin, EN can no longer search for text in imported images at all! This is text in car ads copied directly from Craigslist into EN using the EN tool in Chrome and FF. I've tried a number of different words, all of which I knew occurred many times in these ads - EN can find none of them.

Others seeing this? When did this break? Or is this another feature removed for compatibility or something?

Major bummer for me - one of the main reasons I've kept EN for years.

BurgersNFries · December 8, 2010

(Our 5.5 million users are currently storing more than 200 million notes in Evernote!)

Holy smokes!

(Wonder how many are recipes for cumin waffles...)

jfwarrior · December 9, 2010

Our OCR software (best-of-breed commercial licensed) can occasionally blow up a 25MB file into over 2GB of intermediate files as it unpacks the compressed bitmaps, so we're working to smooth that out before making the situation worse.

Ahh... Ok. ... What if we import a 26MB or a 45MB PDF and, because of the limit, it doesn't get OCRed? Evernote might raise the limit from 25MB to 50MB for Premium users in the future.

What would happen then? Would the previously 'missed out' PDFs be OCRed, or would they be left out in the transition?

salgud · December 9, 2010

It wasn't about timing - most of the images I'm talking about were saved days or weeks ago.

As for an icon indicating they've been scanned to the right of the tag, no such icon is visible, unless it's an indistinguishable gray blob way over by the sync icon. In any case, the scan isn't working.

engberg · December 9, 2010

Our OCR software (best-of-breed commercial licensed) can occasionally blow up a 25MB file into over 2GB of intermediate files as it unpacks the compressed bitmaps, so we're working to smooth that out before making the situation worse.

jfwarrior · December 9, 2010

It depends on the nature of the change. We've seen PDFs that take over an hour on a fast server to OCR, even with our limits of 100 pages and 25MB for a PDF to OCR, so reprocessing every single PDF within Evernote isn't something we do for small changes. (Our 5.5 million users are currently storing more than 200 million notes in Evernote!)

Oh my. 25MB PDF limitation for OCR? ... That's really limiting. No difference for Premium users?

jefito · December 8, 2010

Sorry Dave. I was so well-behaved earlier in the day, but it's late and my keenly honed judgement is slipping a bit.

~Jeff

~ backing away slowly from the thread... ~

engberg · December 8, 2010

Ug. Please stop before I cry.

jefito · December 8, 2010

Wow, I'd better start typing faster!

Edit: [Note]

Just set up your temp directory as an Evernote import folder. You get to keep all logs, compiler temporaries, installer detritus etc., fill up your Evernote account, *and* it'll keep your temp directory nice and clean!! Don't forget to enable the Subfolders setting.

~Jeff

Pitamakan · December 8, 2010

Wow, I'd better start typing faster!

engberg · December 8, 2010

Accounts are currently limited to 100,000 notes, and there are a few accounts that are at that limit.

(Since that's more than 100 notes per day since Evernote's service was launched, it doesn't actually correspond to real users saving specific things that they want to remember, but rather people bulk forwarding garbage into Evernote in an automated way.)
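To make the arithmetic behind that remark explicit, here is a minimal sketch. It uses only the two figures quoted above (the 100,000-note cap and the 100-notes-per-day rate); the variable names are made up for illustration.

```python
NOTE_LIMIT = 100_000   # per-account note cap quoted above
NOTES_PER_DAY = 100    # daily rate quoted above

# Days needed to reach the cap at exactly 100 notes per day.
days_to_limit = NOTE_LIMIT / NOTES_PER_DAY
years_to_limit = days_to_limit / 365.25

print(days_to_limit)             # 1000.0
print(round(years_to_limit, 2))  # 2.74
```

So an account that hit the cap in less than about 1,000 days must have averaged more than 100 notes per day.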

Pitamakan · December 8, 2010

But that's only 36.36 notes per person. C'mon, people! Get clipping!!!

~Jeff

This is serious topic drift, but it makes me wonder how large the biggest Evernote account is -- how many notes it contains.

jefito · December 8, 2010

But that's only 36.36 notes per person. C'mon, people! Get clipping!!!

~Jeff

engberg · December 8, 2010

It depends on the nature of the change. We've seen PDFs that take over an hour on a fast server to OCR, even with our limits of 100 pages and 25MB for a PDF to OCR, so reprocessing every single PDF within Evernote isn't something we do for small changes. (Our 5.5 million users are currently storing more than 200 million notes in Evernote!)
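As a rough illustration of the limits described in this thread, the rule can be sketched as a small check. This is only a sketch under stated assumptions: `Pdf` and `pdf_ocr_eligible` are hypothetical names, not part of any Evernote API, and treating the 100-page and 25MB limits as inclusive is a guess. The Premium-only condition comes from the statement elsewhere in the thread that PDF files are only processed for Premium users.

```python
from dataclasses import dataclass

# Limits quoted in this thread (December 2010).
# Assumption: a PDF exactly at a limit is still accepted.
MAX_PDF_PAGES = 100
MAX_PDF_SIZE_MB = 25


@dataclass
class Pdf:
    """Hypothetical description of an uploaded PDF."""
    size_mb: float
    pages: int


def pdf_ocr_eligible(pdf: Pdf, premium: bool) -> bool:
    """Return True if the PDF would be OCRed under the quoted rules."""
    if not premium:
        # Per the thread, PDFs are only processed for Premium accounts.
        return False
    return pdf.pages <= MAX_PDF_PAGES and pdf.size_mb <= MAX_PDF_SIZE_MB


print(pdf_ocr_eligible(Pdf(size_mb=12, pages=50), premium=True))   # True
print(pdf_ocr_eligible(Pdf(size_mb=26, pages=50), premium=True))   # False
print(pdf_ocr_eligible(Pdf(size_mb=12, pages=50), premium=False))  # False
```

Under these assumptions a 50-page scan stays within the page limit, so file size is the number to watch.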

jfwarrior · December 8, 2010

Ahh... Alright. :| If a new PDF scanning system was implemented, would older PDFs that were OCRed before the change be rescanned, or would this new change only apply to the new PDFs added in afterwards?

engberg · December 7, 2010

If you're performing a clean scan of a document, and you have a Premium account, then I'd recommend using PDF unless the document has a lot of handwriting that you need to recognize.

(We do plan to improve handwriting recognition within PDFs, but this requires a lot of work to replace/extend the "off the shelf", "best of breed" OCR software that we've licensed to handle PDFs.)

BurgersNFries · December 7, 2010

Is there any info on the difference in terms of OCR processing between images and PDFs? I'm planning to scan in a 50 page document into a PDF and it'd be terrible to organise all the files if they are imported separately as images.

viewtopic.php?f=30&t=15439&p=63146&hilit=handwriting+ocr#p63146

jfwarrior · December 7, 2010

The queue caught up on Saturday. Images will typically take a few minutes to be recognized on the server for Premium accounts, and a bit longer for Free accounts (maybe 10-15 minutes). You need to sync your client later to retrieve this OCR data and perform searches for the text that you see in your images.

PDF files, however, are only processed for Premium users.

Is there any info on the difference in terms of OCR processing between images and PDFs? I'm planning to scan in a 50 page document into a PDF and it'd be terrible to organise all the files if they are imported separately as images.

engberg · December 7, 2010

The queue caught up on Saturday. Images will typically take a few minutes to be recognized on the server for Premium accounts, and a bit longer for Free accounts (maybe 10-15 minutes). You need to sync your client later to retrieve this OCR data and perform searches for the text that you see in your images.

PDF files, however, are only processed for Premium users.

BurgersNFries · December 6, 2010

And free accounts will take longer than Premium accounts.

Pitamakan · December 5, 2010

Still working fine on the Mac, here ... though the OCR turn-around times were definitely a little slower for a couple of days.

Note that on the Mac client, there's an icon just to the right of the tag area that tells you whether your image has been OCR-ed or not.

BurgersNFries · December 5, 2010

viewtopic.php?f=30&t=20692&p=87017&hilit=unfortunately#p87017

I don't know what the current turn around times are.

jefito · December 5, 2010

This certainly works on the Windows client (at least it did on Friday), so if it's broken, then it's something to do with the Mac client. But OCR is done in the cloud, and so should be the same for everyone, so I am guessing the problem lies elsewhere. How long did you wait between clipping the image and searching? There is a lag time for the OCR to take place on Evernote's servers (it's less for Premium users).

~Jeff
