
(Archived) Finding Chris Nash



OK, I have two scanned PDFs, both OCR'ed so as to have an embedded text layer. Both original documents had the text "CHRIS NASH" within them, but one has been OCR'ed to "CHRISNASH" and the other to "CHRIS NASH".

I was trying to locate all "Chris Nash" documents, and was using what I thought was a sensible search term: "nash". However, it became apparent that this only picks up "CHRIS NASH" but not "CHRISNASH". I had hoped it would pick up any occurrence of "nash" anywhere in the document.

On the other hand, searching for "chris" finds both "CHRISNASH" and "CHRIS NASH".

I can see that the search logic is to look for any word starting with the search term, but is there any way to search for the search term anywhere within any word in any document?

Thanks,

Mike

P.S. Is there an easy way to edit/correct the embedded OCR text in a pdf, for example to correct "CHRISNASH" to "CHRIS NASH"? I have Adobe Acrobat 9 Standard.

Link to comment

Evernote's search engine looks for words, or the beginnings of words if you're typing in the search box. It doesn't find arbitrary sequences of letters in the middle of words. This is why you match "chrisnash" if you look for "chris" but not "nash".
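To make the difference concrete, here is a minimal Python sketch of word-prefix matching versus the substring matching being asked for. It is an illustration only, not Evernote's actual implementation: the tokenizer, the function names (`tokenize`, `prefix_match`, `substring_match`, `search`) and the `docs` dictionary are all assumptions made up for this example.

```python
import re

# Hypothetical stand-ins for the two OCR'ed documents in the question.
docs = {
    "doc_a": "CHRISNASH",   # OCR ran the two words together
    "doc_b": "CHRIS NASH",  # OCR kept the words separate
}

def tokenize(text):
    """Split text into lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def prefix_match(term, text):
    """True if any word in text starts with term (the behaviour described above)."""
    term = term.lower()
    return any(word.startswith(term) for word in tokenize(text))

def substring_match(term, text):
    """True if term appears anywhere inside any word (the behaviour being requested)."""
    term = term.lower()
    return any(term in word for word in tokenize(text))

def search(term, matcher):
    """Return the sorted names of documents that match term under the given matcher."""
    return sorted(name for name, text in docs.items() if matcher(term, text))

print(search("chris", prefix_match))    # both documents
print(search("nash", prefix_match))     # only the document with a separate "NASH" word
print(search("nash", substring_match))  # both documents
```

Under prefix matching, "nash" only hits the document where "NASH" is a word of its own, while "chris" hits both because "chrisnash" starts with "chris". Substring matching finds both documents for either term, at the cost of checking inside every word rather than only at word starts.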

From either of our desktop clients, you can right-click on the PDF to save the alternate OCR version to your local computer. This is a full PDF document containing the interpreted text from your note. Once this is on your computer, you could edit it like any other PDF, but you'd need to drag it into a note in Evernote to make it a 'real' PDF instead of just a shadow OCR copy for search purposes. (We don't throw away your original PDF in case the original bytes are important to you.)

Link to comment
Evernote's search engine looks for words, or the beginnings of words if you're typing in the search box. It doesn't find arbitrary sequences of letters in the middle of words. This is why you match "chrisnash" if you look for "chris" but not "nash".

Yep, I figured that. But for me it would be useful to at least have the option of searching for "nash" anywhere in the words. Because with OCR I know that sometimes a word I'm looking for won't have been OCRed separately to an adjacent word. Any way of doing this through some advanced search option?

From either of our desktop clients, you can right-click on the PDF to save the alternate OCR version to your local computer. This is a full PDF document containing the interpreted text from your note. Once this is on your computer, you could edit it like any other PDF, but you'd need to drag it into a note in Evernote to make it a 'real' PDF instead of just a shadow OCR copy for search purposes. (We don't throw away your original PDF in case the original bytes are important to you.)

I'm not seeing this. When I right-click on the PDF in Evernote, I get a choice of Open and Save, but Save Searchable is greyed out... this is a PDF OCRed by third-party software, not by Evernote.

There must be a way of editing the original PDF text layer to correct clear cases of incorrect OCR, keeping the text layer attached to the graphic in the PDF?

Mike

Link to comment

If you are finding words in your note, but "save searchable" is greyed, then that means your notes have OCR that you created on your own computer (i.e. it didn't come from the web service).

In this case, you can Open the existing PDF in your favorite PDF editor and then Save the file to incorporate any changes into Evernote.

Warning: after editing, the new PDF will need to be sent up to our servers when you sync, which will count against your monthly upload allowance.

This could add up if you're working with huge PDF documents.

Link to comment

Hi

Do you have any comment on the search question? For full flexibility I'd expect it to be possible to have the option of searching for a term that appears anywhere within a word in the document, not necessarily at the start of the word. That way, I could find "chrisnash" when searching on "nash".

As for editing the OCR text layer, I thought perhaps someone would know straight off. I have Adobe Acrobat 9 Standard, but I can't find a way of showing and editing the embedded text information. (Edit: I think I found it, buried in the menu Tools > Advanced Editing > TouchUp Text Tool... it seems you can then type new text in, though it's a shame you can't make it show JUST what is in the text layer so you could see exactly where the mistakes are... unless I've missed another hidden option.)

Thanks,

Mike

Link to comment

Archived

This topic is now archived and is closed to further replies.
