jfwarrior

(Archived) OCRed PDF Doesn't Search Outside of Evernote?

Something interesting I just found out: OCRed PDFs are not searchable outside Evernote. If you right-click the PDF, choose 'Open', and open it in Preview or another PDF reader, you cannot search the OCRed text.

I believe the reason is that the Evernote OCR process creates a second file containing the OCR'd text, but with no graphics and limited formatting. This second version is kept on the Evernote server, not in your local folder.

If Evernote did the OCR, there should be two options available when you right-click the PDF:

  1.) Save as
  2.) Save Searchable PDF

Go to this link and scroll down to see the two versions I uploaded.

http://forum.evernote.com/phpbb/viewtopic.php?f=30&t=21472&start=0

This is one of the reasons I have my ScanSnap set up to do the OCR before sending to Evernote.

So if I want to leave Evernote for some reason and export all my PDF files, e.g. to the file system, I will have no searchable PDF files? And then what?

So if I want to leave Evernote for some reason and export all my PDF files, e.g. to the file system, I will have no searchable PDF files? And then what?

Yes. And then? You can do whatever you want to do.

Evernote does not fiddle with your original files. That is why there is a second file.

If you leave Evernote, you get back exactly what you put in.

Go to this link and scroll down to see the two versions I uploaded.

http://forum.evernote.com/phpbb/viewtopic.php?f=30&t=21472&start=0

This is one of the reasons I have my ScanSnap set up to do the OCR before sending to Evernote.

Yep, I read that before. And that's the problem: you cannot export it. Some users on the forum claim that Evernote is quite open because you can export your data as .enex or WebArchive/HTML, yet you cannot replicate the functionality offered inside Evernote once you export. (I remember reading somewhere that people could have Evernote OCR and index a file and then export it for other uses, but as demonstrated here, this does not work, because the output is split into two options.)

And not everyone has a ScanSnap at their disposal. Evernote is targeted at 'everyone'; their main goal is to 'Remember Everything' for their users. Not all of their users (or even a large percentage of them) have a ScanSnap or OCR software available. Users have many reasons for using Evernote, and the OCR functionality may be one of them.

So if I want to leave Evernote for some reason and export all my PDF files, e.g. to the file system, I will have no searchable PDF files? And then what?

According to Mr. Jbenson2, it means that the output is split into two files: the OCRed PDF version without any images or formatting, and the original file.

So if I want to leave Evernote for some reason and export all my PDF files, e.g. to the file system, I will have no searchable PDF files? And then what?

Yes. And then? You can do whatever you want to do.

Evernote does not fiddle with your original files. That is why there is a second file.

If you leave Evernote, you get back exactly what you put in.

So it's not that open after all...

Now, I'm an ardent supporter of Evernote and love their product and have persuaded 7-10 people to use Evernote already, but it's quite interesting that some people say that Evernote is quite an open piece of software.

Not sure what the blah-blah about 'openness' is all about -- Evernote has an open API, and their export format (.enex) is also open, and everything you put in, you are able to get back out again. Just seems like that's clouding the issue.

The motto 'Remember Everything' is upheld in this case. You put a PDF or an image in, you can get it back, exactly as it was when you put it in. That any OCR information they've added to what you put in for PDF files isn't available (though it is for images) doesn't really contravene 'Remember Everything'.

That being said, making the value-added OCR information available for export seems a not unreasonable request, unless there are licensing issues with the OCR software entailed; or maybe if it can't be expressed in the current .enex format.

~Jeff
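
As a rough illustration of "everything you put in, you are able to get back out again", here is a minimal Python sketch that pulls the original PDF attachments out of an .enex export. It assumes the export is XML in which each `note` element contains `resource` elements holding base64 `data`, a `mime` type, and an optional `resource-attributes/file-name`; the function name `extract_pdfs` is made up for this example. Note that this would recover the files as they were uploaded, not the searchable copy discussed in this thread.

```python
import base64
import xml.etree.ElementTree as ET


def extract_pdfs(enex_xml):
    """Return a list of (file_name, pdf_bytes) found in an .enex export string.

    Assumed layout (not taken from this thread):
    en-export > note > resource > data / mime / resource-attributes > file-name
    """
    root = ET.fromstring(enex_xml)
    pdfs = []
    for note in root.iter("note"):
        title = note.findtext("title", default="untitled")
        for index, resource in enumerate(note.iter("resource")):
            mime = (resource.findtext("mime") or "").strip()
            if mime != "application/pdf":
                continue  # skip images and other attachments
            encoded = resource.findtext("data") or ""
            # Fall back to a generated name when the export has no file name.
            name = (
                resource.findtext("resource-attributes/file-name")
                or f"{title}-{index}.pdf"
            )
            # b64decode ignores the line breaks that exports put in the data.
            pdfs.append((name, base64.b64decode(encoded)))
    return pdfs
```

To write the files out, you could read the export with `open(path, encoding="utf-8").read()`, pass the text to `extract_pdfs`, and save each returned byte string under its file name. Making those PDFs searchable again would still need a separate OCR step outside Evernote.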

Jeff, thanks for validating my comments.

You might be correct about the licensing agreement.

It is my understanding that both files are used by Evernote to create what appears to be a single OCR'd document.

If I am wrong, please let me know.

See the two images I uploaded on the forum thread mentioned below.

- the first is the original non-OCR'd version (uploaded by user)

- the second is the OCR'd version, which is not really usable (created by Evernote)

http://forum.evernote.com/phpbb/viewtopic.php?f=30&t=21472&start=0

This topic is now closed to further replies.