Evernote PDF search highlighting problem makes paperless office unworkable

pdf searchable

24 replies to this topic

#1 pipkato

pipkato

  • Pip
  • Title: Member
  • Group: Members
  • 16 posts

Posted 24 June 2012 - 09:09 PM

I'm a big fan of Evernote in general, and I've been using the Premium version for some time now with a Fujitsu ScanSnap 1500. I had great dreams of using the Evernote 'searchable PDF' option to scan in all my paper and find stuff using text search when needed.

But, despite numerous promises, Evernote have failed to deliver a PDF search that reliably highlights the searched text in the search results.

Usually I end up with one result (the first, as a rule) with correctly highlighted text, while the rest of the list either has no highlighting at all or shows random highlight spots, often on totally blank areas.

Evernote have been aware of this problem for many months now, but it has not been addressed in any of their updates.

I'm very disappointed and frustrated by this 'Evernote Premium' behaviour.

#2 ~Adam

~Adam

  • PipPip
  • Title: Alliance Lackey
  • Group: Members
  • 58 posts

Posted 24 June 2012 - 09:12 PM

This post makes no sense to me. I am "Premium" as well and have no problems with the search, both with typewritten text and handwritten text. Also, I've never seen it highlight a "blank area" before.

#3 pipkato

pipkato

  • Pip
  • Title: Member
  • Group: Members
  • 16 posts

Posted 24 June 2012 - 09:23 PM

Thanks Adam.

From Evernote:

Ticket # 16051-80087


Dear Roy,

The PDF highlighting not aligning is currently a known issue that is being investigated. We appreciate you reporting this to us.

From user: May 10th
icoco has just posted a reply to a topic that you have subscribed to titled "BUG: Evernote not highlighting text in PDF search results".

----------------------------------------------------------------------
I have the same request. I'd certainly expect to have text or passages highlighted when performing a search in any document. This should apply to PDF documents, too. Evernote seems to find the PDF document that contains the search term, but doesn't mark the text or passage. Which mostly is quite cumbersome, especially regarding long documents like manuals etc.

Please, Evernote folks, do something about that issue.

Regards.

#4 pipkato

pipkato

  • Pip
  • Title: Member
  • Group: Members
  • 16 posts

Posted 24 June 2012 - 09:24 PM

Adam,

I should have made clear that this is only an issue with documents scanned to PDF in Evernote Premium. Searching on all other documents is fine.

#5 Mike Wood

Mike Wood

  • PipPipPipPipPip
  • Title: Browncoat
  • Group: Members
  • 680 posts

Posted 27 June 2012 - 05:29 PM

You can't search within the note either once you open it... not a huge problem on a single page PDF but useless on a 30 page one.

#6 Bill Randle

Bill Randle

  • Pip
  • Title: Member
  • Group: Members
  • 2 posts

Posted 04 September 2012 - 03:24 AM

I've had to use control F to find the search string in a long PDF document once the correct notes have been located. Is there an easier way?

#7 Mike Wood

Mike Wood

  • PipPipPipPipPip
  • Title: Browncoat
  • Group: Members
  • 680 posts

Posted 06 September 2012 - 06:12 AM

@bill I know a bit more about this issue now, and it helps a lot if you create a searchable PDF before uploading to EN. PDFs created like this produce highlights in yellow, as you would expect. If it's a single-page PDF then you don't need to use Control-F, but on a multi-page PDF there is no choice; often there is more than one occurrence on different pages, so the way this works is understandable.
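
To illustrate why a multi-page hit still needs a second search, here is a minimal sketch that reports every page on which a term occurs. It assumes you already have the text of each page as a list of strings (how you get that text depends on your OCR tool), and `pages_containing` is a made-up helper name, not anything Evernote provides.

```python
from typing import List


def pages_containing(page_texts: List[str], term: str) -> List[int]:
    """Return the 1-based page numbers whose text contains the term."""
    # casefold() gives a case-insensitive comparison
    needle = term.casefold()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in text.casefold()
    ]


if __name__ == "__main__":
    pages = ["Invoice 2012", "Terms and conditions", "invoice total: 45.00"]
    print(pages_containing(pages, "invoice"))  # [1, 3]
```

A single note-level hit cannot say which of those pages you wanted, which is why the in-document search has to be repeated.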

#8 AndyC

AndyC

  • Pip
  • Title: Member
  • Group: Members
  • 15 posts

Posted 06 September 2012 - 07:49 PM

I've relied on using ScanSnap OCR to create the documents, and out of 5,000 pages I've found every single thing I've searched for so far - and I've searched a lot, just to prove to myself it works.

#9 pipkato

pipkato

  • Pip
  • Title: Member
  • Group: Members
  • 16 posts

Posted 07 September 2012 - 07:13 PM

Hello Andy.

Thanks for mentioning ScanSnap's OCR. It works for me too.

But I came to this whole 'paperless office' adventure through Brooks Duncan's excellent DocumentSnap website, and, if I remember correctly (it was many months ago now), his advice was to use Evernote's OCR to avoid having to wait for each ScanSnap scan to be OCRed. Using Evernote's PDF OCR should, in theory, allow continuous scanning while Evernote applies its OCR in the background. And it does work, after a fashion. But, as my original post pointed out, it is completely unreliable. And despite Evernote being aware of the issue, and having issued many updates since I raised it with them, they can't seem to solve the problem.

With Evernote so excellent in many other areas, I'm puzzled by this long-standing issue. Non-PDF clippings search works perfectly for me. But, since I got my ScanSnap setup specifically for going paperless with Evernote Premium, I'm reluctant to end up going back to the slower (but effective) ScanSnap OCR.

I hope repeated complaints from users like us can eventually persuade Evernote to resolve the issue, or, at least, explain why they can't.

#10 bduncan

bduncan

  • PipPip
  • Title: Alliance Lackey
  • Group: Members
  • 60 posts

Posted 07 September 2012 - 07:17 PM

Hi pipkato, thanks for your kind words re: DocumentSnap! Are you on Mac or Windows? I might be able to come up with a workflow where the ScanSnap OCR runs in the background, and then puts it in Evernote. Then you'd get a best of both worlds sort of deal.
Brooks Duncan, Paperless Geek
http://www.documentsnap.com

#11 pipkato

pipkato

  • Pip
  • Title: Member
  • Group: Members
  • 16 posts

Posted 07 September 2012 - 07:30 PM

That sounds great to me, Brooks.

I'm on the Mac, OS X 10.8 and 10.7 on another machine.

Thanks for the offer, and I hope you can manage it .. it would be great, at least until Evernote solve the PDF OCR search problem.

And thanks again for DocumentSnap - it's a great paperless resource.


#12 gejkelly

gejkelly

  • Pip
  • Title: Member
  • Group: Members
  • 4 posts

Posted 10 December 2012 - 10:06 AM

Would just like to join the call for highlighting the search term in the PDF document. As a new user, one of the reasons I went Premium was the ability to search PDFs that I have scanned over the years. I don't really want to re-scan. EN correctly finds the document/note, but the lack of text highlighting means that it's not as useful as it could be, and in the case of larger documents it borders on useless!

#13 GrumpyMonkey

GrumpyMonkey

  • Title: 不機嫌な猿
  • Group: Evernote Evangelist
  • 7,605 posts

Posted 10 December 2012 - 10:23 AM

Evernote OCR is nice, but I prefer to do the OCR myself. There are several reasons, but one is that I can extract the text and paste it into a note. It is then searchable, just like any text note, even offline on the iPad.

http://discussion.ev...e/#entry173506 


#14 gejkelly

gejkelly

  • Pip
  • Title: Member
  • Group: Members
  • 4 posts

Posted 12 December 2012 - 05:33 PM

Just received the following reply from support. My original question below that for clarity. Disappointing but not a show stopper in my case as many of the documents I intend to upload are smaller. Really should be solved though. I think EN needs to be careful to point out that groups of EN pdf notes are searchable but that the pdf note itself is not! Searching for a word means you find the word right?! I'm not so sure that's clear in the blurb.


Dear Valued Customer,

The highlighting is indeed a bug and there isn't a viable workaround at this time. It is something that we plan on fixing, but I don't have an estimate as to when that may be available. I'm truly sorry for the trouble. I wish I had better news for you. Please let me know if you have any questions.


My original question below:


I have many documents in PDF format. One of the reasons I went Premium was the ability of EN to search these documents. While EN finds the document/note the text is not highlighted as it would be when searching an image. This is a significant negative for me as I hope to upload a considerable number of larger PDF files for searching. From searching your forum this seems to be an issue that has not been resolved? If there isn't a work around for this please let me know if you plan to address it in future software updates. Thanks in advance.


#15 jjones

jjones

  • Pip
  • Title: Member
  • Group: Members
  • 4 posts

Posted 02 January 2013 - 02:11 AM

I too am quite disappointed, since I've now paid for the Premium membership only to find that the main feature of interest to me, PDF searching, borders on useless. I feel as if I was misled by the generalization of "searchable PDFs".

So I have been able to determine the following facts from my short experience. I hope this helps another user decide whether to invest $45.00 in a SaaS product whose loosely described "features" may not meet their needs.

1) There are two types of PDFs: one type is created by scanning paper documents, and the other by "printing" or saving a document as a PDF. When you scan a document WITHOUT running OCR yourself, EN will decline to run its own recognition if any of the following is TRUE:
  • The PDF contains more than 100 pages
  • The PDF file is more than 25MB
  • The PDF does not contain at least one "scanned" page, defined as:
    • A "scanned" page contains at least 1025 pixels of image data
    • A "scanned" page contains no more than 512 characters of regular, searchable text (e.g. this is enough for a text-based fax header or similar). PDF files that have already been processed by a separate OCR system will not satisfy this condition and will be rejected.
  • The PDF contains more than one non-scanned page. (I.e. the doc may have one "cover" page without any image data, but if there's more than one, then it's not a real scan and we reject it.)
  • The analysis crashes or fails for some technical reason, typically due to a malformed PDF from some crazy source, or if the PDF is password protected (encrypted).
  • The analysis process takes more than 30 seconds to complete.
Assuming none of the above is true for a particular document, EN will recognize the text and create a "searchable" (I use that term very loosely) PDF. However, when you search all notes for a particular string of text, EN only shows the note that contains the PDF. If it is a multi-page PDF, EN will NOT jump to, or filter down to, the first page that contains your search string, which makes this particular feature more or less useless.
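
The rules above can be sketched as a small pre-upload check. This is only a rough illustration of the list as quoted in this post, not Evernote's actual implementation. The `Page` fields, the function names, and the reading of 25MB as 25 * 1024 * 1024 bytes are all assumptions. The cover-page rule is read the way its parenthetical describes it (reject when more than one page is non-scanned). The crash, encryption and 30-second timeout rules are left out because they depend on Evernote's own analysis.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the list in this post; the byte interpretation
# of "25MB" is an assumption.
MAX_PAGES = 100
MAX_BYTES = 25 * 1024 * 1024
MIN_IMAGE_PIXELS = 1025
MAX_TEXT_CHARS = 512


@dataclass
class Page:
    image_pixels: int  # amount of image data on the page (hypothetical field)
    text_chars: int    # characters of existing searchable text (hypothetical field)


def is_scanned(page: Page) -> bool:
    """A page counts as 'scanned' if it is mostly image and has little text."""
    return page.image_pixels >= MIN_IMAGE_PIXELS and page.text_chars <= MAX_TEXT_CHARS


def rejection_reasons(pages: List[Page], size_bytes: int) -> List[str]:
    """Return the rules a PDF would trip; an empty list means none of them."""
    reasons = []
    if len(pages) > MAX_PAGES:
        reasons.append("more than 100 pages")
    if size_bytes > MAX_BYTES:
        reasons.append("larger than 25 MB")
    scanned = sum(1 for page in pages if is_scanned(page))
    if scanned == 0:
        reasons.append("no scanned page")
    if len(pages) - scanned > 1:
        reasons.append("more than one non-scanned page")
    return reasons


if __name__ == "__main__":
    already_ocred = [Page(image_pixels=2_000_000, text_chars=3000)] * 2
    print(rejection_reasons(already_ocred, size_bytes=1_000_000))
```

Note that a PDF that has already been through another OCR tool fails the "scanned page" test on every page, which matches the point in the list that such files are skipped.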

2) The other type of PDF, mentioned above, is created by some other software from a document, for instance a multi-page Word document saved as a PDF. A PDF of this type has a hidden layer built in that contains an index of all the text in the PDF. When importing this type of PDF into EN, EN seems to index and recognize only some of the text. So again, when searching for a string of text from your list of notes, EN will pin down the PDF but, as before, does not go to the exact page or highlight the text. The only caveat with this type of file is that a Windows user can press CTRL-F to invoke a search box located at the bottom of the screen. You have to RE-ENTER the search string, and then the first instance of the string will be found within the document. More usable, but still a LOT less functionality than I had expected.

So in summary, the PDF search feature is far from acceptable in my experience.

FYI, I am using the Windows client, and I get identical functionality on the web client, with the exception of the CTRL-F option: on the web client I have to open the PDF and use its search function to re-enter my search string.

#16 GrumpyMonkey

GrumpyMonkey

  • Title: 不機嫌な猿
  • Group: Evernote Evangelist
  • 7,605 posts

Posted 02 January 2013 - 02:24 AM

For some people, a useful workaround might be to extract the text from the PDF and put that into the Evernote note with the PDF attachment. Alternatively, you could just leave the PDF out entirely. This has several benefits. I have written more about this here (http://discussion.ev...ce/#entry173506).

Of course, this isn't for everyone, and it doesn't address the problems mentioned above, or the problems I have had (mentioned elsewhere), but it is a workaround that I have been using with great success for about three months now. I am paperless, and my Evernote database is hovering somewhere around 900 MB, with about 10,000 notes inside it.
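
A minimal sketch of the "text in the note" half of this workaround, assuming the text has already been extracted from the PDF by some other tool. It wraps plain text in ENML, the XML format Evernote uses for note bodies. `text_to_enml` is a hypothetical helper, and actually creating the note (through the API or by pasting) is not shown.

```python
from html import escape

# Standard ENML prolog for an Evernote note body.
ENML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\n'
)


def text_to_enml(extracted_text: str) -> str:
    """Wrap already-extracted plain text in an ENML note body."""
    parts = []
    for line in extracted_text.splitlines():
        if line.strip():
            # Escape &, < and > so the result stays valid XML
            parts.append(f"<div>{escape(line)}</div>")
        else:
            # Keep blank lines visible as empty paragraphs
            parts.append("<div><br/></div>")
    return f"{ENML_HEADER}<en-note>{''.join(parts)}</en-note>"


if __name__ == "__main__":
    print(text_to_enml("Torque spec: 45 Nm & check\n\nSee page 12"))
```

Because the text lives in the note itself rather than inside the attachment, it is searched and highlighted like any other text note.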

#17 jbenson2

jbenson2

  • PipPipPipPipPip
  • Title: Browncoat
  • Group: Members
  • 4,760 posts

Posted 02 January 2013 - 02:31 AM

It takes a bit longer, but I always let ScanSnap do the OCR for me. (100% of the time)

Why?

1.) Exported PDFs:
ScanSnap: The PDF document remains OCR'd if I export it from Evernote.
Evernote: The PDF document loses its OCR if I export it from Evernote.

2.) Consistency:
ScanSnap: The search results are consistent in Evernote, whether I view them from my desktop client or the Evernote web.
Evernote: The search results are not consistent because Evernote uses different OCR software depending on the platform.

3.) 100% OCR:
ScanSnap: Works on notes that are stored in my local non-sync'd Evernote notebooks.
Evernote: Evernote cannot see my notes on my local non-sync'd notebooks, so the PDF's cannot be OCR'd.

4.) No complex difficult-to-understand rules:
ScanSnap: OCR's all my PDF's - no rules and I know it is done.
Evernote: Evernote has 5 technical rules to follow and no warning if the document fails to meet all the rules

#18 GrumpyMonkey

GrumpyMonkey

  • Title: 不機嫌な猿
  • Group: Evernote Evangelist
  • 7,605 posts

Posted 02 January 2013 - 02:57 AM

To add to jbenson2's points above: for Mac users, doing the OCR yourself also means Spotlight indexing. For iPad users who use my text extraction suggestion (I recommend Automator), it means offline searching on the iPad, search results highlighted on the iPad, searching within the note as well, and the ability to download your entire account and keep it offline. In fact, I would go so far as to say textifying my PDFs has opened up a whole new world of possibilities for the iPad. The first step, though, is doing it yourself, as JB recommends. Two days ago, using the multiple file function in Adobe Acrobat Pro, I had the computer finish several thousand files in the morning, so it isn't even a time-consuming task to OCR yourself.

#19 jjones

jjones

  • Pip
  • Title: Member
  • Group: Members
  • 4 posts

Posted 02 January 2013 - 03:15 AM

Thanks for the suggestion.

Unfortunately, I am dealing with technical documents, schematics and the like, with a mixture of exploded mechanical assembly views, pictures, and other useful images.

Raw text won't work in my case.

JJ

#20 jjones

jjones

  • Pip
  • Title: Member
  • Group: Members
  • 4 posts

Posted 02 January 2013 - 03:42 AM

@jbenson2: OK, understood.

But here are my questions about your above suggestion/explanation:

1) Is the consistent experience you describe above also found on the Windows client? I know there are differences between the Mac and Windows clients. Which one is your experience based on?
2) The scanner adds another several hundred dollars, when I already have a fully capable, networked, 100-page duplex ADF scanner.
3) I also have PDFs output by various software packages. How is the search experience you describe different from my current experience if my PDF already has a text index or layer created by the software that produced it?
4) I have tried to use the OCR functions of Acrobat 8 when scanning technical documents, but I end up with all sorts of goofy formatting. The end result of Acrobat's OCR is just an unusable mess. Not to mention that my documents are a mixture of English and Italian text; I think that throws off the Acrobat OCR.

I don't mind spending several hundred dollars to get the right tools together, but if the end result is still mediocre search results (a notes search only gets me to the first page of a PDF), then I have just laid out some dough for nothing. Not something I like to do.

Thanks,
JJ




