
Evernote PDF search highlighting problem makes paperless office unworkable


pipkato


I'm a big fan of Evernote in general, and I've been using the Premium version of Evernote for some time now with a Fujitsu ScanSnap S1500. I had great dreams of using the Evernote 'searchable PDF' option to scan in all my paper and find things using text search when needed.

But, despite numerous promises, Evernote have failed to deliver a PDF search that reliably highlights the searched text in the search results.

Usually I end up with one result (as a rule, the first) with correctly highlighted text, while the rest of the list either has no highlighting at all or shows random highlight spots, often on completely blank areas.

Evernote have been aware of this problem for many months now, but it has not been addressed in any of their updates.

I'm very disappointed and frustrated by this 'Evernote Premium' behaviour.
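Nothing in this thread explains why highlights land on blank areas, and Evernote's internals aren't public. Purely as an illustration, here is a minimal Python sketch of one step any highlighter for scanned PDFs has to get right: converting an OCR word box from image pixels (origin top-left) to PDF points (1/72 inch, origin bottom-left). The `Box` type and function names are hypothetical; if either the DPI or the vertical flip is wrong, a correctly recognized word gets drawn somewhere else on the page.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float


def px_to_pt(px: float, dpi: float) -> float:
    # Multiply before dividing so whole-pixel inputs at common DPIs stay exact.
    return px * POINTS_PER_INCH / dpi


def ocr_box_to_pdf_rect(box_px: Box, dpi: float, page_height_pt: float) -> Box:
    """Convert an OCR word box from image pixels (origin top-left, y grows down)
    to a PDF rectangle in points (origin bottom-left, y grows up)."""
    x0 = px_to_pt(box_px.x0, dpi)
    x1 = px_to_pt(box_px.x1, dpi)
    # Flip the vertical axis: the top edge in the image becomes the larger PDF y.
    y_top = page_height_pt - px_to_pt(box_px.y0, dpi)
    y_bottom = page_height_pt - px_to_pt(box_px.y1, dpi)
    return Box(x0, y_bottom, x1, y_top)


# A word found at pixels (300, 300)-(600, 400) on a 300 dpi scan of a Letter page (792 pt tall).
print(ocr_box_to_pdf_rect(Box(300, 300, 600, 400), dpi=300, page_height_pt=792))
# Box(x0=72.0, y0=696.0, x1=144.0, y1=720.0)
```

If the same pixel box were scaled as though the scan were 200 dpi instead of 300, its left edge would come out at 108 pt rather than 72 pt, so the highlight would drift to the right of the word.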


This post makes no sense to me. I am "Premium" as well and have no problems with the search, both with typewritten text and handwritten text. Also, I've never seen it highlight a "blank area" before.


Thanks Adam.

From Evernote:

Ticket # 16051-80087

Dear Roy,

The PDF highlighting not aligning is currently a known issue that is being investigated. We appreciate you reporting this to us.

From user: May 10th

icoco has just posted a reply to a topic that you have subscribed to titled "BUG: Evernote not highlighting text in PDF search results".

----------------------------------------------------------------------

I have the same request. I'd certainly expect text or passages to be highlighted when performing a search in any document, and this should apply to PDF documents, too. Evernote seems to find the PDF document that contains the search term, but doesn't mark the text or passage. This is quite cumbersome, especially with long documents such as manuals.

Please, Evernote folks, do something about that issue.

Regards.

  • 2 months later...

@bill I know a bit more about this issue now: it helps a lot if you create a searchable PDF before uploading it to EN. PDFs created this way produce yellow highlights, as you would expect. If it's a single-page PDF, you don't need to use Control-F, but with a multipage PDF there is no choice; there is often more than one occurrence on different pages, so the way this works is understandable.


Hello Andy.

Thanks for mentioning ScanSnap's OCR. It works for me too.

But I came to this whole 'paperless office' adventure through Brooks Duncan's excellent DocumentSnap website, and, if I remember correctly (it was many months ago now), his advice was to use Evernote's OCR to avoid having to wait for each ScanSnap scan to be OCRed. Using Evernote's PDF OCR should, in theory, allow continuous scanning while Evernote applies its OCR in the background. And it does work, after a fashion. But, as my original post pointed out, it is completely unreliable. And despite Evernote being aware of the issue, and having issued many updates since I raised it with them, they can't seem to solve the problem.

With Evernote so excellent in many other areas, I'm puzzled by this long-standing issue. Non-PDF clippings search works perfectly for me. But, since I got my ScanSnap setup specifically for going paperless with Evernote Premium, I'm reluctant to end up going back to the slower (but effective) ScanSnap OCR.

I hope repeated complaints from users like us can eventually persuade Evernote to resolve the issue, or, at least, explain why they can't.


Hi pipkato, thanks for your kind words re: DocumentSnap! Are you on Mac or Windows? I might be able to come up with a workflow where the ScanSnap OCR runs in the background, and then puts it in Evernote. Then you'd get a best of both worlds sort of deal.


That sounds great to me, Brooks.

I'm on a Mac running OS X 10.8, with 10.7 on another machine.

Thanks for the offer, and I hope you can manage it; it would be great, at least until Evernote solves the PDF OCR search problem.

And thanks again for DocumentSnap - it's a great paperless resource.

  • 3 months later...

I would just like to join the call for highlighting the search term in the PDF document. As a new user, one of the reasons I went Premium was the ability to search PDFs that I have scanned over the years, and I don't really want to re-scan them. EN correctly finds the document/note, but the lack of text highlighting means it's not as useful as it could be, and in the case of larger documents it borders on useless!


I just received the following reply from support; my original question is below it for clarity. Disappointing, but not a showstopper in my case, as many of the documents I intend to upload are small. It really should be solved, though. I think EN needs to be careful to point out that a group of EN PDF notes is searchable but that the PDF inside a note is not! Searching for a word means you find the word, right?! I'm not sure that's clear in the blurb.

Dear Valued Customer,

The highlighting is indeed a bug and there isn't a viable workaround at this time. It is something that we plan on fixing, but I don't have an estimate as to when that may be available. I'm truly sorry for the trouble. I wish I had better news for you. Please let me know if you have any questions.

My original question below:

I have many documents in PDF format. One of the reasons I went Premium was EN's ability to search these documents. While EN finds the document/note, the text is not highlighted as it would be when searching an image. This is a significant negative for me, as I hope to upload a considerable number of larger PDF files for searching. From searching your forum, this seems to be an issue that has not been resolved? If there isn't a workaround for this, please let me know whether you plan to address it in future software updates. Thanks in advance.

  • 3 weeks later...

I too am quite disappointed: I've now paid for the premium membership, only to find that the main feature of interest to me, PDF searching, borders on useless. I feel I was misled by the generalization of "searchable PDFs".

So I have been able to determine the following facts from my short experience. I hope this helps other users decide whether to invest $45.00 in a SaaS that will not meet their needs with "features" that are described so generally.

1) There are two types of PDFs: one is created by scanning paper documents, the other by "printing" or saving a document as a PDF. When you scan a document without running OCR yourself, EN will refuse to recognize it if any of the following is TRUE:

  1. The PDF contains more than 100 pages
  2. The PDF file is more than 25MB
  3. The PDF does not contain at least one "scanned" page, defined as:
    • A "scanned" page contains at least 1025 pixels of image data
    • A "scanned" page contains no more than 512 characters of regular, searchable text (e.g. this is enough for a text-based fax header or similar). PDF files that have already been processed by a separate OCR system will not satisfy this condition and will be rejected.
  4. The PDF contains more than one non-scanned page. (I.e. the doc may have one "cover" page without any image data, but if there's more than one, then it's not a real scan and we reject it.)
  5. The analysis crashes or fails for some technical reason, typically due to a malformed PDF from some crazy source, or if the PDF is password protected (encrypted).
  6. The analysis process takes more than 30 seconds to complete.
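To make the rules above concrete, here is a minimal Python sketch that applies them to a PDF described by hypothetical `PdfInfo`/`PageInfo` records. It does not read real PDFs (the pixel and character counts would have to come from some PDF analysis tool), it reads "25MB" as 25 × 1024 × 1024 bytes, it models rule 5 only through an `encrypted` flag, and it is an interpretation of the rules as quoted here, not Evernote's code.

```python
from dataclasses import dataclass, field
from typing import List

MAX_PAGES = 100
MAX_BYTES = 25 * 1024 * 1024    # assumption: "25MB" read as 25 MiB
MIN_SCAN_IMAGE_PIXELS = 1025    # a "scanned" page has at least this much image data
MAX_SCAN_TEXT_CHARS = 512       # ...and no more than this much searchable text
MAX_ANALYSIS_SECONDS = 30


@dataclass
class PageInfo:
    image_pixels: int  # pixels of image data on the page
    text_chars: int    # characters of regular, searchable text on the page


@dataclass
class PdfInfo:
    size_bytes: int
    pages: List[PageInfo] = field(default_factory=list)
    encrypted: bool = False          # stands in for "password protected"
    analysis_seconds: float = 0.0    # how long the (hypothetical) analysis took


def is_scanned(page: PageInfo) -> bool:
    return (page.image_pixels >= MIN_SCAN_IMAGE_PIXELS
            and page.text_chars <= MAX_SCAN_TEXT_CHARS)


def rejection_reasons(pdf: PdfInfo) -> List[str]:
    """Return every rule the PDF breaks; an empty list means it would be recognized."""
    reasons = []
    if len(pdf.pages) > MAX_PAGES:
        reasons.append("more than 100 pages")
    if pdf.size_bytes > MAX_BYTES:
        reasons.append("larger than 25MB")
    scanned = sum(1 for p in pdf.pages if is_scanned(p))
    if scanned == 0:
        reasons.append("no scanned page")
    if len(pdf.pages) - scanned > 1:
        reasons.append("more than one non-scanned page")
    if pdf.encrypted:
        reasons.append("password protected")
    if pdf.analysis_seconds > MAX_ANALYSIS_SECONDS:
        reasons.append("analysis took longer than 30 seconds")
    return reasons
```

For example, a one-page file that was already OCR'd elsewhere carries more than 512 characters of text on its only page, so `rejection_reasons` reports "no scanned page", which matches the rule that previously OCR'd files are skipped.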

Assuming none of the above is true for a particular document, EN will recognize the text and create a "searchable" (I use that term very loosely) PDF. However, when you search all notes for a particular string of text, EN only shows the note that contains this PDF. If this is a multipage PDF, EN will NOT jump to, or otherwise point you to, the first occurrence of your search string within the document, which makes this particular feature more or less useless.

2) The other type of PDF, mentioned above, is created from a document by some other software; for instance, a multipage Word document saved as a PDF. A PDF of this type has a hidden text layer built in that contains all the text in the PDF. When importing this type of PDF, EN seems to index and recognize only some of the text. So again, when you search for a string of text from your list of notes, EN will pin down the PDF but, as before, does not go to the exact page or highlight the text. The only caveat with this type of file is that a Windows user can press CTRL-F to invoke a search box at the bottom of the screen. You have to RE-ENTER the search string, and then the first instance of the string will be found within the document. More usable, but still a LOT less functionality than I had expected.
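The missing step described above is going from "this note matches" to "these pages match". A minimal Python sketch, assuming you already have one text string per page (a PDF library such as pypdf can produce these via `PdfReader(path).pages` and `page.extract_text()`; that extraction is not shown). `pages_containing` is a hypothetical helper, not an Evernote feature.

```python
from typing import List


def normalize(text: str) -> str:
    # Text layers often wrap a phrase across lines; collapse whitespace and ignore case.
    return " ".join(text.split()).casefold()


def pages_containing(page_texts: List[str], query: str) -> List[int]:
    """Return the 1-based numbers of the pages whose text contains `query`."""
    needle = normalize(query)
    return [i for i, text in enumerate(page_texts, start=1) if needle in normalize(text)]


pages = ["Intro\nsafety notes", "Part A-1034 torque specs", "See a-1034 exploded view"]
print(pages_containing(pages, "A-1034"))  # [2, 3]
```

Collapsing whitespace lets a phrase that wraps across lines in the text layer still match; it does not undo words hyphenated at a line break.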

So in summary, the PDF search feature is far from acceptable in my experience.

FYI, I am using the Windows client, and I get identical functionality on the web client, except for the CTRL-F option; on the web client I have to open the PDF and use its search function to re-enter my search string.

For some people, a useful workaround might be to extract the text from the PDF and put that into the Evernote note with the PDF attachment. Alternatively, you could just leave the PDF out entirely. This has several benefits. I have written more about this here (http://discussion.evernote.com/topic/29245-how-to-optimize-your-evernote-experience/#entry173506).

Of course, this isn't for everyone, and it doesn't address the problems mentioned above, or the problems I have had (mentioned elsewhere), but it is a workaround that I have been using with great success for about three months now. I am paperless, and my Evernote database is hovering somewhere around 900 MB, with about 10,000 notes inside it.
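As a sketch of this workaround, assuming the text has already been extracted (e.g. by Acrobat), the Python below builds an ENML note body that puts the text in `<div>` elements and, optionally, references the PDF attachment by its MD5 hash via `<en-media>`. Uploading the note and attaching the PDF resource itself (for example through Evernote's API) is not shown, and `note_body_with_text` is a hypothetical helper name.

```python
import hashlib
from typing import Optional
from xml.sax.saxutils import escape

ENML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">'
)


def note_body_with_text(extracted_text: str, pdf_bytes: Optional[bytes] = None) -> str:
    """Build an ENML note body: optional PDF reference first, then one <div> per paragraph."""
    parts = []
    if pdf_bytes is not None:
        # en-media points at an attached resource by the MD5 hash of its bytes;
        # the resource itself still has to be attached to the note separately.
        digest = hashlib.md5(pdf_bytes).hexdigest()
        parts.append(f'<en-media type="application/pdf" hash="{digest}"/>')
    for para in extracted_text.split("\n\n"):
        para = para.strip()
        if para:
            # Escape &, < and > so OCR'd text cannot break the XML.
            parts.append(f"<div>{escape(para)}</div>")
    return f"{ENML_HEADER}<en-note>{''.join(parts)}</en-note>"
```

Because the extracted text lives in the note body rather than only inside the attachment, it can be searched and highlighted like any other note text.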

It takes a bit longer, but I always let ScanSnap do the OCR for me. (100% of the time)

Why?

1.) Exported PDFs:

ScanSnap: The PDF document remains OCR'd if I export it from Evernote.

Evernote: The PDF document loses its OCR if I export it from Evernote.

2.) Consistency:

ScanSnap: The search results are consistent in Evernote, whether I view them from my desktop client or the Evernote web.

Evernote: The search results are not consistent because Evernote uses different OCR software depending on the platform.

3.) 100% OCR:

ScanSnap: Works on notes that are stored in my local non-sync'd Evernote notebooks.

Evernote: Evernote cannot see the notes in my local non-sync'd notebooks, so those PDFs cannot be OCR'd.

4.) No complex, difficult-to-understand rules:

ScanSnap: OCRs all my PDFs; there are no rules, and I know it is done.

Evernote: Evernote has 5 technical rules to follow and gives no warning if a document fails to meet them.

For Mac users, this means Spotlight indexing. For iPad users who use my text extraction suggestion (I recommend Automator), it means offline searching on the iPad, search results highlighted on the iPad, searching within the note as well, and the ability to download your entire account and keep it offline. In fact, I would go so far as to say that textifying my PDFs has opened up a whole new world of possibilities on the iPad. The first step, though, is doing the OCR yourself, as JB recommends. Two days ago, using the multiple-file function in Adobe Acrobat Pro, I had the computer finish several thousand files in the morning, so OCRing them yourself isn't even a time-consuming task.

Thanks for the suggestion.

Unfortunately, I am dealing with technical documents, schematics and the like, with a mixture of exploded mechanical assembly views, pictures, and other useful images.

Raw text won't work in my case.

JJ

Ok, understood.

But here are my questions about your above suggestion/explanation:

1) Is the consistent experience you describe above also found on the Windows client? I know there are differences between the Mac and Windows clients. Which client is your experience based on?

2) The scanner adds another several hundred dollars, when I already have a fully capable, networked scanner with a 100-page duplex ADF.

3) I also have PDFs output by various software packages. How would your suggestion (search experience) above differ from my current experience if my PDF already has a text layer created by the software that produced it?

4) I have tried using the OCR functions of Acrobat 8 when scanning technical documents, but I end up with all sorts of goofy formatting; the end result of Acrobat's OCR is an unusable mess. Not to mention that my documents are a mixture of English and Italian text, which I think throws off Acrobat's OCR.

I don't mind spending several hundred dollars to put together the right tools, but if the end result is still mediocre search results (a notes search only gets me to the first page of a PDF), then I will have laid out some dough for nothing. Not something I like to do.

Thanks,

JJ

I use the text for searching (not reading), and for images and the like, I'll open up the corresponding file in Dropbox. The OCRd text is, at best, a guide to find the approximate location of information you want to see. If you expect the text to turn out perfectly, especially with mixed languages (Adobe doesn't handle it well), then you are bound to be disappointed. And, if Adobe is having trouble, I'd say Evernote cannot be expected to be perfect either. It's too bad, but that is where things stand.

I have mainly Chinese/Japanese texts, and with vertical writing the search highlighting is hopeless. When the text is horizontal, it helps, but not much. This is also partly due to the poor OCR, but even when it is wrong, it gets enough characters right to be a helpful aid.
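Because OCR text is only "a guide", an exact substring search misses words the OCR got slightly wrong (e.g. "rn" read for "m"). One common approach, sketched here in Python with the standard library's `difflib.SequenceMatcher`, is to score query-sized windows of the OCR text by similarity and keep those above a threshold. The 0.8 threshold and the `fuzzy_hits` name are illustrative assumptions, and the sketch splits on spaces, so it would not help directly with unsegmented Chinese/Japanese text.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def fuzzy_hits(text: str, query: str, threshold: float = 0.8) -> List[Tuple[int, str, float]]:
    """Return (word_index, window, score) for every run of consecutive words, as long
    as the query, whose similarity to the query is at least `threshold`."""
    words = text.split()
    n = len(query.split())
    q = query.casefold()
    hits = []
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        # ratio() = 2 * matching characters / total characters, in [0, 1]
        score = SequenceMatcher(None, window.casefold(), q).ratio()
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


ocr_text = "Replace the rnounting bracket before use"
print(fuzzy_hits(ocr_text, "mounting bracket"))  # [(2, 'rnounting bracket', 0.91)]
```

A plain substring search for "mounting bracket" finds nothing in that OCR text, while the fuzzy window still scores about 0.91.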

It's kind of odd that my replies to posts are limited when I have an active issue and am responding to multiple posts/suggestions.

I too use the text for searching, not for reading. I need to find a short text string (7 characters or fewer), which relates to an exploded view of an assembly. The exploded view is either one page ahead of, or sometimes one page behind, the text that was searched.
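Since the exploded view sits one page before or after the matching text, a search that returns each hit plus its neighboring pages would cover this case. A minimal Python sketch, again assuming one extracted text string per page; `pages_with_context` and its `radius` parameter are hypothetical.

```python
from typing import List


def pages_with_context(page_texts: List[str], query: str, radius: int = 1) -> List[int]:
    """Return sorted 1-based page numbers of every match plus `radius` pages on each side."""
    needle = query.casefold()
    last = len(page_texts)
    wanted = set()
    for page_no, text in enumerate(page_texts, start=1):
        if needle in text.casefold():
            # Clamp the window so it never runs past the first or last page.
            wanted.update(range(max(1, page_no - radius), min(last, page_no + radius) + 1))
    return sorted(wanted)


pages = ["Cover", "Parts list: P-77 bracket", "Exploded view, fig. 3", "Index"]
print(pages_with_context(pages, "p-77"))  # [1, 2, 3]
```

Returning a set of pages rather than a single position means overlapping windows from nearby hits are merged instead of listed twice.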

My issue is twofold.

1) I have raw scans going into EN that, because of the file rules or EN's inability to OCR them properly, remain totally unsearchable. So at this point they are no better than their paper counterparts, except for portability. Mind you, the print being scanned is ideal: clean and crisp, in black, white, and some grayscale. I have even tried converting the smaller PDFs to .JPG, just to see whether EN's OCR would recognize them well enough to be workable; again, not as expected. The text recognition failure rate on .jpg images of black-on-white printed text is at least 75%.

2) I have digitally created PDFs with a text layer, but those are not handled well in EN either, because of its inability to pinpoint a search string to a page within the PDF.

I really wish I knew the PDF related limitations of EN before investing so much time trying to figure this out.

I can (and have) combined a good bit of my digital PDF's into a single PDF. By doing so, I open the "BIG" PDF, enter a search string, and bam...I can bang through each instance right then and there.

Unfortunately, I have a significant amount of non-digital information to digitize, catalog, and make somewhat reliably searchable.

Thanks for your help and suggestions!

JJ


I too am surprised by the fact that the premium account doesn't fulfill expectations, while there is so much potential.

Using the ScanSnap S1500, my document workflow has become much easier than it was, and I scan everything into Evernote with the intention of being able to find it again one day.

Everything I scan is in PDF, and the results of some random searches in Evernote are disappointing. Most of the time a search produces results (but not all of them), but then I still don't know where to look. This evening I tried to compare health insurance policies from different vendors, but I gave up, since searching within the 20-page PDF was impossible.

Now, I could use the OCR of the scansnap software, but unfortunately the primary language of all of my scanned documents is Dutch (as am I), and that doesn't seem to be supported in the software. So, useless option for me.

Can Evernote maybe give more insight into when this bug / feature will be available?

Hi. Welcome to the forums. I recommend doing OCR separately (I use Adobe Acrobat Pro), extracting the text, and putting that into Evernote instead of (or in addition to) the PDFs. I have written about this elsewhere, so I won't belabor the point here.

As far as Dutch goes, you may want to take a look at your settings (Settings > Personal Settings > Recognition Language) on the Evernote website (www.evernote.com), because you sometimes have to adjust it to fit your own use case. I use Japanese, Chinese, and English, with a smattering of Portuguese, German, and Spanish in the account. I cannot have it OCR PDFs in all of these languages, and I doubt it could figure out which language to use on its own (some PDFs have three or more languages in them), so I have to pick the option that is most useful for me: Japanese + English. Of course, I do my own OCR, so this is more of a "just in case" setting than one I actually need.

Hi, thanks for your warm welcome here.

Doing OCR in Acrobat would suffice in terms of functionality (Acrobat does Dutch OCR without a problem), but it would severely break the automated workflow in which my documents go into evernote (Press scan button, give a file name and choose a "watched folder", done all within one window). I've already put the Evernote language to the correct one, so recognition is better, but that doesn't solve the inline searching of PDF's.

As for extracting text and putting that in Evernote, I want to keep the document as original as possible, including images etc., since e.g. the tax authority requires "original" invoices.

By the way, I found I can lessen the pain a bit by not choosing the "scan to folder" option, but the "scan to searchable PDF" option - which feeds the scan through Abbyy Finereader (which indeed seems to do Dutch OCR as well). This results in an OCRed file on my drive which I then manually need to drop in one of the dedicated folders which are being watched by Evernote (I have a folder for administration, work, receipts and business cards - evernote puts them in their respective notebooks automatically).

Two drawbacks to this:

1. An extra manual step to move the files. If I tell Abbyy Finereader to put the file directly in the right directory, it first creates the scanned PDF (which Evernote then auto-imports), then it does its OCR and saves a new file with "OCR" at the end of the name (which Evernote also auto-imports). Then it deletes the first file (if I chose that option), but that no longer matters, since Evernote has already imported it. So I need to save outside the watched folders and manually drag and drop the files after OCR has been applied.

2. The "save to folder" window of the ScanSnap Manager has a nice preview with zoom control, which I use to see e.g. the date of the document (so I can enter the correct file name). This is missing from the Abbyy solution.
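The manual drag-and-drop step in drawback 1 could be scripted: let Abbyy save into a staging folder that Evernote does not watch, and periodically move only the finished files into a watched folder. A minimal Python sketch, assuming the OCR'd file's name ends in "OCR" just before `.pdf` (as described above) and the unprocessed scan's name does not; the folder paths and the `move_finished_ocr` name are hypothetical, and you would run it on a schedule (e.g. cron or launchd).

```python
import shutil
from pathlib import Path
from typing import List


def move_finished_ocr(staging: Path, watched: Path, marker: str = "OCR") -> List[Path]:
    """Move PDFs whose name (before .pdf) ends with `marker` from `staging` into `watched`.

    The un-OCR'd intermediate scans stay in `staging`, so Evernote never imports them.
    """
    moved = []
    for pdf in sorted(staging.glob("*.pdf")):
        if pdf.stem.endswith(marker):
            target = watched / pdf.name
            shutil.move(str(pdf), str(target))
            moved.append(target)
    return moved
```

For example, `move_finished_ocr(Path.home() / "Scans" / "staging", Path.home() / "Evernote Import" / "Receipts")` would hand only the OCR'd receipts to Evernote. One caveat: if the OCR step is still writing a file when the script runs, it could move a partial file; waiting until the file size stops changing before moving it would guard against that.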

Now I know that these things aren't about the Evernote application, but truth is that if Evernote's OCR would work "out of the box" for my language, this whole process would be much faster and more convenient.

  • 5 months later...

Just bought Scanner Pro for iPhone, does a great job of producing a PDF. Same issue as above though, if I search for text that is in this PDF, Evernote finds the PDF fine, but does not highlight where the text is in it. I assume this is still an Evernote issue.

  • 8 months later...

Same thing here. Searching in a JPG file seems to work okay, but I use PDF documents for everything. If Evernote came through with a feature that could highlight the areas of a PDF document that match your search criteria, it would be a huge boost to its usability for just about everyone using the software.

  • 1 month later...
  • 3 months later...

Well, it's March 15, 2015 and this is still an issue. I have documents that I *know* are in Evernote but are not returned to me in even simple, one word searches on my Mac client. Interestingly, using the web-based Evernote client seems to return more accurate search results.

 

I've been a premium member for coming on 5 years now and am somewhat dismayed that something like "Work Chat" is so aggressively shoved down my throat but basic PDF searching is still an issue.  Before we embark on even more "gee whiz" features, can we please get the basics nailed down across all the supported platforms?

 

I have another ticket open right now where Evernote occasionally forgets to index swaths of my notes and I need to ping support to reset my account to try to sweep up the missed items. It's a bit scary for me that there are things I've put into Evernote that all but the most targeted search may never find again.

 

Search is Evernote's core competency, please work to maintain that competency!


Same problem here. I already have a scanner that I use for pages from journals or books; the ScanSnap won't do this.

I've been scanning as JPG where possible, but that's a less than ideal solution.

I have a premium account, but it's obvious Evernote is after the business customers, with all their excitement over new Workchat features. Features I couldn't care less about.

  • 4 weeks later...

Hi guys,

 

We have a bug filed for this issue and are actively working on it. For those who are still experiencing this issue, please open up a support ticket via https://evernote.com/contact/support/ and provide as much information as possible. We'll go ahead and relate your tickets plus the additional information to what we have on file. 

 

Thank you! 


Archived

This topic is now archived and is closed to further replies.
