
I am scanning with a ScanSnap to Evernote as searchable PDFs. Evernote 10 for Windows is not returning any of my new scans in searches. I have reinstalled 6.25, and that version continues to work as expected: the legacy version also returns the new scans in its search results. Strangely, Evernote 10 does return matching older PDF files in a search. I agree that PDF searching is critical, and this problem is a showstopper for me.

Link to comment
On 11/15/2020 at 11:51 AM, joseluisl said:

I uploaded a PDF file to an Evernote note and "Find in a note" does not find any of the words in the note. Attached is one of the PDFs from Datacamp.

Why is this happening? 

 

Thanks in advance, 

Jose Luis

Multivariate & Generalized Linear Models.pdf 2.95 MB · 1 download

Hi, I downloaded this file and imported it into a test note (Windows Evernote version 10.4.3).
It took about 20 minutes before search found text in this PDF...

Maybe the new version resolves this issue?

Link to comment
  • Evernote Staff*

Thanks everyone for these reports. After the scans are saved to Evernote, it will take time for them to be indexed and become searchable. Are the scanned PDFs searchable if you try searching for them again now using version 10? Thanks.

Link to comment

I am sorry but this is not the case. Take a look at the attached picture. 

I look for "The prior" but Evernote is unable to find the text in the first sentence. 

I think the issue is not with the search but with the way that Evernote indexes the text. It seems that Evernote adds one weird character between characters.
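To illustrate the hypothesis, here is a minimal Python sketch. It is not Evernote's indexer, and the stray character is unknown, so a zero-width space (`\u200b`) is used purely as a hypothetical stand-in. It shows why an extra character between letters would let single-letter searches succeed while any search of two or more consecutive letters fails.

```python
# Hypothetical stand-in: the actual stray character in the index is unknown.
ZERO_WIDTH_SPACE = "\u200b"


def interleave(text, filler=ZERO_WIDTH_SPACE):
    """Simulate a text layer with a stray character between every letter."""
    return filler.join(text)


def strip_filler(text, filler=ZERO_WIDTH_SPACE):
    """Remove the stray character so normal substring search works again."""
    return text.replace(filler, "")


indexed = interleave("The prior")

# A single letter is still present in the damaged text...
found_single = "T" in indexed
# ...but no two letters are adjacent any more, so the phrase is not found.
found_phrase = "The prior" in indexed
# After cleaning up the filler, the phrase is searchable again.
found_after_cleanup = "The prior" in strip_filler(indexed)

print(found_single, found_phrase, found_after_cleanup)
```

If something like this is what happens, the PDF's text layer itself could be fine (other viewers find the text) and only the extracted copy used for searching would be damaged.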

1482943035_Evernotenotfindinginnote.thumb.png.67c337ef3f192b0365c2ae8685fdb6ba.png

Link to comment
  • Level 5
On 11/15/2020 at 11:51 AM, joseluisl said:

I uploaded a PDF file to an Evernote note and "Find in a note" does not find any of the words in the note. Attached is one of the PDFs from Datacamp.

Why is this happening? 

Just a few thoughts - don't know whether they are the cause:

  1. If you move the mouse over the title, a box appears. The title text reaches outside of that box. The text is not a picture, it is an embedded text.
  2. Logo
  3. White text on a color background, maybe no OCR
  4. Rare font

My experience is that OCR on complicated documents works better on a JPEG than on a PDF. Maybe this document was not OCRed at all by EN: when a PDF already comes with a text layer, EN usually does not OCR it again. If it does OCR, it sometimes takes a few hours until the OCR is done, depending on server load.

1stPage.thumb.png.cdb8dc9095769342c0f047a9186b2a33.png

Link to comment

You're right. It finds only a single letter (the first one)... If I search for the letter A it finds 600+ hits, but for Al (or Alicia) it finds none.

When I search globally (all notes) it finds Alicia within my test note...

image.thumb.png.acc38efaefe4fecb09301191597ef797.png

Link to comment
  • Level 5

Just imported it into a new note in EN v10 iOS.

Only a few minutes later, searches for „Poisson“ and „rjags“ returned this very PDF as a search hit.

The iOS search is server based. From this perspective I see neither a search problem nor any OCR issue (because the document already has text) with EN. It just behaves as it should.

Link to comment
  • Level 5

Tried the same in legacy on my Mac. This pdf is a weirdo. Search works on some words, it doesn't on most others.

When I open it in PDF Expert (that can edit pdfs as well as viewing them), there is an offset between the text as shown and the text field that serves as a container for the text. Text can be copied to another app (here a plain text editor) without any problem. Search in PDF Expert works, finding words that were not found in the EN search.

Since EN search is working on other notes just as I am used to it (very fast & precise), I think there must be something with that document that makes EN look away. No idea what that may be.

Link to comment

I have also confirmed that the first-letter search is working, but once I attempt to search for two or more consecutive letters, Evernote 10 for Windows and Web do not return any PDF documents that I have scanned after December 2018. Note that the legacy version of Evernote for Windows is still working as expected. To confirm, have there been any changes to the level of service required for searching PDFs created and OCR'd with a ScanSnap scanner? I have a Plus subscription.

 

Link to comment

What is weird is not the document but the way that Evernote indexes it. 

Another example below:

Windows pdf reader finds the text in the title

image.thumb.png.d7a60ba604f03d3df193fa00f0638389.png

 

But Evernote is incapable of finding anything:

image.thumb.png.7d4cd32b4ed74db0d76a9d6426c570a8.png

 

I am using the following version of Evernote on Windows 10 (just in case it helps):

v 10.3.7 build 2018 public
Editor: v111.0.14414
Service: v1.22.6
© 2019 - 2020 Evernote Corporation. All rights reserved

Multivariate & Generalized Linear Models.pdf

Link to comment
  • Level 5

@Austin G As a test I uploaded the document into EN yesterday. My search tests today showed that search in v10 iOS does find pieces of text in the document. All 5 words I tested were found. However, the first letter of a word must be included to get hits.

So far, so good.

What is a big handicap with any document of this size: in-document search is still not possible with v10. Now I have to manually review more than 60 pages to find a specific word.

Link to comment

@Austin G my experience is different from that of @PinkElephant. @joseluisl's PDF document (above) illustrates the problem I am experiencing. Evernote 10 for Windows and Web will not return the document in the search results. Start by trying to find the word "Bayesian." Next, search for "prior." The document should be returned by both searches; it is not. Interestingly, Find within Note will find the word "prior" but will not find the word "Bayesian." Thanks in advance for your help.

Link to comment

@PinkElephant have you tried to look for a word in the title? My Evernote does not index any of the titles (deck or slides), nor most of the text. In the document that I attached, Evernote indexes hardly 20% of the text. It is not a matter of finding some strings... it is a matter of finding all the strings.

I suggest the following: try to find the phrase "poisson regression" in the document. It appears 11 times, but Evernote shows no results. Does your Evernote find them?

Thanks in advance. 

image.thumb.png.1794346f3dd3749953723247e1f55633.png

 

Opened with Microsoft Edge. 

image.thumb.png.3b20004bda4dae49c9caf4a4ae44f0d7.png

Link to comment
  • Level 5

My searches do find them all ... but it depends on the tool I use:

What did I do ? I imported the pdf into EN, gave it a name, made sure it got synced, and waited. Further I copied it to my Mac, desktop.

Then I tested with a set of words from the pdf: RJAGS, rail-trail, likelihood, overdispersion, weekday, Johnson, dnorm, dpois.

The last 2 words are from the "boxes" with code inserted into the pdf, the rest is from the general text content.

  • Search on the Mac, legacy client: Only finds the 2 words from the code boxes ! 
  • Search on the iPhone, iOS EN v10: After giving time for the server to work on the pdf, all 8 words are found.
  • Spotlight on the Mac : Finds all 8 words.

So search is off only on the legacy client. I don't know why - what I know is that EN stopped any development of this client. So if there is something to fix, they probably won't fix it.

Link to comment
1 hour ago, PinkElephant said:

My searches do find them all ... but it depends on the tool I use:

What did I do ? I imported the pdf into EN, gave it a name, made sure it got synced, and waited. Further I copied it to my Mac, desktop.

Then I tested with a set of words from the pdf: RJAGS, rail-trail, likelihood, overdispersion, weekday, Johnson, dnorm, dpois.

The last 2 words are from the "boxes" with code inserted into the pdf, the rest is from the general text content.

  • Search on the Mac, legacy client: Only finds the 2 words from the code boxes ! 
  • Search on the iPhone, iOS EN v10: After giving time for the server to work on the pdf, all 8 words are found.
  • Spotlight on the Mac : Finds all 8 words.

So search is off only on the legacy client. I don't know why - what I know is that EN stopped any development of this client. So if there is something to fix, they probably won't fix it.

Could you search for the words "poisson regression", for example?

I am using the latest version for Windows and it does not find anything in the titles of the slides :(

Link to comment
7 hours ago, PinkElephant said:

Search within attachments is not available in the current release of EN v10. It is said to return, no timeline.

Not entirely true: it finds some words within attachments. It finds "poisson" 4x, but not "Poisson Regression" in the title. So it works, kind of...

image.thumb.png.c83214ef6adbdc80c2b4120ac24ddd0e.png

Link to comment

As far as I am concerned, the global search index of the latest EN versions is broken, on Mac (10.4.4) and iOS as well as Windows.

I tested with a scanned PDF document (see the 1st attachment).

1. The document is indexed, as it can be searched in any PDF viewer using the search term "shell" (1st image).

2. I can use Evernote's in-document search to find it (2nd image).

3. However, and this is critical: EN does NOT find "shell" when I use the global search (3rd image).

 

- The document was uploaded to Evernote days ago (so it should be in my global search index).

- It also cannot be found with the latest iOS and Windows versions of EN.

- A colleague tested this on an older EN version for Windows, and there it still works.

 

This behaviour is not limited to this document; it affects all documents scanned and saved in EN.

IM_Dokument_03_09654_pdf__page_1_of_2__and_Profile__2__pdf__page_1_of_2_.jpg

Search_Results_-_Evernote.jpg

Search_Results_-_Evernote.jpg

IM_Dokument_03.09654.pdf

Link to comment

This is painful and baffling and apparently inconsistent by user or platform?

For me, V10.4.4 isn't finding any text in certain types of PDFs. It doesn't see "comments" added via Foxit. It doesn't see content or titles of PDFs that have locked down security settings. I'm not exactly sure what restrictions have been placed on the PDFs in question, but I know from previous tinkering that I can't make any changes to them in any PDF editor I've tried.

I receive the PDFs as emails auto-forwarded from my server to EN, where I log them; EN indexes them and I can find them whenever I need to share the docs (which is constantly). There is a unique ID number that is printed as text in the PDF doc and is part of the PDF title, the email subject and the email body. When I search for it in EN now I get zero results. The original emails are not encrypted. When I find the note in EN (manually... ugh) and try "find within note", it only finds the ID number in the original email body. That number is also in an Excel file where I track this stuff. Via search, EN finds neither the number in the Excel file nor the note where that Excel file lives. It's not found via "find within note" either.

A "find within note" search for a word in a non-security tweaked PDF, that also arrived via email, finds the word I'm looking for in the PDF and the note body. It doesn't find the word in the PDF title. General search does find the note, but never highlights the word anywhere.

I want to find work-arounds for this version and just use the dratted thing, but I think I just demonstrated that I can't do my job until I reinstall the legacy version. Unless anyone has any other workarounds?

Link to comment

I'm not sure if there is a difference in the security settings of some of my unique ID number email attachments, but I just found one that, while general search still won't find the note, "find in note" does find the ID number in the body of the PDF and the body of the note - but still not in the title of the note (email subject) or PDF title.

Link to comment

I did more tinkering and discovered that yes, all my above mentioned searches work as expected in Legacy, which I now have installed side by side with v10.4.4. I can also confirm that it's happening with PDFs that are newly added, a month old, 6 months old and several years old. In my case EN doesn't appear to be indexing the PDF titles, among other sins. I still wonder if it has something to do with security settings on the PDFs themselves, but the PDFs of different ages have different settings. Also missing is the same number string which is in an Excel in EN.

Unfortunately, I just found out that the Android version (I have whatever the latest is for Android 8.0 - and I read somewhere on here that only Android 10.0 and above would be getting the latest EN updates) - doesn't find my number string either. It used to, but I can't say when that changed.

I've opened a ticket with support, as I'm sure many of you have, so we'll see where that gets me.

Link to comment

My issue has resolved itself and I feel like I should have figured it out already, based on what @ArjenC said above. My unique ID number actually starts with a letter. My brain tells me it's a number, and EN used to let me search for a partial string, but now, apparently, the letter makes it a word. If the string is "M20.24599", I will find my note (the instances in the title, the note/email body, the PDF, and the note with the Excel doc) when I type that in exactly. Before v10 I could search for "24599" and find any instance of that partial string.

It's frustrating that this seems to be a fundamental change to the way search works, but the fault for not finding my notes is my own in this case.
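For anyone else hitting this, the difference can be sketched in a few lines of Python. This is only an illustration of prefix matching versus substring matching, not Evernote's actual search code; the function names and the whitespace-only tokenizer are assumptions made for the example.

```python
def tokenize(text):
    """Split on whitespace only, so 'M20.24599' stays one token (an assumption)."""
    return text.lower().split()


def prefix_search(query, text):
    """Match only if some token STARTS with the query (the behaviour v10 seems to show)."""
    q = query.lower()
    return any(token.startswith(q) for token in tokenize(text))


def substring_search(query, text):
    """Match if the query appears anywhere (the behaviour I was used to before v10)."""
    return query.lower() in text.lower()


note = "Invoice M20.24599 attached"

full_id_prefix = prefix_search("M20.24599", note)   # whole token: match
start_prefix = prefix_search("M20", note)           # start of token: match
tail_prefix = prefix_search("24599", note)          # middle of token: no match
tail_substring = substring_search("24599", note)    # substring search: match

print(full_id_prefix, start_prefix, tail_prefix, tail_substring)
```

Under these assumptions, "24599" only matches when searching is substring based, which is consistent with what I saw: the tail of the ID stopped matching, while the full ID (or its beginning) still finds the note.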

Link to comment
  • 2 months later...

Are people still having this problem? I searched for a word, and the number of notes containing PDFs differs between legacy (9 notes) and V10 (4 notes). These are not indexed - the text is selectable and findable in Adobe Reader. I can't see an obvious pattern. The four V10 finds are not a subset of the nine legacy notes: V10 found one note that legacy didn't. Generally it does seem to be the larger PDF files that are not found, but this is not always true. Based on a sample, the results seem genuine, i.e. the text does contain the word.

Link to comment
