EN 10 misses search in pdf-files


If I add a searchable PDF to a note, I should be able to find the note by its content.

With EN 6.x (legacy) it works fine; I just checked.

In EN 10 (10.63.4), search completely misses the content of the PDF file.

I just checked with EN 6 and it finds my file in a note without any problem.

I repeated the test in EN 10 and it misses this file.

My conclusion is that EN 10 is still not a trustworthy replacement for EN 6.x.

Link to comment
  • Level 5*
11 minutes ago, Joost Rongen said:

If I add a searchable PDF to a note, I should be able to find the note by its content.

With EN 6.x (legacy) it works fine; I just checked.

In EN 10 (10.63.4), search completely misses the content of the PDF file.

I just checked with EN 6 and it finds my file in a note without any problem.

I repeated the test in EN 10 and it misses this file.

My conclusion is that EN 10 is still not a trustworthy replacement for EN 6.x.

How long did you wait after installing v10? It takes time to download the database and generate a new index. After adding a new PDF, it still takes a little time to index. PDF search is working fine for me.

Link to comment
1 hour ago, s2sailor said:

How long did you wait after installing v10? It takes time to download the database and generate a new index. After adding a new PDF, it still takes a little time to index. PDF search is working fine for me.

Well, the PDF file I uploaded a week ago is still not indexed. And the funny thing is that it works perfectly fine in EN 6.x legacy. I still have both versions on 2 Windows virtual machines and EN for Mac.

Link to comment
  • Level 5

No problem on my side.

As far as I know, nothing shows whether indexing has finished or not. And I am not sure whether the "machine" doing the indexing is shared by both databases, or whether each database runs its own OCR and indexing services.

Server-side operations are something of a black box.

Link to comment
  • Level 5*
4 minutes ago, Joost Rongen said:

Well, the PDF file I uploaded a week ago is still not indexed. And the funny thing is that it works perfectly fine in EN 6.x legacy. I still have both versions on 2 Windows virtual machines and EN for Mac.

I’m not surprised that v6 is working fine.  You have been using it for a while, I assume, and you have a stable installation.  What you are experiencing with v10 is not expected or typical.  It’s possible that there is something wrong with your installation.  Have you tried completely removing it with something like Appcleaner and then reinstalling on your Mac?  Also, I’m not sure if dual installations are recommended anymore.  Maybe the note conversions back and forth are somehow affecting this.

Link to comment
2 hours ago, s2sailor said:

I’m not surprised that v6 is working fine.  You have been using it for a while, I assume, and you have a stable installation.  What you are experiencing with v10 is not expected or typical.  It’s possible that there is something wrong with your installation.  Have you tried completely removing it with something like Appcleaner and then reinstalling on your Mac?  Also, I’m not sure if dual installations are recommended anymore.  Maybe the note conversions back and forth are somehow affecting this.

Hi, I don't have dual installations on one operating system. I just have two separate virtual machines (Windows 10), and I have EN 10 installed on macOS. Only EN 6.x legacy on Windows is working as expected. If I upload a PDF (with a text layer, of course) in EN 10, I see it in EN 6 perfectly well indexed. In any EN 10 version, the document can no longer be found by searching on its content. It's just not working.

And, to answer your question: I installed EN 10 on a clean virtual machine.

As I said, EN 10 is not trustworthy at the moment. I must keep an EN 6 installation to at least find my stuff through the index.

Link to comment
  • Level 5

Currently legacy and v10 mean 2 different databases - and I mean absolutely different, not one being the mirror of the other. To keep both in sync, conversion software needs to grab changes on one side and convert them to the structure used by the other. This happens on the server - the new clients may play a role as well, but legacy just sits dumb in its box and knows nothing. When it hits new content it doesn’t understand (like tasks), it shrugs and displays "unknown data format".

Everybody who runs v10 alongside legacy must know that all traffic between the two has to pass through the described process on the server. It also means you intentionally run deprecated software and allow it full access to the database holding your notes.

We all have our own experience. I can’t reproduce what you describe in terms of v10 data integrity problems. In the middle of last year I uninstalled legacy and stopped using it completely. I never looked back, and had no reason to.

It may be that v10 caused a problem with your content. But it may equally be that you shot yourself in the foot by mixing current and deprecated software, and are now looking for the shooter while still holding the smoking gun in your own hand.

I don’t think this can be decided by watching from the outside. But it’s obvious there is more than one explanation for what you observe. We can expect that syncing of legacy will stop - then the equation gets simpler, both for operating the servers and for analyzing all sorts of problems. Wait and see!

 

Link to comment
3 hours ago, Joost Rongen said:

Well, the PDF file I uploaded a week ago is still not indexed. And the funny thing is that it works perfectly fine in EN 6.x legacy. I still have both versions on 2 Windows virtual machines and EN for Mac.

Are you a free user? (Indexed)  PDF search isn’t supported on a free plan (in V10)

Free legacy was able to search in searchable PDFs, but this was more a bug they weren’t aware of: officially it was never supported.

Link to comment
  • Level 5*
1 hour ago, Joost Rongen said:

As I said, EN 10 is not trustworthy at the moment. I must keep an EN 6 installation to at least find my stuff through the index.

That is your experience and I won't debate the point, but as I mentioned earlier, it is not the expected or typical behavior. Legacy has a limited life, so you may want to sort this out or look for a plan B. Many users load V10 for the first time, test it immediately, and then complain that notes don't load, search isn't correct, and so on. They basically did not give the installation enough time to complete in the background. It does sound like you are past that, so support is probably your next best step.

Link to comment
  • Level 5

I have the same problem. I have been using Evernote 10 for a very long time. I am on the Personal plan, which includes search in PDFs and other document types. AI search tells me that it cannot find a piece of information in my notes if that information is in a PDF, e.g. a user manual.

The standard search sometimes finds notes with PDFs that contain the information, but if the note contains more than one PDF, there is no indication of which PDF holds it. Very often it also shows notes which don't contain the information at all.

All in all, search in Evernote 10 is very unreliable. Therefore, I increasingly use tags to find specific information.

Link to comment
  • Level 5

EN search works for me - that's it, I need not say more.

OK, will do:

First, there is a certain fuzziness in search. This is especially true for OCRed PDFs. If the algorithm is not absolutely sure, it will make guesses. For one recognized word there may be several entries in the search index, each of them a possible term. This can lead to false positives (found where there is nothing to find) as well as false negatives (not found, although it's there).
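
To make the effect concrete, here is a toy Python sketch of an index that stores several OCR guesses per recognized word. The notes, the guesses, and the function names are all invented for illustration; this says nothing about how Evernote's index is actually built.

```python
from collections import defaultdict

def build_index(notes):
    """Map every candidate term to the set of notes it might occur in.

    notes: {note_id: [set of OCR guesses, one set per recognized word]}
    """
    index = defaultdict(set)
    for note_id, words in notes.items():
        for candidates in words:
            for term in candidates:
                index[term.lower()].add(note_id)
    return index

def search(index, term):
    """Return the ids of all notes that have the term among their guesses."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical OCR output: each scanned word carries one or more guesses.
notes = {
    "invoice": [{"serial"}, {"80312", "B0312"}],  # real text: "serial 80312"
    "manual": [{"model"}, {"B0312", "80312"}],    # real text: "model B0312"
    "receipt": [{"tota1"}],                       # real text: "total", misread
}
index = build_index(notes)

print(search(index, "80312"))  # "manual" shows up too: a false positive
print(search(index, "total"))  # the receipt is missed: a false negative
```

The more guesses are stored per word, the fewer hits are missed, but the more unrelated notes show up in the results.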

To improve OCR results, check in your account settings how many languages - and which - you have set. If the language is wrong, results can be wildly off. If several are set, the results get less precise.

AI search is an enigma. It preselects notes based on the search, and then sends them to a third-party service, where these notes (and only these) are processed further. It all depends on the initial selection.
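
As a rough illustration of why the preselection matters, here is a toy two-stage flow in Python. Everything in it (the keyword matching, the function names, the stand-in for the external service) is a made-up assumption, not Evernote's actual pipeline.

```python
def preselect(notes, query, limit=3):
    """Stage 1 (hypothetical): rank notes by keyword overlap with the query."""
    words = set(query.lower().split())
    scored = []
    for title, text in notes.items():
        hits = len(words & set(text.lower().split()))
        if hits:
            scored.append((hits, title))
    # Most overlapping words first, ties broken by title.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:limit]]

def answer(notes, query):
    """Stage 2 (stand-in): only the preselected notes are passed on."""
    selected = preselect(notes, query)
    if not selected:
        return "No note contains information about this."
    return "Answer based on: " + ", ".join(selected)

# Invented sample notes; the text stands for whatever was indexed.
notes = {
    "Camera manual": "camera manual chapter 7 bracketing settings",
    "Trip": "packing list for the trip",
}

print(answer(notes, "how does focus stacking work"))
print(answer(notes, "camera manual chapter 7"))
```

In this sketch the first question shares no word with any indexed text, so nothing is passed on and the second stage never sees the manual. The second question names the manual and gets through.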

To find a search string in a note with several attachments, search for the note first. Now you know the term is somewhere in it, but where among (say) several attachments plus the note text?

Now select the note, and choose Search & Replace from the Note menu. Enter the search term in the search field of the little popup box. You will get an indication of how often it was found, including in all attachments. Now you can cycle through the hits by clicking the arrows. It will jump from occurrence to occurrence, whether they are in one document or in several. It will open the attachment and show the relevant page with the search term highlighted.

Link to comment
  • Level 5
25 minutes ago, PinkElephant said:

EN search works for me - that's it, I need not say more.

OK, will do:

First, there is a certain fuzziness in search. This is especially true for OCRed PDFs. If the algorithm is not absolutely sure, it will make guesses. For one recognized word there may be several entries in the search index, each of them a possible term. This can lead to false positives (found where there is nothing to find) as well as false negatives (not found, although it's there).

To improve OCR results, check in your account settings how many languages - and which - you have set. If the language is wrong, results can be wildly off. If several are set, the results get less precise.

AI search is an enigma. It preselects notes based on the search, and then sends them to a third-party service, where these notes (and only these) are processed further. It all depends on the initial selection.

To find a search string in a note with several attachments, search for the note first. Now you know the term is somewhere in it, but where among (say) several attachments plus the note text?

Now select the note, and choose Search & Replace from the Note menu. Enter the search term in the search field of the little popup box. You will get an indication of how often it was found, including in all attachments. Now you can cycle through the hits by clicking the arrows. It will jump from occurrence to occurrence, whether they are in one document or in several. It will open the attachment and show the relevant page with the search term highlighted.

Thank you for the reply. I have set my languages to German and English. These are the only ones I am using.

AI search is still very unsuccessful. For example, I asked how a certain feature of my camera works. Result: no note contains information about this feature. However, when I ask in which chapter of my camera's manual I can find information about the feature, it tells me the correct chapter. When I then ask how this feature in chapter xxx of the manual works, it answers correctly. That is strange and makes the search pretty much useless.

As far as search across more than one PDF in a note is concerned, it finds notes which contain the information in one of the PDFs. But when I use the search within the note, it shows no result when I display the PDFs as title only. If I display them as one page or showing all pages, it shows the number of occurrences without any indication of where they are, and it does not jump to the occurrences in the note. The only way is to open the PDFs one by one in the Mac Preview app and search for the text outside Evernote. This happens with scanned PDFs as well as with PDFs that contain an OCR text layer and therefore their own index. So for me, Evernote search is terrible.

 

Link to comment
  • Level 5

About AI search, I have tested it, and currently see no use case for me. OK, I’m on Professional,  and know how to search using the classical means.

About search in a note (desktop client): My PDFs are set to the one page view. EN automatically shows large PDFs (above maybe 20MB) as title only. Yes, it won’t show search hits in title only, so you need to switch these to one page or all pages.

But when the view shows the pdf, in note search works for me, scrolls to the pages with hits and highlights them. It will cycle through all files and pages with the arrow buttons when there are several hits.

Link to comment
  • 2 weeks later...
On 10/19/2023 at 9:09 PM, eric99 said:

Are you a free user? (Indexed)  PDF search isn’t supported on a free plan (in V10)

Free legacy was able to search in searchable PDFs, but this was more a bug they weren’t aware of: officially it was never supported.

No,  I am not a free user.

Link to comment

I was told not to use V10 and the deprecated version 6.x alongside each other.

I understand, but I need to, since the HTML export from V10 is of much lower quality than what I had with version 6.x. So I have to keep the legacy version alongside in order to export to HTML, which I need for sharing documentation.

 

Link to comment
  • 3 months later...
  • Level 5
1 hour ago, Bill Hamilton said:

I have the same issue.  EN 10 will find an 8 digit serial number in both a 10 page invoice scanned to pdf, and also a 1 page email where I know it occurs in both.  HOWEVER it won't show me WHICH page on the invoice the serial number occurs which makes it useless.  Legacy did it FINE and quickly.  Looking for a workaround.  Will upgrading to Professional help?  Currently on Personal.  Sad about Legacy going away. :(

Hi, and welcome to the forums. Double-posting only confuses things. I've replied to your other thread.

Link to comment
