
How to go back to Legacy Version


NightStalker
Solved by Shane D.


  • Level 5*
39 minutes ago, PinkElephant said:

But some people who take a legacy search to start with have a broken legacy database or search index. I am pretty sure the majority of people in this (long) thread don’t know by heart whether their legacy data is clean, or how to fix it if not. Then it is garbage in, garbage out.

One thing I have found is that ScanSnap OCR is better than EN OCR. I just did a search on Desktop 6.25.1 and V10 (desktop, browser and iOS) and found 326 notes with 6.25.1 and 256 notes with V10, a difference of 70 notes. Most of the missing notes have the search term in the PDF, though one note with the term was created using the EN clipper, which is weird as that is a text and image note. The PDF notes are mostly scanned receipts. I didn't go all scientific on this; I just checked the first five that were missing and took a look at the bottom of the list.
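For anyone who wants to repeat this kind of comparison a bit more systematically, here is a minimal sketch. It assumes you have copied the note titles from each search result list by hand into two plain lists; the function name and the sample titles are made up for illustration and nothing here talks to Evernote itself.

```python
def missing_notes(legacy_titles, v10_titles):
    """Return titles found by the legacy search but not by the V10 search.

    Both arguments are plain lists of note titles (hypothetical input,
    copied manually from the two result lists). Duplicate titles collapse
    into one entry because the comparison uses sets.
    """
    legacy = set(legacy_titles)
    v10 = set(v10_titles)
    return sorted(legacy - v10)


# Made-up sample data, only to show the shape of the comparison.
legacy_hits = ["Receipt 2019-03", "Receipt 2020-11", "Clipped article", "Invoice 2021-02"]
v10_hits = ["Receipt 2019-03", "Invoice 2021-02"]

missing = missing_notes(legacy_hits, v10_hits)
print(len(missing), missing)
```

The resulting list is what you would then open note by note to see whether the term really sits in a PDF or image.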

As I've mentioned in these forums before, I saw this problem some years back and worked with support. It was recognized by EN as an issue and had to do with how the font was being read by the EN OCR engine (I paraphrase here). Still not fixed. So the net of it all: if I were to fully move to V10, I would be missing 70 results in a single-word search that is fairly common for me.

Really nothing to do with V10 specifically, more an ongoing issue with the OCR engine EN uses. Unfortunately, these are the kind of errors you will never catch unless you are really sure a note exists but is not appearing in your search results, and you go at it a different way. That's how I identified the problem back then.

I actually thought this should have been an Achtung! for EN when the issue was unearthed. It just reeks of bad data management, no matter the size of the affected user base. Another reason I am sticking with 6.25.1 with its local database, as search is simply more accurate.  🤷‍♂️

  • Level 5

There are different strategies to OCR something, and they lead to different results.

There was a thread explaining a little about how EN builds the OCR result. They build a sort of tree from the OCR results. It is important to understand that it is only created for the search index, not to enable the extraction of text. In fact you can’t extract this sort of OCR, because it is not stored as human-readable text.

ScanSnap OCR, on the other hand, uses the leading ABBYY FineReader software, and its main goal is to overlay a picture with a text layer, not to build an index. This text can be extracted and used independently of the scanned picture document.

Both approaches have their merits, but they don’t deliver comparable results.

Once there is a text layer in a PDF, EN will use it and not try to OCR it again. To really run a test, it would be necessary to scan a number of documents twice, once with and once without OCR before import. Then import all of them into EN, wait a little for the server to work on them, and test the search results against each other.
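The bookkeeping for such a paired test could look roughly like this. It is only a sketch under assumptions: each document is imported as two notes with distinguishable titles, the search hits are copied by hand into a list, and all names and sample values below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanPair:
    doc_id: str
    pre_ocr_note: str  # title of the copy OCR'd before import (has a text layer)
    raw_note: str      # title of the copy imported without a text layer


def compare_variants(pairs, search_hits):
    """Sort each document into one of four buckets for a single search term."""
    hits = set(search_hits)
    result = {"both": [], "pre_ocr_only": [], "raw_only": [], "neither": []}
    for pair in pairs:
        pre = pair.pre_ocr_note in hits
        raw = pair.raw_note in hits
        if pre and raw:
            key = "both"
        elif pre:
            key = "pre_ocr_only"
        elif raw:
            key = "raw_only"
        else:
            key = "neither"
        result[key].append(pair.doc_id)
    return result


# Made-up sample: three receipts, each imported twice.
pairs = [
    ScanPair("r1", "r1 (pre-OCR)", "r1 (raw)"),
    ScanPair("r2", "r2 (pre-OCR)", "r2 (raw)"),
    ScanPair("r3", "r3 (pre-OCR)", "r3 (raw)"),
]
hits = ["r1 (pre-OCR)", "r1 (raw)", "r2 (pre-OCR)"]

report = compare_variants(pairs, hits)
print(report)
```

Documents landing in "pre_ocr_only" or "raw_only" are the ones where the two OCR routes disagree, and "neither" points at documents that neither route made searchable for that term.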

  • Level 5*

The point is that all the missing notes were OCR'd by SS before syncing to EN. The notes span five years. So either EN overlaid those SS OCR results, or they should appear in the search results as they do on the desktop.

Whatever the strategy, search with 6.25.1 is more accurate than V10 or pre-V10 iOS and Web. EN OCR does not work effectively with all PDFs. No emotional baggage for me. It is what it is. One needs to be able to rely on the completeness of search, PDF and image OCR included. Missing 70 notes isn't a one-off.

  • Level 5

This probably would deserve further analysis. Tech support was quite interested in solving my search differences - they were less in the OCR, more in the sharing status of the notes, and in the age of the notes.

  • Level 5*
3 minutes ago, PinkElephant said:

This probably would deserve further analysis. Tech support was quite interested in solving my search differences - they were less in the OCR, more in the sharing status of the notes, and in the age of the notes.

Got the T-shirt on this one, to no avail. If they are interested and reading the forums, they know where I am.  :)

  • Level 5*
17 minutes ago, PinkElephant said:

Uhhh - „they know where I am“ always sounds like a SWAT team at 3 in the morning 😱

Pretty sure the EN SWAT team won't be showing up at my door!  I'll turn on my VPN just to be safe.

