
Excluding or correcting recognized data



Posted

I'm looking for a way to exclude recognized data from a search while still searching the plain text of the same notes, and have yet to find a good way to do so.

 

Why would I want to do this?

 

Web clips and notes with attached photos or scans of handwritten notes generate a lot of false positive results when searching.

 

One can include "-RecoType:*" in the search box to eliminate all notes with any recognized data, but that doesn't allow one to search the unrecognized, plain-text content of those notes.

 

It'd also be nice if there were a way to correct recognizer hits when they are wrong. For example, when one searches for the word "Surplus", the result set shows notes containing handwriting where Evernote has highlighted what it thinks is the word "Surplus" but is in fact some completely unrelated word. It'd be nice if one could click on the incorrectly recognized word and mark it as not "Surplus" or, better yet, provide the correct text that should be recognized there. This would eliminate false positives on subsequent searches for the same word.

 

Recognizing handwritten text and embedded text in photos is really cool, but the false positives are often hysterical and also frustrating, worse than Siri and autocorrect.

 

Am I the only one who has this problem or merely the only one who hasn't figured out how to solve it?

 

Any help would be appreciated.

 

Thanks.

Scott Turner

  • Level 5*
Posted

No easy way that I know of to do this.

 

The hard way is to use a special tag for notes that have recognition text (or for those that do not) and use that tag in your searches, maintaining it as you add new notes. In other words, filter on notes that have recognition text ("recotype:*") and tag them with a special tag, say "HasText". Then you can exclude these notes using a "-tag:HasText" term. Or do it the other way around and tag the notes without recognition text (you might need that positive term if you're using an "any:" search).

 

This is ugly, I understand...
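For anyone who wants to keep the search strings consistent, here is a minimal sketch of the tag workaround as a small Python helper. It only builds the text you would paste into the search box; it does not talk to Evernote. The function name and the constant are made up for illustration, and "HasText" is just the example tag name from above.

```python
# Filter to run first (and again after adding notes) to find the notes
# that carry recognition data, so they can be tagged by hand.
FIND_NOTES_WITH_RECOGNITION = "recoType:*"


def exclude_recognized(terms, reco_tag="HasText"):
    """Build a search string that skips every note tagged reco_tag.

    terms: list of words or phrases to search for.
    reco_tag: hypothetical tag applied to notes with recognition data.
    """
    # Phrases containing spaces are wrapped in double quotes.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " ".join([f"-tag:{reco_tag}"] + quoted)


if __name__ == "__main__":
    print(FIND_NOTES_WITH_RECOGNITION)
    print(exclude_recognized(["surplus"]))
```

Note that this drops the whole note from the results whenever it carries the tag, including any plain text in that note.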

Posted

Ugly indeed! One would have to create a shadow or index note for every term one might ever want to search on and link the shadow note back to the original note. Wouldn't it be so much nicer if one could just indicate that specific search terms should not look in the recognized data lists?

 

As an example of how spurious the false positives can be: by examining the local data store, I found that the small attached .png preview file was treated as handwritten data containing the following words, all or almost all of which are bogus. While it's unlikely that I would search for PILGER, there are a fair number of words that one might reasonably search for, for example RUM, RPM, LAYLA, RISC, INTERN, and BLAME.

 

'Mill 'Maa ADMN ADMIT BDNA RPM RUM ADAM APART INTENT ENEMY INCUR INTRON INTERN INTERNS SNOUT INCAS SNOWY metamer INCH metoserpate an on Qty Qfy QIW SAKE SATED SAM Saad said sand salt sect Saw SWIZ BLAME SWW ALAMO BIBLE PRATE ATT act AGE mama ABSORBS ABASES aroma mamm APROPOS PARSS ABASE ABODES ABASED ABBESS ABSORB Marryat gramm drama Marcia Morra prima prams Praia Maria Marta Marla Ramm ABDO URATE WARBLE rise WARDLE risk RISC bo b0 ua uis uts uu OS GBP oep IF PILGER 'si! si! 'in OF LAYLA LAL TAL ALA LA.

 

post-136515-0-63050300-1423518399_thumb.

  • Level 5*
Posted

Sorry, I don't know of any way to exclude or change the recognized text for a specific image.  I don't think it can be done short of changing or removing the image.

 

This is one of the many reasons that some of us make good use of tags.

The more images with text and PDFs that you put into Evernote the more likely you are to get false positives when searching ONLY for text in the Search Box.

 

See "The Benefit of Using Tags" for more info.

  • Level 5*
Posted

 

Ugly indeed! One would have to create a shadow or index note for every term one might ever want to search on and link the shadow note back to the original note. Wouldn't it be so much nicer if one could just indicate that specific search terms should not look in the recognized data lists?

 

Uh, no; just a single tag (maybe two) that you use to exclude all notes that contain any OCR data -- "recotype" notes -- from your search space. I explained it all above. My workaround is ugly, sure, but a darn sight less ugly than the one you thought I was describing.

 

That OCR isn't a perfected technology is already known...

Archived

This topic is now archived and is closed to further replies.
