
(Archived) Feature Request: Manual correction of OCR image data



I recently noticed that, with an increasing number of images in my notes, I get more and more false hits on text searches. Even though the OCR is very good, it's not perfect, and it often associates the wrong words with text in an image. It would be nice if I could manually mark a recognition as false by right-clicking the yellow highlighted text and choosing something like "selection does not match search term".

The only thing you would have to do is delete the word from the image's OCR database and resync it. (I hope that's as easy as it sounds. :))


Yes, that would be nice.

I've seen this happen frequently with screen grabs of maps (with lots of small faint city names) for my long road trips.

What I do as a workaround is assign a tag X to each image that has a false hit.

When I want more accurate results, I run a search which includes -tag:X


I was thinking about that before my reply but am unsure how that would work with multiple words.

If the image says "click here" and it comes up for the search "home", you'd add the X tag. But a refined search would then also exclude the image for the (correct) search 'click'.

Maybe tagging the image with the misidentified word preceded by x could help then?


Refined search:

here -tag:xhere

  • 8 months later...

I just found this thread, and I agree that the ability to manually correct OCR would be very helpful. I recently searched for a word (can't remember which word) and wound up with a photograph of a knitting pattern in my results - somehow the OCR interpreted the knitting stitches as letters. :lol:


For images, the OCR data can be seen, and edited, by exporting to .enex format and looking for the <recognition> section. Each recognized (erroneously or not) piece of text is an <item> element, with pixel 'x', 'y', 'w' (width) and 'h' (height) coordinate attributes; nested inside it are the candidate words, represented by <t> elements. You could edit an exported .enex file and remove bad OCR guesses, or even, for extra credit, add your own items, then import it back in.
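
The manual edit described above could be scripted roughly as follows. This is only a sketch under assumptions that have not been checked against a real export here: that each image resource in the .enex file carries a `recognition` element whose text is a small XML document, and that this document holds `item` elements (with the x/y/w/h attributes) containing `t` elements for the candidate words. The function names `drop_candidates` and `clean_enex` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def drop_candidates(reco_xml, bad_words):
    """Return the OCR index XML with unwanted candidate words removed.

    Assumes <item> elements hold <t> children, one per candidate word.
    """
    bad = {w.lower() for w in bad_words}
    root = ET.fromstring(reco_xml)
    for item in list(root.findall("item")):
        for t in list(item.findall("t")):
            if (t.text or "").strip().lower() in bad:
                item.remove(t)
        # An <item> with nothing left inside can no longer match, so drop it.
        if len(item) == 0:
            root.remove(item)
    return ET.tostring(root, encoding="unicode")


def clean_enex(src_path, dst_path, bad_words):
    """Rewrite every <recognition> section of an exported .enex file."""
    tree = ET.parse(src_path)
    for reco in tree.iter("recognition"):
        if reco.text and reco.text.strip():
            reco.text = drop_candidates(reco.text.strip(), bad_words)
    # Caveat: the stdlib writer escapes the text instead of using CDATA
    # and omits the DOCTYPE; whether import accepts that is untested.
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
```

A call such as `clean_enex("export.enex", "cleaned.enex", {"toner"})` would write a copy with that guess removed, which could then be imported back in. Keep the original export until the re-import has been checked.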

This would be a workaround with the emphasis on 'work'.

  • 7 months later...
…This would be a workaround with the emphasis on 'work'.

Well put, Jeff, and I'm impressed at the observation! But yes, it's a prohibitive process. I for one look forward to having it in the GUI! :)

…Although I might just get in there once and fix something manually: you, machine, are pretty creative reading "toner" into that tablecloth pattern…

I'm seeing this as the sorta 'top result' because it's most recent in this sort 'by date created' view. At least this annoyance brought me here tonight, aye?



This topic is now archived and is closed to further replies.
