C.Noize 2 Posted June 8, 2010
Hi! I recently noticed that, with an increasing number of images in my notes, I get more and more false hits on text searches. Even though the OCR is very, very good, it's not perfect, and it often associates wrong words with text in an image. It would be nice if I could manually declare a recognition as false by right-clicking the yellow-marked text and clicking something like "selection does not equal search term". The only thing for you to do would be to delete the word from the image's OCR database and resync it. (I hope that's as easy as it sounds.)
Level 5 jbenson2 2,146 Posted June 9, 2010
Yes, that would be nice. I've seen this happen frequently with screen grabs of maps (with lots of small, faint city names) for my long road trips. What I do as a workaround is assign a tag X to each image that has a false hit. When I want more accurate results, I run a search that includes -tag:X
ruudhein 28 Posted June 9, 2010
I was thinking about that before my reply but am unsure how that would work with multiple words.
If the image says "click here" and it comes up for the search "home", you'd add the X tag. But for a refined search, that would also exclude the image for the (correct) 'click'.
Maybe adding the misidentified word preceded by x could also help then?
xhere
Refined search:
here -xhere
KTK_NJ 0 Posted February 16, 2011
I just found this thread, and I agree that the ability to manually correct OCR would be very helpful. I recently searched for a word (can't remember which one) and wound up with a photograph of a knitting pattern in my results - somehow the OCR interpreted the knitting stitches as letters.
Level 5* jefito 5,586 Posted February 16, 2011
For images, the OCR data can be seen, and edited, by exporting to .enex format, and looking for the section. Each recognized (erroneously or not) piece of text is an element, with pixel 'x', 'y', 'w' (width) and 'h' (height) coordinate attributes; nested inside are the candidate words, represented by elements. You could edit an exported .enex file and remove bad OCR guesses, or even, for extra credit, add your own items, then import it back in.
This would be a workaround with the emphasis on 'work'.
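As a rough sketch of the export, edit, re-import workaround described above, the Python below removes unwanted OCR candidates from the recognition data of an exported .enex file. The element names (`recognition`, `item`, `t`) and the layout (an XML index stored as text inside each image resource, with the items as direct children of its root) are assumptions about the export format, not something spelled out in this thread, and the function names are made up for illustration. Check them against your own export before relying on any of this.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def strip_ocr_candidates(reco_xml: str, bad_words: Iterable[str]) -> str:
    """Remove candidate words matching bad_words from one OCR index.

    Assumes the index root holds <item> children (one per recognized
    region) and each item holds <t> children (candidate words).
    """
    bad = {w.lower() for w in bad_words}
    root = ET.fromstring(reco_xml.strip())
    for item in list(root.findall("item")):
        for cand in list(item.findall("t")):
            if (cand.text or "").strip().lower() in bad:
                item.remove(cand)
        # Drop regions that have no children left at all.
        if len(item) == 0:
            root.remove(item)
    return ET.tostring(root, encoding="unicode")


def clean_enex(enex_xml: str, bad_words: Iterable[str]) -> str:
    """Apply strip_ocr_candidates to every assumed <recognition> element."""
    words = list(bad_words)
    root = ET.fromstring(enex_xml.strip())
    for reco in root.iter("recognition"):
        if reco.text and reco.text.strip():
            reco.text = strip_ocr_candidates(reco.text, words)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Tiny hand-written stand-in for an export; a real file is much larger.
    sample = (
        "<en-export><note><title>Tablecloth</title><resource>"
        "<recognition><![CDATA[<recoIndex>"
        '<item x="10" y="20" w="80" h="30"><t w="35">toner</t></item>'
        "</recoIndex>]]></recognition></resource></note></en-export>"
    )
    print(clean_enex(sample, ["toner"]))
```

Note that ElementTree drops any DOCTYPE line and writes the recognition data back as escaped text rather than CDATA. That is equivalent XML, but whether Evernote re-imports it cleanly is untested here, so work on a copy of the export.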
anghammarad 0 Posted October 16, 2011
"…This would be a workaround with the emphasis on 'work'."
Well put, Jeff - and I'm impressed at the observation! But yes, it's a prohibitive process. I for one look forward to having it in the GUI! …Although I might just get in there once and fix something manually: you, machine, are pretty creative reading "toner" into that tablecloth pattern… I'm seeing this as the sorta 'top result' because it's the most recent in this sort 'by date created' view. At least this annoyance brought me here tonight, aye?
Archived
This topic is now archived and is closed to further replies.