How to disable OCR?


krisf

Hello,

Since search results are poisoned by false positives caused by crappy OCR, I want to disable OCR in order to get more accurate searches.

Is there a way to do this?

Thanks

Link to comment
  • Level 5

Which OCR do you regard as crappy: the one done before a file (PDF?) was loaded into EN, or the OCR done by EN on its servers?

These are two completely separate ways of OCRing, and if the problems arise in one process, it makes no sense to fix the other.

To my knowledge, if a document was OCRed before it was loaded into EN, it will not be OCRed again on the servers. So if you run your own OCR before you create the note in EN, maybe switch that OCR off and let EN OCR these files on the server.

You can create two PDFs from the same stack of paper, one OCRed locally and one by EN. Do this for 10-20 documents you typically scan, and check which works better by trying it out. In my experience, if the scan is "clean" in terms of quality and language settings, the server-based OCR does a very good job. You need a pretty good local OCR platform to match it.

Link to comment

The thing is, OCR will never be 100% free of errors. So I just don't want to include it in my index. But apparently this is not possible in Evernote yet.

Oh well

Link to comment
  • Level 5*
On 4/11/2019 at 3:52 PM, krisf said:

One cannot put all relevant key words in the title.

I operate on the basis of 'smart' titles and searches.  The titles include the date, type, source, and (some) keywords,  and I refine search results by editing titles,  adding tags and saving 'exact' searches (that include the results I need,  and only the results I need) where necessary.  I've worked in various industries that use BIG databases,  and the abiding lesson is: there's no 'perfect' index.  No matter how carefully you manage the content,  a combination of entry errors and omissions and the mulish variety of content mean that you'll always have too many,  or too few results.  Using a database means being familiar with the search grammar so you can ask very specific questions,  and tweaking the content frequently so you get good answers.

I do have searches that start out general,  and then exclude keywords, iteration by iteration,  until I get to an acceptable level of accuracy - but then I add a new tag to those notes that will find them for the future.  (My tag list lives in Workflowy as well as Evernote so I can view and review it in a more user-friendly format than Evernote currently allows.)

My data may be unusually specific,  but I do have getting on for 46k notes,  and finding all my stuff is still relatively easy....

Link to comment

'smart titles' imply that you know in advance what search words you are gonna use. 

When I'm composing a note, I don't want to think about how I want that note to be findable. That's something the search engine should take care of for me. You need to use 'smart titles' because the Evernote search engine is not smart.

Link to comment
  • Level 5*
42 minutes ago, krisf said:

'smart titles' imply that you know in advance what search words you are gonna use. 

I don't know what sort of content you're saving,  but mine is emails / correspondence / project stuff / photo locations / tech tips / receipts - pretty varied.  I search for things like receipts,  and project stuff - so 'everything within <these dates> that has "receipt" in the title' will get me a broad cross section;  that, plus 'Macdonalds in the title' gets me my junk food exposure.  'Everything tagged <projectnumber>' gets me a list of content I can pick from,  or select from more closely.

YMMV,  and I'm just sayin' - but Evernote's search works well for me.

- The searches above obviously look rather different in Evernote grammar;  I'm just rendering the intent of each one.
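For anyone who wants to see roughly what those look like, here is a minimal sketch that builds the searches as strings in Python. The helper names (`title_search`, `tag_search`), the dates, and the project number are hypothetical and only for illustration. `intitle:`, `created:`, and `tag:` are Evernote search-grammar operators, and the negated `-created:` is used here as an upper date bound; treat the exact form as an assumption to check against your own client.

```python
from datetime import date


def title_search(keyword: str, start: date, end: date) -> str:
    """Notes with `keyword` in the title, created on or after `start` and before `end`."""
    # created:YYYYMMDD matches notes created on or after that day;
    # the negated form cuts off everything from `end` onward.
    return f"intitle:{keyword} created:{start:%Y%m%d} -created:{end:%Y%m%d}"


def tag_search(tag: str) -> str:
    """Notes carrying `tag`; the tag is quoted when it contains a space."""
    return f'tag:"{tag}"' if " " in tag else f"tag:{tag}"


# Hypothetical values, not taken from the posts above.
receipts = title_search("receipt", date(2019, 1, 1), date(2019, 4, 1))
junk_food = f"{receipts} intitle:Macdonalds"
project = tag_search("P-1234")

print(receipts)
print(junk_food)
print(project)
```

Search terms are combined with AND by default, which is why narrowing the receipts search is just a matter of appending another `intitle:` term.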

Link to comment
  • Level 5*
On 4/11/2019 at 1:58 AM, krisf said:

Hello,

Since search results are poisoned by false positives caused by crappy OCR, I want to disable OCR in order to get more accurate searches.

Is there a way to do this?

Thanks

Per the above, not specifically. 

Workaround: if there is a particularly problematic document type, you can minus that type out of your search results. For example, -resource:image/png will eliminate any notes with PNG images from the search results (use image/jpeg for JPEGs). These two document types, if any, are the ones that will fail OCR in my use case. You can use a text expander to hotkey it into your search. Of course, this will also eliminate any notes that have the search term in the note text if they contain an image of that type.
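If you would rather script this than use a text expander, a minimal sketch might look like the following. The function name `exclude_attachments` is made up for illustration; the only Evernote-specific piece is the `-resource:` operator with a MIME type, where `image/*` is a wildcard covering all image types.

```python
def exclude_attachments(query: str, mime: str = "image/*") -> str:
    """Append a negated resource filter to an Evernote search string."""
    # Notes holding any attachment of this MIME type drop out of the results,
    # including notes whose body text matches the query.
    return f"{query.strip()} -resource:{mime}"


print(exclude_attachments("invoice"))               # invoice -resource:image/*
print(exclude_attachments("invoice", "image/png"))  # invoice -resource:image/png
```

The same pattern should work for other attachment types, for example passing `application/pdf` as `mime`, with the same caveat that the whole note is excluded and not just the OCR text.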

Link to comment
13 hours ago, CalS said:

Per the above, not specifically. 

Workaround: if there is a particularly problematic document type, you can minus that type out of your search results. For example, -resource:image/png will eliminate any notes with PNG images from the search results (use image/jpeg for JPEGs). These two document types, if any, are the ones that will fail OCR in my use case. You can use a text expander to hotkey it into your search. Of course, this will also eliminate any notes that have the search term in the note text if they contain an image of that type.

The -resource:image/* filter is indeed a usable workaround. Thanks for all your replies, everyone.

Cheers

Link to comment

Archived

This topic is now archived and is closed to further replies.
