
Any Way to Index Images?



Does EverNote index text within images, such as JPG and PNG format?   It would need to OCR the image, obviously, in order to find the text.

I upload a lot of technical diagrams that have been printed to image formats, but which contain textual labels throughout the diagram.   Having that automatically full text indexed would be a gigantic feature for me.    That would provide an easily-explained selling point for EverNote as well.   Anyone who collects technical diagrams on a subject would now have a way to textually search through any term appearing in the diagram.

If EverNote does not do this, is there any tool I can buy that would automatically scan images in a folder and create a companion document that has the OCR text from that image?
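One way to get such companion files, sketched here rather than recommended as a finished tool, is to loop over a folder and call an OCR engine on each image. The sketch below assumes the open-source Tesseract command-line program is installed and on the PATH. The function names and the `<image name>.txt` naming are illustrative choices, not part of any product.

```python
import subprocess
from pathlib import Path
from typing import Callable

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def tesseract_ocr(image: Path) -> str:
    """Run the Tesseract CLI on one image and return the recognized text.

    Assumes the `tesseract` executable is installed. Passing `stdout` as the
    output base makes Tesseract print the text instead of writing a file.
    """
    result = subprocess.run(
        ["tesseract", str(image), "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def write_companion_texts(
    folder: Path, ocr: Callable[[Path], str] = tesseract_ocr
) -> list[Path]:
    """Write `<image name>.txt` next to every image found in `folder`."""
    written = []
    # sorted() materializes the listing before any new files are created.
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        companion = image.with_name(image.name + ".txt")
        companion.write_text(ocr(image), encoding="utf-8")
        written.append(companion)
    return written
```

Calling `write_companion_texts(Path("diagrams"))` (a hypothetical folder name) would leave `foo.png.txt` beside `foo.png`. The `ocr` parameter lets you swap in a different engine without touching the loop.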

The things I do not like about EverNote OCR so far:

1) I hate the lack of tools to easily see whether a document has been indexed at all.

2) For me anyway, PDF files frequently do not get full-text indexed at all.  I opened a trouble ticket and there is still no resolution after a week.

3) I want a way to actually inspect and modify the full text index of a PDF, to correct errors and enrich the OCR.

 

Link to comment
  • Level 5*
20 minutes ago, persistentone said:

Does EverNote index text within images, such as JPG and PNG format?   It would need to OCR the image, obviously, in order to find the text.

Evernote has an ocr indexing process for images

For more details see https://blog.evernote.com/tech/2013/07/18/how-evernotes-image-recognition-works/

You can export the note and view the indexing - no edit option

Link to comment
1 hour ago, DTLow said:

Evernote has an ocr indexing process for images

For more details see https://blog.evernote.com/tech/2013/07/18/how-evernotes-image-recognition-works/

You can export the note and view the indexing - no edit option

 

So the problem I am having with images is that EverNote indexes some terms, but not others, and I have no way to inspect the index and make corrections.

For PDF files, my account appears to be in a bad state and they just never index many parts of PDFs, and they never index text annotations I add to PDFs.  Again, I have no way to inspect and correct any indexing.

Since the very heart of this product is its full text index, these are serious problems for me.

Link to comment
  • Level 5*
11 hours ago, persistentone said:

So the problem I am having with images is that EverNote indexes some terms, but not others, and I have no way to inspect the index and make corrections.

There's no way to examine or correct OCR'd text inside of Evernote, but you can examine the recognized text if you export your notes to Evernote's export format (.enex), which stores each note body as ENML. Documented here: https://dev.evernote.com/doc/articles/enml.php, and the recognition stuff here: https://dev.evernote.com/doc/articles/image_recognition.php
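For anyone who would rather not read the export by eye, here is a minimal sketch of pulling the recognized words out of an exported file. It assumes the layout described in the image recognition article linked above: each image `<resource>` carries a `<recognition>` element whose text is a small `recoIndex` XML document, with `<item>` regions holding `<t w="...">` candidates and their weights. The function name is made up for illustration. Check the structure against your own export before relying on it.

```python
import xml.etree.ElementTree as ET


def recognized_words(enex_xml: str) -> list[tuple[str, list[tuple[str, int]]]]:
    """Return (note title, [(candidate text, weight), ...]) for each note."""
    root = ET.fromstring(enex_xml.strip().encode("utf-8"))
    notes = []
    for note in root.iter("note"):
        title = note.findtext("title", default="(untitled)")
        words = []
        for reco in note.iter("recognition"):
            if not (reco.text and reco.text.strip()):
                continue
            # The recognition payload is itself an XML document (recoIndex),
            # assumed here to be stored as character data inside the element.
            index = ET.fromstring(reco.text.strip().encode("utf-8"))
            for t in index.iter("t"):
                words.append((t.text or "", int(t.get("w", "0"))))
        notes.append((title, words))
    return notes
```

Typical use would be `recognized_words(Path("export.enex").read_text(encoding="utf-8"))`, where `export.enex` stands for whatever file you exported. A word you expected but cannot find in the output was never recognized, which at least tells you where the search failure comes from.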

Link to comment
  • Level 5*
11 hours ago, persistentone said:

For PDF files, my account appears to be in a bad state and they just never index many parts of PDFs, and they never index text annotations I add to PDFs.

This shouldn't have anything to do with your account being in a "bad state".
Can you post the PDF so we can test it ourselves?

I'm currently relying on the Evernote OCR indexing but it's not critical to my process
I could OCR my pdfs externally, or include text in the note
 

Link to comment
10 hours ago, jefito said:

There's no way to examine or correct OCR'd text inside of Evernote, but you can examine the recognized text if you export your notes to Evernote's export format (.enex), which stores each note body as ENML. Documented here: https://dev.evernote.com/doc/articles/enml.php, and the recognition stuff here: https://dev.evernote.com/doc/articles/image_recognition.php

 
 

Probably the last thing I wanted to do was start hacking XML formats, but maybe with this product that is the only way to get this understood.

Is there any way to change the XML attributes outside of EverNote, then import the changed Note?

Here is a typical problem I want to fix.   I created a new Note and inserted one technical diagram into the Note.   That image is loaded with text that could be OCRd and searched.   When I load that image by itself into a Note, EverNote OCRs and indexes it.   But if I incorporate that image into a larger Note, EverNote does nothing with the image and there is no OCR.    Is that by design?    Is full text indexing on EverNote's servers done by a lottery ticket method, and my Notes just weren't lucky enough to be in this week's lottery?    

I mean, how difficult would it be to just let us right-click an image and select an attribute "OCR and Full-Text Index Image"?    At least give the user a chance to force this file to reindex.    And how difficult would it be to have a second option to get statistics on the image's full text index?   The statistics might indicate the number of words located.
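Statistics of that kind can be approximated outside Evernote from the recognition data in an export. The sketch below takes one `recoIndex` document (the text of a `<recognition>` element, as described in the image recognition article) and counts regions and candidate words. The `min_weight` cutoff of 50 is an arbitrary assumption for illustration, not a threshold Evernote documents.

```python
import xml.etree.ElementTree as ET


def reco_stats(reco_index_xml: str, min_weight: int = 50) -> dict[str, int]:
    """Summarize one recoIndex document: regions, candidates, confident regions."""
    index = ET.fromstring(reco_index_xml.strip().encode("utf-8"))
    regions = list(index.iter("item"))
    candidates = 0
    confident = 0
    for item in regions:
        # Each <t> is one candidate reading of the region, with a weight in `w`.
        weights = [int(t.get("w", "0")) for t in item.iter("t")]
        candidates += len(weights)
        if weights and max(weights) >= min_weight:
            confident += 1
    return {
        "regions": len(regions),
        "candidates": candidates,
        "confident_regions": confident,
    }
```

A diagram with thirty labels that reports only a handful of regions was only partly recognized.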

Link to comment
9 hours ago, DTLow said:

This shouldn't have anything to do with your account being in a "bad state".
Can you post the PDF so we can test it ourselves?

I'm currently relying on the Evernote OCR indexing but it's not critical to my process
I could OCR my pdfs externally, or include text in the note
 

 
 

So how do you explain the following sequence:

1) My consultant creates an empty Notebook.

2) I insert one PDF in a Note and I add one annotation to that Note that says "testkeyforindex2357"

3) He searches the Notebook based on a substring of that unique word and he finds the Note.  He searches within the Note and he finds the text that way too.

4) I search the Notebook on either the substring testkey or the full search term, and I do NOT locate the Note.  I search inside the Note on both of these things and I still do not find the Note.

So the product works differently for him than it does for me, against the exact same Notebook.  I tried creating my own private Notebook, creating a Note, and annotating it, and it still does not work.

I understand that products have bugs, and I tolerate that.  What makes this situation so unbearable is that the heart of EverNote is full text indexing, and they literally give you no high-level tools to affect full text indexing or explore the full text index!!!!

Regarding OCR outside EverNote: yes, I am handling EverNote's OCR bugs by doing my own OCR outside of EverNote.  That works for PDF files.  But it does NOT work when I want to insert an image into a Note.  I am trying to build Notes that read like small professional papers, and I want the people I collaborate with to build these living documents with me.  Turning this all into read-only PDFs totally undoes the value of having an editable text environment inside of EverNote.

Link to comment
6 hours ago, DTLow said:

No idea, but if you post the PDF I'll check further

Also, as a Premium Account, you could open a support ticket at Contact Evernote Support

 
 

I opened a support ticket about 10 days ago, on the day you suggested I do that.    So far no resolution.   The people who interface with the end user in their support group look like very low-level techs.  I couldn't even get them to understand the problem well; they just ask you to create an export file and activity log, and they don't appear to be able to converse about the problem.   I'm hoping someone from "level 2" eventually contacts me.

How about we do the experiment in reverse?   You create a Notebook.  Insert any PDF you like in that notebook.  Insert any annotation text you like in that PDF.   Link me to that Notebook and let's see if I am able to search it?  Of course you confirm that you can search too.

If we do the experiment the way you suggested, I already explained that the other side appears to be able to search the Note.  It's my account that is not able to do the search.  It doesn't appear to be a problem with the index, but how exactly would I be able to prove that?

Link to comment
  • Level 5*
15 minutes ago, persistentone said:

How about we do the experiment in reverse?   You create a Notebook.  Insert any PDF you like in that notebook.  Insert any annotation text you like in that PDF.   Link me to that Notebook and let's see if I am able to search it?

If you PM me your email address, I'll share the notebook with you

In the meantime, here is the pdf and search results
Move the pdf to your Evernote account and see how the index/search works for you

Test PDF with annotation.pdf

[Screenshot of the search results, 2017-03-11]

Link to comment
6 hours ago, DTLow said:

If you PM me your email address, I'll share the notebook with you

In the meantime, here is the pdf and search results
Move the pdf to your Evernote account and see how the index/search works for you

Test PDF with annotation.pdf

[Screenshot of the search results, 2017-03-11]

 

When I add your PDF I can search and find the annotation.   What I am confused about is that your PDF has the annotation inside of it even as a detached file.   I created my annotation from inside EverNote.    I thought EverNote kept those annotations inside EverNote and did not actually keep them with the PDF when you exported?   Apparently that is wrong.    

Link to comment
  • Level 5*
4 hours ago, persistentone said:

What I am confused about is that your PDF has the annotation inside of it even as a detached file.

Annotations are stored inside the PDF; something about layers.  It's part of the PDF format specs - not an EN thing

I'm displaying the annotations as a summary - that's optional

Link to comment
  • Level 5*
On 3/11/2017 at 8:06 PM, persistentone said:

Is there any way to change the XML attributes outside of EverNote, then import the changed Note?

Of course. Evernote understands its own export format, at least the desktop versions do. You can import .enex files. You should try it, and see what you get.

Link to comment
  • Level 5*
On 2017-03-11 at 5:06 PM, persistentone said:

I created a new Note and inserted one technical diagram into the Note.   That image is loaded with text that could be OCRd and searched.   When I load that image by itself into a Note, EverNote OCRs and indexes it.   But if I incorporate that image into a larger Note, EverNote does nothing with the image and there is no OCR.

Seems like a good case for a support ticket (Contact Evernote Support).

I know you're unhappy with the Support response, but give them a chance to follow up.  Include enex export of the two notes.

If you post the exports here, we can look into it further

Link to comment
36 minutes ago, jefito said:

Of course. Evernote understands its own export format, at least the desktop versions do. You can import .enex files. You should try it, and see what you get.

So the specific task I would want to perform is to flag an image used inside a Note as a searchable image, so that EverNote will index it.   Is the documentation for their XML format sufficient that I can hack the XML file to do this change?   Would it work even if I did it?

Link to comment
  • Level 5*
22 minutes ago, persistentone said:

So the specific task I would want to perform is to flag an image used inside a Note as a searchable image, so that EverNote will index it.   Is the documentation for their XML format sufficient that I can hack the XML file to do this change?   Would it work even if I did it?

There is no flag to set; all images get OCR'd.  It might take some time, but it was very quick for me.

You might try rebuilding your search index

Link to comment
  • Level 5*
7 hours ago, persistentone said:

So the specific task I would want to perform is to flag an image used inside a Note as a searchable image, so that EverNote will index it.

Evernote OCRs all images in your notes automatically. You don't need to tell them to do so.

7 hours ago, persistentone said:

Is the documentation for their XML format sufficient that I can hack the XML file to do this change?   Would it work even if I did it?

Per DTLow, there is no such flag. I thought that you wanted to hack the existing recognition data.

Link to comment
14 hours ago, jefito said:

Evernote OCRs all images in your notes automatically. You don't need to tell them to do so.

Per DTLow, there is no such flag. I thought that you wanted to hack the existing recognition data.

 

I would understand this approach if it actually worked, but at this point I have encountered so many indexing bugs in this product (in images, in PDFs, in Notes that include images) that I wish they had done it a different way.    I can put up with lots of bugs if there are workarounds for those bugs, but a lot of this stuff has only one workaround:  rebuild your entire full-text index.   And if that doesn't fix it, then contact support.

Link to comment
  • Level 5*

PDF OCR is different than image OCR. If you have specific ideas about how they can do it better, then that's what the Feature Requests forum is for. I don't know of the other indexing bugs you are talking about; it's far better if you are specific about issues you find.

Link to comment
21 minutes ago, jefito said:

PDF OCR is different than image OCR. If you have specific ideas about how they can do it better, then that's what the Feature Requests forum is for. I don't know of the other indexing bugs you are talking about; it's far better if you are specific about issues you find.

 
 
 
 

For example:

* Many images that I put into Notes only have some of the words indexed.   I have no logging or information about why the other words fail to index.

* In some Notes I insert images, and the image is never marked as Searchable and never indexed at all.   Again, I have no logging or information about why.

* I add some PDFs to Notes and they never get indexed.  So I am now resorting to using third party OCR tools because I don't trust EverNote to do it.

* I add some text annotations into PDF files, and sometimes that works and sometimes that does not work.
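The "never indexed at all" case in the list above can at least be detected from an export. The following is a rough sketch, assuming an exported .enex file lists each attachment as a `<resource>` with a `<mime>` type, an optional `<resource-attributes>/<file-name>`, and a `<recognition>` element once the image has been processed. It only looks at images, on the assumption that PDF indexing is handled differently and may not appear in the export. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def images_without_recognition(enex_xml: str) -> list[tuple[str, str]]:
    """Return (note title, file name or MIME type) for images lacking OCR data."""
    root = ET.fromstring(enex_xml.strip().encode("utf-8"))
    missing = []
    for note in root.iter("note"):
        title = note.findtext("title", default="(untitled)")
        for res in note.iter("resource"):
            mime = res.findtext("mime", default="")
            if not mime.startswith("image/"):
                continue  # PDFs and other attachments are out of scope here
            if res.findtext("recognition", default="").strip():
                continue  # recognition data present, so the image was processed
            name = res.findtext("resource-attributes/file-name") or mime
            missing.append((title, name))
    return missing
```

Running this on an export of the affected notes gives a concrete list to attach to a support ticket, which is more specific than "some images are not searchable".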

I have tried rebuilding indexes.  I have tried fixing all Notes.  I have submitted at least one problem ticket and I am waiting to see how seriously they invest in helping me with that before I waste time filing more.  I have posted about most of these things in other threads here.

In any case, I like the vision for the product and what it tries to do.   It just doesn't look like the OCR part of their technology is mature enough to rely on, at least not for my use.

Link to comment
  • Level 5*
28 minutes ago, persistentone said:

Many images that I put into Notes only have some of the words indexed.   I have no logging or information about why the other words fail to index.

Images can be hit and miss; you can see the OCR results by exporting the note in .enex format.
You'll see the OCR results as per this example: [screenshot of an exported note, 2017-03-13]

Further documentation here

https://blog.evernote.com/tech/2013/07/18/how-evernotes-image-recognition-works/

Link to comment

Archived

This topic is now archived and is closed to further replies.
