
Feature Request: How to tell if PDF has been indexed?


jfkaess

Idea

32 replies to this idea


  • Level 5*
10 hours ago, BSR said:

When Evernote processes the notes in this fashion, it will also highlight individual words when you search in the PDF.

In Windows, when I let EN OCR a document and then I search for a string of words in that document, I find that the words are not highlighted in the pdf view.  If I let ScanSnap do the OCR, a search for that same string of words will show as highlighted in the pdf view.  I can quickly use the arrows to page through the document to see what I need without having to double click to load Acrobat and then do a ctrl-f there to find what I'm searching for.

Link to comment
  • Level 5*
4 hours ago, BSR said:

When Evernote processes the notes in this fashion, it will also highlight individual works when you search in the PDF.

Search text will also be highlighted in PDFs OCR'd by ScanSnap or downloaded from a provider site (assuming "works" means "words" in the above).

4 hours ago, BSR said:

If your process creates a lot of scanned PDFs, I would set up the scanner to not OCR while scanning. Then Evernote will do the OCR when the note is created.

I'm with @jbenson2 on this one; my method as well is to let ScanSnap do the scanning and the OCR, for all the reasons he gives. In my mission to become paperless I have accumulated 16k PDFs in EN. Thus far I have always been able to search and find what I need.

One thing to watch out for is downloaded PDFs that contain rendered text. I have found that for those to be searchable in their entirety, you first have to print to PDF, OCR, and replace the PDF in the note. A bit of a PITA for a few web sites, but it is what it is, I'm afraid. FWIW.

Link to comment
  • Level 5*
59 minutes ago, jfkaess said:

Evernote really needs to address the concerns pointed out in this thread. I've been premium for more than 6 years, and my main reason has always been the ability to search PDFs. Given the rules for Evernote to index, the inability to know whether a particular document has actually been indexed or failed to meet a rule, and now the demonstration that my local OCR is better than Evernote's indexing, I may well be wasting my money on Premium instead of Plus.

 

41 minutes ago, csihilling said:

One thing to watch out for is downloaded PDFs that contain rendered text. I have found that for those to be searchable in their entirety, you first have to print to PDF, OCR, and replace the PDF in the note. A bit of a PITA for a few web sites, but it is what it is, I'm afraid. FWIW.

The DIY OCR works, provided:
- your scanner has an OCR function (mine doesn't)
- you don't use other scanning methods (I use Scannable with my iPhone/iPad)
- you don't have downloaded PDFs of the kind noted above by @csihilling

For these reasons, I'm sticking with the built-in EN feature.
I realize that in a post-EN world I face a massive batch OCR process, but I'll address it then.

Link to comment
1 hour ago, BSR said:

PDF's can be tricky.

 

--snip--

 

All of this can create confusion when accessing and searching the contents of PDFs.

And ^^this^^ is exactly the problem. It is not a new problem. It has been the problem since the beginning, and it continues to be. And apparently Evernote is good with it just the way it is. I like Evernote. I'm a fan. I have well over 12,000 notes in Evernote and have been premium for more than 6 years.

This is something which Evernote needs to address (along with the horrendous issues with what is supposedly the editor and is so far behind state of the art that Evernote should be publicly ashamed).

Link to comment
  • Level 5*
2 hours ago, DTLow said:

- you don't have downloaded PDFs of the kind noted above by @csihilling

I would check this out relative to renderable text.  I'm not sure EN OCR addresses this either.  

Link to comment
5 hours ago, jbenson2 said:

I realize BSR is an Evernote employee and has insights into the program we do not have. As mentioned already, and confirmed by BSR, there is a lot of confusion confirming whether Evernote has done the OCR. In several other forum posts, the suggestion for non-Premium users was to wait a few hours or days and then manually check using the search feature. With a Scanner doing the OCR, it takes a few extra seconds, but at least I am sure it is done almost instantly.

Frankly, I take exactly the opposite position for several reasons. I have 8,050 PDF notes in Evernote.
It takes a bit longer to scan, but I always let ScanSnap do the OCR for me. I do not see any benefit in using Evernote to process my PDF's.

Why?

1.) Exported PDFs:
ScanSnap: The PDF document remains OCR'd if I export it from Evernote.
Evernote: The PDF document loses its OCR if I export it from Evernote. 
 
2.) Consistency:
ScanSnap: The search results are consistent in Evernote, whether I view them from my desktop client or the Evernote web.
Evernote: The search results are not consistent because Evernote uses different OCR software depending on the platform. 
 
3.) 100% OCR:
ScanSnap: Works on notes that are stored in my local non-sync'd Evernote notebooks.
Evernote: Evernote cannot see my notes on my local non-sync'd notebooks, so the PDF's cannot be OCR'd. 
 
4.) No complicated / confusing rules:
ScanSnap: OCR's all my PDF's - no rules and I know it is done.
Evernote: Evernote has 7 technical rules to follow and no warning if the document fails to meet any of the rules

 

This is a good point. The ScanSnap scanner is designed and optimized to integrate easily with Evernote, and in that case the recommendation to turn off character recognition on the scanner side would not be as applicable. Everyone has their own workflow and can scan and add their PDFs to Evernote in the manner that works best for them. If you are scanning PDFs using a non-ScanSnap, third-party scanner and want individual search terms highlighted in the search results, my recommendation would still be to let Evernote perform the character recognition.

Thanks for bringing this up!

Link to comment
  • Level 5
1 hour ago, BSR said:

Thanks for bringing this up!

You are welcome. I appreciate your input.

To clarify my side of the discussion - I purchased my Fujitsu S300 miniature scanner 7 years ago in 2009 for $239. It is not the current top-of-the-line iX500 model dedicated only for Evernote use. 

But I agree with jfkaess. The topic (how to easily verify if a note has been OCR'd by Evernote) has been kicking around for years.
 

Link to comment
29 minutes ago, jbenson2 said:

But I agree with jfkaess. The topic (how to easily verify if a note has been OCR'd by Evernote) has been kicking around for years.
 

Fair enough! I went ahead and moved this to the Product Feedback thread as a feature request. We'll treat this as the primary thread for this topic from here on out.

Link to comment
  • Level 5
29 minutes ago, BSR said:

If your process creates a lot of scanned PDFs, I would set up the scanner to not OCR while scanning. Then Evernote will do the OCR when the note is created.

 
 
 
 
 
 
 
 

I realize BSR is an Evernote employee and has insights into the program we do not have. As mentioned already, and confirmed by BSR, there is a lot of confusion confirming whether Evernote has done the OCR. In several other forum posts, the suggestion for non-Premium users was to wait a few hours or days and then manually check using the search feature. With a Scanner doing the OCR, it takes a few extra seconds, but at least I am sure it is done almost instantly.

Frankly, I take exactly the opposite position for several reasons. I have 8,050 PDF notes in Evernote.
It takes a bit longer to scan, but I always let ScanSnap do the OCR for me. I do not see any benefit in using Evernote to process my PDF's.

Why?

1.) Exported PDFs:
ScanSnap: The PDF document remains OCR'd if I export it from Evernote.
Evernote: The PDF document loses its OCR if I export it from Evernote. 
 
2.) Consistency:
ScanSnap: The search results are consistent in Evernote, whether I view them from my desktop client or the Evernote web.
Evernote: The search results are not consistent because Evernote uses different OCR software depending on the platform. 
 
3.) 100% OCR:
ScanSnap: Works on notes that are stored in my local non-sync'd Evernote notebooks.
Evernote: Evernote cannot see my notes on my local non-sync'd notebooks, so the PDF's cannot be OCR'd. 
 
4.) No complicated / confusing rules:
ScanSnap: OCR's all my PDF's - no rules and I know it is done.
Evernote: Evernote has 7 technical rules to follow and no warning if the document fails to meet any of the rules

 

Link to comment

S2sailor,

That's a fault in the Windows operating system and not Evernote's problem. PDF file viewing is built into OS X, so on a Mac the words are highlighted when searched for directly inside Evernote, without the need for an external PDF viewer. Microsoft has made a deliberate decision not to natively support PDF viewing in the operating system, which is why you have to install Adobe Acrobat Reader or another PDF reader in order to access the search feature within a PDF file. On a Mac, it just works.

Link to comment
  • Level 5*
31 minutes ago, jfkaess said:

S2sailor,

That's a fault in the Windows operating system and not Evernote's problem. PDF file viewing is built into OS X, so on a Mac the words are highlighted when searched for directly inside Evernote, without the need for an external PDF viewer. Microsoft has made a deliberate decision not to natively support PDF viewing in the operating system, which is why you have to install Adobe Acrobat Reader or another PDF reader in order to access the search feature within a PDF file. On a Mac, it just works.

Yes, but I think you may have missed the point of my post. In Evernote for Windows the highlighting is different depending on whether the Evernote service or ScanSnap does the OCR. I get highlighting in the PDF viewer only if I let ScanSnap do the OCR.

Link to comment
  • Level 5*
11 hours ago, s2sailor said:

Yes, but I think you may have missed the point of my post. In Evernote for Windows the highlighting is different depending on whether the Evernote service or ScanSnap does the OCR. I get highlighting in the PDF viewer only if I let ScanSnap do the OCR.

Out of interest, how do you get the arrow keys to work in the PDF in the note view, no Acrobat?  Something new to me.

Link to comment
  • Level 5*
5 minutes ago, csihilling said:

Out of interest, how do you get the arrow keys to work in the PDF in the note view, no Acrobat?  Something new to me.

Sorry for the confusion, I am referring to these in the viewer:

arrows.jpg

Link to comment
  • Level 5*
26 minutes ago, s2sailor said:

Sorry for the confusion, I am referring to these in the viewer:

arrows.jpg

Thanks for the reply.  As I think you surmised, I was hoping for more, like using the keyboard arrow keys to go from instance to instance within the PDF.  I thought there was some magic in the universe I was missing.  Oh well....   :(

Link to comment
  • Level 5*
1 hour ago, jasecutler said:

Question: I've uploaded a small (8mb) ebook in PDF form onto EN (premium). However, the EN search function cannot locate any words in the document. (Image: https://ibb.co/d3OACd)

Note: This is a paid account feature. Others have reported success if the PDF was previously OCR'd.

ScreenShot2018-05-28at22_08_33.png

On my Mac, I actually get a file as shown in the screenshot.

 

Link to comment
1 hour ago, jbenson2 said:

there is a lot of confusion confirming whether Evernote has done the OCR.

Frankly, I take exactly the opposite position for several reasons. I have 8,050 PDF notes in Evernote.

It takes a bit longer to scan, but I always let ScanSnap do the OCR for me.

1.) Exported PDFs:
ScanSnap: The PDF document remains OCR'd if I export it from Evernote.
Evernote: The PDF document loses its OCR if I export it from Evernote. 
 
2.) Consistency:
ScanSnap: The search results are consistent in Evernote, whether I view them from my desktop client or the Evernote web.
Evernote: The search results are not consistent because Evernote uses different OCR software depending on the platform. 
 
3.) 100% OCR:
ScanSnap: Works on notes that are stored in my local non-sync'd Evernote notebooks.
Evernote: Evernote cannot see my notes on my local non-sync'd notebooks, so the PDF's cannot be OCR'd. 
 
4.) No complicated / confusing rules:
ScanSnap: OCR's all my PDF's - no rules and I know it is done.
Evernote: Evernote has 7 technical rules to follow and no warning if the document fails to meet any of the rules

 

This is exactly what I have been trying to get at.

To resolve this, I scanned several documents into Evernote twice: once with and once without the scanner software doing OCR. My scanner is an older Canon P-150 (Scantini). I then put them all in the same notebook, waited overnight to make sure Evernote had time to index (I am a premium user), and then did some searches for words I knew were in those documents. I found that:

1) Evernote was able to search and find words both in PDFs that the scanner had OCR'd and in those which Evernote had indexed.

2) The documents which my scanner had OCR'd sometimes produced more matches to the search than those indexed by Evernote.

3) The Evernote-indexed documents never had more hits than the same document OCR'd by my scanner.

My conclusion: in my specific circumstances, doing all this on my iMac, using my Canon scanner and its CaptureOnTouch software, the resulting OCR'd PDFs are more searchable than those indexed by Evernote.
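
For anyone who wants to repeat this kind of comparison, the hit counts can be tallied in a few lines of Python. This is only a sketch, assuming you note down by hand how many matches each search term returned for the scanner-OCR'd copy and for the Evernote-indexed copy. The function name `compare_hits` and the sample terms and numbers are made up for illustration; they are not results from this test.

```python
def compare_hits(hits):
    """Group search terms by which copy of the document returned more matches.

    hits maps a search term to a pair:
    (matches in the scanner-OCR'd copy, matches in the Evernote-indexed copy).
    """
    summary = {"scanner_better": [], "evernote_better": [], "tie": []}
    for term, (scanner, evernote) in hits.items():
        if scanner > evernote:
            summary["scanner_better"].append(term)
        elif evernote > scanner:
            summary["evernote_better"].append(term)
        else:
            summary["tie"].append(term)
    return summary


# Hypothetical counts, written down by hand after each search.
sample = {"invoice": (5, 3), "warranty": (2, 2), "account": (4, 1)}

for outcome, terms in compare_hits(sample).items():
    print(f"{outcome}: {', '.join(terms) or 'none'}")
```

An empty "evernote_better" group across many terms would match observation 3 above.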

Evernote really needs to address the concerns pointed out in this thread. I've been premium for more than 6 years, and my main reason has always been the ability to search PDFs. Given the rules for Evernote to index, the inability to know whether a particular document has actually been indexed or failed to meet a rule, and now the demonstration that my local OCR is better than Evernote's indexing, I may well be wasting my money on Premium instead of Plus.

Link to comment

PDF's can be tricky.

One quick note: if you look at a note's properties and see that it has not been indexed, that can be a bit misleading. Images are indexed; PDFs go through a separate process and will not always show as indexed in the properties. The indexed status there reflects images only.

There are two types of PDFs. Some are OCR'd when created; this is often the case with PDFs scanned in via a scanner, whose settings can produce either a normal PDF or a PDF where the text has been OCR'd. If Evernote receives a note with a PDF that has already been OCR'd, we merely add the PDF and use the OCR data for indexing.

If a PDF comes in without that OCR data, Evernote will run it through a process that indexes it and picks out keywords.

When Evernote processes the notes in this fashion, it will also highlight individual words when you search in the PDF.

All of this can create confusion when accessing and searching the contents of PDFs.

If your process creates a lot of scanned PDFs, I would set up the scanner to not OCR while scanning. Then Evernote will do the OCR when the note is created.

 

Link to comment
  • Level 5

I don't understand your question. It seems it has already been answered.

In my situation, all my PDF's are made searchable by my ScanSnap scanner. As you pointed out, that means Evernote will not do any OCR on the PDF.

I just tried an Evernote search for 08131. It found 36 PDF's that contain the number. The number is found on all my grocery store thermal receipts. To illustrate the accuracy of the search, take a look at this screen capture of the PDF. Note the text on thermal receipts is not very clear, but Evernote still found all of them.

 

Screen Clip of PDF receipt.png

Link to comment

Final clarification please: (and thank you all for your help)

If I have my scanner software do the OCR, which means that Evernote will then NOT do any OCR according to the Evernote rules, will a search inside Evernote use the OCR information that my scanner software has embedded in the PDF? I don't want to do anything to lessen Evernote's ability to search notes (including PDFs) to find things. Search is absolutely Evernote's most important feature.

Link to comment
  • Level 5*
On 8/9/2016 at 6:36 PM, jfkaess said:

I guess the only reason was that a few years ago I thought I read in the forums that Evernote's OCR was better than that of the scanner software. Has that changed?

I don't think that Evernote "OCR" has ever been better than most OCR tools.

For one thing, Evernote does NOT provide true OCR.  It provides a best guess at creating an index for the image.  This is quite different from recognizing actual characters in the image.  

But even if Evernote "OCR" were equal to external OCR, having the actual OCR'd text as part of the PDF, which you can select, copy, and search using PDF tools outside of Evernote, makes it more than worthwhile to me to OCR the PDF before I put it into Evernote.

Link to comment
  • Level 5*
17 hours ago, jfkaess said:

feedback on how Evernote's OCR compares with that of most scanner software

I can't compare either,  but I prefer my own OCR for several reasons including

  1. File sizes are reduced (characters replace pictures) so uploads are smaller
  2. I know when the OCR is done
  3. Files are searchable immediately
  4. The OCR content moves with the file (AFAIK Evernote's OCR is kept on the server,  optionally downloadable as a separate file...)
  5. Any special words or jargon can be dealt with correctly - I deal with techernickle stuff.  ;)
Link to comment
  • Level 5
35 minutes ago, jfkaess said:

If Evernote doesn't think this is important, then maybe I should downgrade to Plus next renewal.

That thought has crossed my mind also.

 

I can't offer a comparison with Evernote's OCR because I rely on my scanner, but I can offer some feedback. I tried searching a government 8.5 x 11 form that I scanned. The form had a blue background. I was able to find the text words and the unique application code number that was manually stamped onto the document. I found text in red (on the blue background). And I found reverse text (white characters on a black background). 

 

 

Link to comment

Jbenson2,

Thanks! I have that option turned off right now in my scanner software. I guess the only reason was that a few years ago I thought I read in the forums that Evernote's OCR was better than that of the scanner software. Has that changed?

Indexing PDFs and docs is the only reason I've been paying for premium all these years. It seems messed up that I have to manually check each PDF to find out if it's indexed or not.

If Evernote doesn't think this is important, then maybe I should downgrade to Plus next renewal.

I'd appreciate any feedback on how Evernote's OCR compares with that of most scanner software. I'm using a Canon P-150 scanner which imports directly into Evernote.

Link to comment
  • Level 5

As you can see by the link (Tips for searching scanned PDF's), there are a lot of complicated rules that might affect Evernote's indexing.

And to make it more confusing, Evernote does not always give you a notification if the PDF does not meet their criteria.

For these reasons, I always (100%) scan, and create a searchable PDF with my scanner before pushing the PDF to Evernote.

This takes a few more seconds in the scanning process, but I am assured that the document is searchable. 

Link to comment
On 8/7/2016 at 7:42 PM, DTLow said:

I usually test this by doing a search.

 

 

Hi @jfkaess,

The best way to verify that a PDF is indexed in Evernote is by searching for it by note title or terms in the PDF.
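
As a rough illustration, a test search limited to notes that contain a PDF attachment can be put together like this. It is a minimal sketch, assuming the `resource:` operator from Evernote's search grammar still matches attachments by MIME type; the helper name `pdf_search_query` is made up for this example, and the resulting string is simply pasted into the Evernote search box.

```python
def _quote(text):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    text = text.replace('"', "").strip()
    return f'"{text}"' if " " in text else text


def pdf_search_query(term):
    """Build a search string for a term expected to appear inside a PDF.

    Assumes `resource:application/pdf` restricts results to notes
    that have a PDF attached.
    """
    return "resource:application/pdf " + _quote(term)


print(pdf_search_query("08131"))
print(pdf_search_query("application code"))
```

If the note you just added shows up for a word that appears only inside the scanned PDF, the PDF text is searchable.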

For more information and tips on searching PDFs in Evernote, please visit our Help Center article below:

Tips for searching scanned PDFs

Please let me know if you have any other questions.

Link to comment

Thank you DTLow and JMichaelTX. It seems searching for a word you know is in the PDF is the only way to know it is indexed.

gazumped, thank you for the information, which would be useful if my PDF docs had not been indexed, but apparently they have.

It would be nice if there were a check box, button, or description item which would show that a document has been indexed. Indexing is the main reason I have been paying for premium for the past 6 years, and actually knowing that documents have been indexed (or not) would certainly comfort me in this tumultuous time with Evernote. I'm looking for a sense of security that I'm getting value from my now higher fee.

Link to comment
  • Level 5*

Evernote also has some restrictions on OCR - not sure if these have changed recently,  but they used to be:

  • Filesize 50MB or less,
  • 100 pages or less
  • No “searchable” text - characters that can be selected and copied
  • No encryption
  • Not a handwritten document.
  • Image OCR: max size 3000 x 2400px at 300 DPI.
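
The PDF items in that list can be turned into a quick pre-flight check before uploading. This is only a sketch: the limits are the ones recalled above and may be out of date, and the `PdfFacts` record and `ocr_blockers` function are hypothetical names for illustration, not anything Evernote provides. You would fill in the facts by hand or from whatever PDF tool you use.

```python
from dataclasses import dataclass
from typing import List

# Limits as recalled in the list above; they may have changed.
MAX_BYTES = 50 * 1024 * 1024  # 50 MB
MAX_PAGES = 100


@dataclass
class PdfFacts:
    """Facts about one PDF, gathered by hand or with a PDF tool (hypothetical record)."""
    size_bytes: int
    pages: int
    has_text_layer: bool  # text can already be selected and copied
    encrypted: bool
    handwritten: bool


def ocr_blockers(pdf: PdfFacts) -> List[str]:
    """Return the reasons Evernote would likely skip OCR, per the list above."""
    reasons = []
    if pdf.size_bytes > MAX_BYTES:
        reasons.append("larger than 50 MB")
    if pdf.pages > MAX_PAGES:
        reasons.append("more than 100 pages")
    if pdf.has_text_layer:
        reasons.append("already contains selectable text")
    if pdf.encrypted:
        reasons.append("encrypted")
    if pdf.handwritten:
        reasons.append("handwritten document")
    return reasons


# Example: an 8 MB, 40-page image-only scan should pass every check.
scan = PdfFacts(size_bytes=8 * 1024 * 1024, pages=40,
                has_text_layer=False, encrypted=False, handwritten=False)
print(ocr_blockers(scan) or "no known blockers")
```

An empty result only means none of the recalled rules is violated; it does not confirm that Evernote has actually processed the file.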
Link to comment
  • Level 5*
1 hour ago, jfkaess said:

Premium user, current version of Mac Evernote. I uploaded several PDFs today. How will I be able to tell that they've been indexed?

I'm not sure there is a reliable way to confirm this.  Perhaps an Evernote employee will jump in with a good answer.

If you go to the menu Note > Show Note Info, you can view the Image Status property. I just ran a test on a small PDF that I added to a new note, and it immediately reported "No images to index". So I'm not sure whether it reports only on direct images or also on images in the PDF.

Obviously, the most reliable check would be to select the note with the PDF and do a Find (⌘F) on text you know is in the image in the PDF.
In a quick Find test, Evernote found text in the scanned PDF within a minute of when I uploaded it.

Link to comment

Archived

This topic is now archived and is closed to further replies.
