
(Archived) ScanSnap Searchable PDFs: Not Always?



I just started trying a Fujitsu ScanSnap S300M scanner with Evernote's searchable PDF creation feature. I'm finding that in some cases, although it creates the searchable PDF (at least the context menu has an item to save it), Evernote (Mac, PC, Web) cannot find any of the words I search for.

What might be unusual is that these test cases use 4Print on XP to print 2 pages on 1 side of 1 sheet of paper, i.e. the paper is printed in landscape mode, with 2 portrait pages on it; the other side has handwriting on it. When I first tried it with the handwriting on the back written across the top in portrait mode (i.e. a different orientation than the other side), I got the problem. However, I have the same problem if the handwriting is in the same orientation as the printed side (i.e. the text runs in the same direction on both sides); in this case the note shows the handwritten page turned 180 degrees from the orientation of the printed side.

BTW, words on neither the printed nor the handwritten sides can be found.

Almost 2 hours ago I scanned a different test ... I cut one of the printed pages in half & removed any text not in the same orientation as the main text, so all the text runs across a single page in portrait mode. There's handwriting on the back. The scanned image shows 2 pages, both with the same orientation, i.e. what I'd call correct. Now, just minutes short of 2 hours since this was scanned & synced, the context menu shows NO item to save a searchable PDF, & nothing I've tried to search for is found on this note.

BTW, if I have the same printed side (2 pages on 1 sheet) & nothing on the other side, it works correctly. I haven't tried 1 side with handwriting only.

Is this a known problem?

Dave

Link to comment

Thanks for the report. If you save the "searchable" version of the PDF from the client, you can open it in Acrobat or Preview and select all of the text on the page to see what we found there.

You're correct that text which is not aligned with the page may have a lower chance of being recognized.

Link to comment

I tried looking at the searchable PDF in Preview, but how do I get it to show me what Evernote found there? I tried selecting all the text, but what next?

BTW, when I try selecting all the printed text, it's selected, except for a single word. Does this indicate Evernote converted all the printed words from the original PDF to text, except this one word, which it couldn't handle, so it converted it to an image just containing that one word?

All the handwritten text is displayed as the original handwritten text. This text was written as lines in my normal handwriting, which I wouldn't expect it to be able to handle, but also some lines with block characters, which I would think it could handle.

Dave

Link to comment

If you do "Select All" in your PDF program, you should see all of the regions where we found words. You can try just doing "copy" and then "paste" into a text editor to see the raw text we found for that page, or you can more carefully select the text within a single matching word to copy and paste that somewhere and see what it matched for that word.

If something in your scan doesn't show any selected text, then we didn't find any reasonable matches for it.

Handwriting in PDFs will fail to match more often than it succeeds, but carefully printed text will frequently work. We're working with our engine providers to improve this in the future.
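As a rough illustration of the copy-and-paste check described above, the pasted text can be compared against the words you expected to be searchable. This is only a sketch: it assumes you have already pasted the text from the searchable PDF into a string, and the helper names `recognized_words` and `missing_words` are made up for this example, not part of Evernote or any PDF tool.

```python
import re


def recognized_words(pasted_text):
    """Return the set of lowercase word tokens found in the pasted text."""
    return set(re.findall(r"[a-z0-9']+", pasted_text.lower()))


def missing_words(pasted_text, expected_words):
    """List the expected words that do not appear in the pasted text.

    The comparison is case-insensitive and matches whole tokens only.
    """
    found = recognized_words(pasted_text)
    return [word for word in expected_words if word.lower() not in found]


if __name__ == "__main__":
    # Hypothetical sample standing in for text pasted from the PDF.
    sample = "Quarterly report for the Fujitsu ScanSnap evaluation"
    print(missing_words(sample, ["report", "ScanSnap", "handwriting"]))
```

Any word reported as missing had no recognized text behind it in the PDF, so a search for that word would not be expected to match the note.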

Link to comment

Dave,

I've run some more tests. It certainly seems alignment is a problem. There may also be problems on side 2, but that could be coincidence without running more tests.

I scanned a printout generated on OSX Leopard's print facility with a profile specifying 2 pages per side, both sides. This printed with no separators between the 2 pages on 1 side, except for whitespace. Unfortunately, Evernote interpreted this 90 degrees out of alignment, i.e. it treated the piece of paper as portrait when it should have been landscape. It "found" some "characters" ... it basically interpreted some characters on the line, but read them horizontally across the paper, which is up-down within each line; so pieces of multiple characters were interpreted as a single character without looking like that character at all (although I could see pieces that could be interpreted as part of the character, e.g. a short arc resulted in a "c").

Then I scanned a 4Print printout from XP, again 2 pages per side, both sides; but this had lines around & between the 2 pages on 1 side. Side 1 was interpreted correctly. But side 2 was rotated 90 degrees as in the first example, and was therefore misinterpreted. Sides 1 & 2 were printed in the same format, but with different data.

Finally, I tried handwriting on both sides of the paper. This was aligned correctly. A moderate-sized block of text (multi-line) showed as selected. But when searching for single characters (alpha, digits, misc. punctuation), I could only find a match on '2' ... and I couldn't see where the '2' was. So essentially my handwriting is a lost cause, even with clean uppercase block letters.

I'll keep my test cases, in case you want me to test any future versions of the software.

Thanks for your help.

Dave

Link to comment

Archived

This topic is now archived and is closed to further replies.
