(Archived) Best image scanning format (pros/cons): jpg or pdf

I've just got a Fujitsu ScanSnap S300, mainly for use with Evernote. I've reverted to EN 3.1, since 3.5 has too many issues for not much benefit. My wife and I both have Premium accounts.

I've been searching and reading, but given Evernote's changing abilities to search for text in images, I'd like to ask: what are the pros and cons of scanning items as JPG vs PDF? I would prefer to be able to rely on EN finding words in the image without turning on OCR in the ScanSnap software, and just let the Evernote server's OCR work on the image. That suggests JPG is the better approach, since EN seems to have a problem doing OCR on images inside PDFs (is this a 3.1 vs 3.5 issue?). However, multi-page documents are then more time-consuming to process, because each JPG becomes its own note.

I've also noticed that copying a JPG into an EN note that already contains a JPG sometimes increases the note size by much more than the size of the copied JPG (quite dramatically, a few times). I'm interested in automating the input of my receipts, certifications/licenses, continuing medical education certificates, photos of the manufacturer plates on my equipment, etc.

I tried searching, but "jpg" and "pdf" are too short to use as search terms, so if this has been well discussed, I'd appreciate a pointer to the thread.

cheers, and TIA

For Free accounts, we only perform text recognition on images (e.g. JPG), so if you aren't Premium, you may want to stick with JPG.

If you have a Premium account, you might get better results with PDF. In particular, PDF is better for multi-page documents: JPG stores only a single page per file, so it won't handle a big document very well.

Thank you.

Then for Premium accounts, does EN perform text recognition on the image portion of PDFs generated by the scanning process? Since a scanned PDF is entirely image, which OCR process is more likely to give a future search hit? Should one always use the OCR processing in the ScanSnap software, then? Is this true of both 3.1 and 3.5?

If you use OCR on your own computer to add searchable text into a PDF, it doesn't matter whether you're Free or Premium ... either way, we'll accept the PDF and index the text so that you can search for words in that PDF.

The difference comes if you're scanning to Evernote without doing your own OCR. In that case, we process your PDF scan on the service and produce a second "Searchable" version that we store with your note. This is indexed so that you can search for text in that note.

Thanks so much for the fast replies, and at night too!

A few final questions.

For Premium members, do you recommend doing the OCR at scan time, or letting EN do all the OCR (i.e. turning off OCR during scanning, which speeds the scanning up)? It seems that for single pages JPG has an advantage: if the image needs rotating or resizing, that can be done easily in EN 3.1, whereas with a PDF you're stuck with it as scanned.

I also noticed that rotating a JPG in EN 3.1 caused the note size to increase from 156K to 260K. The note was a single JPG with no text. Is this an artifact of the JPG format itself, or is EN storing information about the original? BTW, I really like being able to do basic JPG manipulation in EN 3.1 and use it frequently; I hope it comes back in 3.5. I save space by shrinking images I've grabbed from web pages to put in my notes, since I usually don't need them anywhere near as big as they are. Since I plan to use EN for a while, all this will eventually add up.

I'm really hoping you add local storage of notes (at least read-only) on the WinMobile platform, and I want to keep my note sizes down. The lack of local access to my notes is the one thing that keeps me scouting for an alternative. I often need access to my notes where there's no cell coverage (hospital basement conference rooms, interior rooms, the middle of the desert, no joke). It's kind of frustrating to have a 16GB memory card and a smartphone but no notes.

Again, thanks a bunch for a product I'm 90% happy with, cheers

If you have your own OCR software on your computer, and you're happy with its results, you may prefer to do the OCR locally. This will save a bit of space and bandwidth, since our OCR produces a second "Searchable" PDF that hides inside your note. If you do your own OCR, you can search for notes immediately without waiting for our processing to complete, etc.

On the other hand, if doing local OCR makes the overall experience less convenient, go ahead and scan the raw documents and we'll do the processing.

The image manipulation controls in the 3.1 client are somewhat broken ... it's actually producing a PNG file when you rotate, which is the reason for the difference. We recommend that you right-click on the image and open it in your favorite image tool instead. Even MS Paint is going to give better results than any image editing we could create. When you save the image from your editor, we'll pick up the change and replace the image in the note with your edited version.
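If you want to confirm which format the client actually saved after a rotation, you can save the attachment to disk and look at its first few bytes: PNG files begin with a fixed 8-byte signature, JPEG files begin with the bytes FF D8 FF, and PDF files begin with "%PDF-". A minimal sketch in Python; `sniff_format` and `sniff_file` are hypothetical helper names for illustration, not part of Evernote or any Evernote tool.

```python
# Leading "magic" bytes that identify a few common file formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",   # 8-byte PNG file signature
    b"\xff\xd8\xff": "JPEG",       # JPEG start-of-image marker followed by another marker
    b"%PDF-": "PDF",               # PDF header, e.g. "%PDF-1.4"
}


def sniff_format(data: bytes) -> str:
    """Return the format name matching the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))


if __name__ == "__main__":
    import sys
    # Usage: python sniff.py rotated_image_from_note
    for p in sys.argv[1:]:
        print(p, sniff_file(p))
```

A JPEG that comes back reporting "PNG" after rotation has been re-encoded losslessly, which for a photo or scan usually means a bigger file.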

If you have a Free account, you can search for the "normal" text within a PDF document (i.e. the text you can select and copy if you open it in Preview/Acrobat/Foxit/etc.).

If you have a Premium account, we'll also produce OCR for scanned PDFs, and you can search for text from this OCR.

The recognition for printed text is comparable for PDF and JPEG images. Handwriting will be better in JPEG.
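To summarize the rules described in this thread (as they applied at the time), here is a small decision sketch in Python. It simply encodes the replies above: images get recognition for any account, PDFs that already contain text are indexed for any account, and scanned PDFs without a text layer are OCR'd on the service only for Premium. `Attachment` and `searchable_via` are hypothetical names for illustration, not an Evernote API.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    kind: str                     # "jpg" or "pdf"
    has_text_layer: bool = False  # True for text-based PDFs or PDFs OCR'd locally


def searchable_via(att: Attachment, premium: bool) -> str:
    """Say how the text in an attachment becomes searchable, per this thread."""
    if att.kind == "jpg":
        # Image recognition runs on the service for Free and Premium alike.
        return "server image recognition"
    if att.kind == "pdf":
        if att.has_text_layer:
            # Existing PDF text is indexed directly, whatever the account type.
            return "existing PDF text"
        if premium:
            # Scanned PDF: the service stores a second "Searchable" version.
            return "server OCR of scanned PDF"
        return "not searchable"
    raise ValueError(f"unsupported kind: {att.kind!r}")


if __name__ == "__main__":
    for att in (Attachment("jpg"), Attachment("pdf"), Attachment("pdf", True)):
        for premium in (False, True):
            print(att, "premium" if premium else "free", "->", searchable_via(att, premium))
```

The only combination that ends up unsearchable is a raw scanned PDF on a Free account, which is why local OCR (or JPG) was suggested for Free users.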

The ScanSnap scanners all seem to work pretty well with Evernote, yes. Evernote employees have a bunch of them.

  • 2 weeks later...

Thanks everyone,

One other question: I usually take paper notes and scan them into EN for tagging, chronology, etc. What type of PDF should I be using for multi-page notes? PDF image, PDF text? I get several options on the Mac. Most of my meeting notes run to several pages.

Many thanks, or point me to a how-to.

If your software produces text-based PDFs (e.g. because you're creating notes in a word processor, or your software does OCR on scanned documents), then that will make much smaller files that will work very well in Evernote.

If you are scanning documents, and you don't have any OCR support of your own, then you can upload these scanned PDFs into Evernote, and we will process them for Premium accounts.

So I'm not sure exactly what it means on your system, but it sounds like "PDF Text" might be best.

I'm not sure what "PDF text" means in the application you're using to scan. My best guess is that it may be using a simpler black and white color scheme, which will shrink the size of the PDFs. Full color image scans tend to be gigantic, so you should use black-and-white, or greyscale at the worst, for anything that doesn't really need full color.
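Some quick arithmetic shows why colour scans get so big. This Python sketch computes the *uncompressed* bitmap size of one scanned page, assuming a US Letter page (8.5 x 11 in) at 300 dpi; both values are illustrative assumptions. Real PDFs compress the image, so actual files are much smaller, but the 24 : 8 : 1 ratio between colour, greyscale and black-and-white is the point. `raw_scan_bytes` is a hypothetical helper name.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a scanned page bitmap."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to whole bytes (matters for 1-bit black-and-white).
    return (pixels * bits_per_pixel + 7) // 8


if __name__ == "__main__":
    # Assumed example: US Letter at 300 dpi is 2550 x 3300 = 8,415,000 pixels.
    modes = [("colour (24-bit)", 24), ("greyscale (8-bit)", 8), ("black-and-white (1-bit)", 1)]
    for mode, bpp in modes:
        size = raw_scan_bytes(8.5, 11, 300, bpp)
        print(f"{mode:24} {size:>12,} bytes")
```

Under these assumptions a colour page is about 25 MB raw versus about 1 MB for black-and-white, so choosing the colour mode matters more than almost any other scanner setting.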

  • 1 month later...

Try searching from the web client for a few clear dictionary words (i.e. not names) that appear in the image. If none of your searches find the note, the image may not have been processable for some reason.

This topic is now archived and is closed to further replies.