Bill48

Multifunction printer for scanning (and OCR) to Evernote


Hi.  It's possible to do everything with a standard scanner.  Depending on the volume of scans you plan to do,  a sheet feeder may be essential,  but as long as your scanner can save to PDF files,  and you have software that can OCR those files,  you have everything you need.


Does the scanner have to provide OCR?  Obviously I have not used this in EN yet but I thought that EN applied its own OCR in searching PDF notes.  This is important as I need to know just what specs are needed in the scanner I choose.


Hi again. If you're a Premium member, Evernote will OCR and index your PDFs for you, subject to certain limits on quality and file size / number of pages. The process takes a little while after the note has been uploaded - depending on the load on the servers, anywhere from 30 minutes to a day or so. The indexed text exists separately from the PDF content and isn't portable. It was possible to download the OCR content separately from the PDF file at one stage - I'm not sure if that's still the case. Scanned files tend to be larger than locally OCR'd "searchable text" files because the scanned file is essentially a series of pictures of entire pages, whereas the OCR'd file contains just the text. I recently uploaded a number of files that totalled 17MB before OCR and came down to 11MB afterward. If you're doing a fair amount of scanning, the difference can be significant.
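
To put a number on that saving, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration, and the only data is the 17MB and 11MB figures quoted above.

```python
def percent_saved(size_before_mb, size_after_mb):
    """Return the size reduction as a percentage of the original size."""
    if size_before_mb <= 0:
        raise ValueError("size_before_mb must be positive")
    return (size_before_mb - size_after_mb) / size_before_mb * 100


if __name__ == "__main__":
    # Figures from the post: 17MB before OCR, 11MB after.
    print(f"{percent_saved(17, 11):.1f}% smaller")
```

That works out to roughly a third off the original size for that particular batch. Other documents and OCR settings may well behave differently.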

 

I prefer to do my own OCR because I know it's done: I don't have to wait for Evernote, and my notes and attachments are fully searchable immediately. I occasionally scan big files that Evernote would not process, and I get instant feedback if there's been a problem - I can re-scan if there's a missing page.

 

So (finally) - it's not strictly necessary to have an OCR feature with your scanner; that would be provided by software anyway, not the scanner hardware. However, being able to OCR locally does have some advantages!


Ok, that's interesting.  The scanners I am looking at come with proprietary OCR software.  So if using that, scanning to EN is more than a one step process, and that's OK if the outcome is better functionality (and smaller?).  And I am a Premium member.  If you want to save emails to EN you virtually have to be.

 

Are you saying, please, that a file OCR'd before upload to EN is smaller because it is just text rather than a series of images? What if the PDF includes pictures or diagrams? And does EN then also OCR the text that has been OCR'd at source, a double OCR?


Evernote won't OCR anything that has already been OCR'd - the full current scan limits are here in the Help Center - https://help.evernote.com/hc/en-us/articles/208313388

My fairly venerable scanner (Fujitsu S1500) came bundled with Adobe Acrobat 9.0, which I still use for OCR and editing PDFs. With the OCR settings I use, pictures of pages that are all text, or text + images, seem to become actual text + individual images if they're present. The OCR'd file size is always smaller than the original scan, with no loss (AFAIK) of quality in the pictures or visibility of the text.

I scan to a folder on my PC, which allows me to correct any errors, occasionally merge files if I forgot to include something in the first scan or there was some sort of mechanical error (a paper jam), and re-order pages if I need to. Sometimes those concertina-fold leaflets scan in with pages in the order 1,5,6,2,4,3 (or something like that) and need to be untangled.
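
For anyone who would rather script the untangling than drag pages around by hand, here is a minimal Python sketch of the re-ordering logic only. It works on a plain list of page objects, so it doesn't depend on any particular PDF tool. The function name and the "scan-a" labels are placeholders made up for illustration.

```python
def restore_page_order(scanned_pages, scanned_order):
    """Return the pages in reading order.

    scanned_order[i] is the true (1-based) page number of the i-th scanned page,
    e.g. [1, 5, 6, 2, 4, 3] for a concertina-fold leaflet.
    """
    expected = list(range(1, len(scanned_pages) + 1))
    if sorted(scanned_order) != expected:
        raise ValueError("scanned_order must list each page number exactly once")
    # Pair each scanned page with its true page number, then sort by that number.
    paired = sorted(zip(scanned_order, scanned_pages), key=lambda pair: pair[0])
    return [page for _, page in paired]


if __name__ == "__main__":
    pages = ["scan-a", "scan-b", "scan-c", "scan-d", "scan-e", "scan-f"]
    print(restore_page_order(pages, [1, 5, 6, 2, 4, 3]))
```

The same list of true page numbers could then drive whatever PDF editor or library you use to write out the pages in the corrected order.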

I also use 'smart' titles on my PDFs, because once processed they're moved to an Import Folder (See Evernote > Tools) which sucks them into Evernote as individual notes using the file name as a title. I don't want to go in and change titles or add tags afterwards (too lazy) so my PDF names / note titles are:

<scandate> <date> <type> <source> <subject> <keywords>

scandate
- is applied by the scanner - strictly speaking it's redundant - it's also probably the created date of the note;  but it's already there...

date
- of the event or item (I don't necessarily get to scan in on the day things happen), as yyyymmdd.
- I have PhraseExpress set to give me the current date in that format with Ctrl+<key>

type
- what is this?  A letter / leaflet / user guide / receipt / delivery note

source
- the actual name on the item - my bank / insurance company / washing machine manufacturer...

subject
- more details about the content if necessary - this might be a receipt from a <local shop> for washing machine

keywords
- anything else I might use to search for this item - model number / is it to do with tax / family history / day trips / holidays...

Note that all items are separated by a space,  and I use the "intitle:" search to find them.  I don't,  generally,  use tags.
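
As a rough illustration of that naming scheme, here is a minimal Python sketch that joins the fields with spaces and builds a matching "intitle:" search. The function names and all the sample values are hypothetical, and the scandate prefix is left out because the scanner adds that itself.

```python
from datetime import date


def build_title(event_date, doc_type, source, subject="", keywords=()):
    """Join the fields with single spaces: <date> <type> <source> <subject> <keywords>."""
    parts = [event_date.strftime("%Y%m%d"), doc_type, source, subject, *keywords]
    # Skip empty fields so the title never contains double spaces.
    return " ".join(p.strip() for p in parts if p and p.strip())


def intitle_query(*terms):
    """Prefix every term with intitle: so the search only looks at note titles."""
    return " ".join(f"intitle:{term}" for term in terms)


if __name__ == "__main__":
    # Sample values are invented for illustration.
    title = build_title(date(2017, 3, 4), "receipt", "LocalShop",
                        "washing machine", ["WM1234", "warranty"])
    print(title)
    print(intitle_query("receipt", "WM1234"))
```

Because every field ends up in the title, a search such as intitle:receipt intitle:WM1234 should narrow things down without any tags.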

 

Sorry - that was probably a much longer response than you wanted, but I get carried away.  ;)


I'm glad you got carried away. It shows both your passion for the subject and the results of experience.

 

Fujitsu make the recommended ScanSnap. It is so expensive. Obviously it does the job superbly, but on top of that I thought I needed a new AIO printer, which could do the job and cost a third of the ScanSnap or even less. But then again, if I need to do any colour printing I could save to a thumb drive and have it printed inexpensively.


:) The decision is entirely yours - a ScanSnap has quite a small footprint when it's not in use (4x11 inches to be exact) and I've put 20,000 pages through mine (so far) without any problems. Other scanners have sheet feeders, and I'm sure do as good a job - that's just the one I opted for, mainly because I had a huge pile of documents to digitise.
