
Why use PDF files in Evernote?


Don Dz


Those of you with thousands of large PDF files in Evernote, what do you get out of it that you couldn't do more cleanly with Windows search and Adobe Acrobat or similar tools?  Aside from moving files from one computer to another (what I do mostly), what is the benefit of attaching so many PDF files?

I am trying to rethink how I use Evernote attachments, particularly PDF files.  I find the clutter can get unwieldy in a hurry, especially with very large PDF files: searches become nearly useless, since the information cannot be easily split and organized into separate notes.  Plus a lot of PDF files contain confidential information with no way of encrypting.

I exclude them from searches with a tag  to help restore some sanity.  The ones I have to add for work projects I ended up moving to a separate user name, but those are usually not too large (just too many).

My main platform is Windows; it seems even more cluttered on Android and iOS.

Link to comment
  • Level 5*
27 minutes ago, Don Dz said:

Plus a lot of PDF files contain confidential information with no way of encrypting.

PDF is one format that has a native encryption feature. [screenshot]

I encrypt the pdf on my Mac with the Preview app, or an Automator script.

>>what is the benefit of attaching so many PDF files?

PDF is my preferred format for static data. It's ubiquitous.

I have many pdf files, and have to store them somewhere. Evernote is my single storage location.

As to clutter, I have the option of using inline/attachment display.

 

Link to comment
43 minutes ago, DTLow said:

I have many pdf files, and have to store them somewhere.

As to clutter, I have the option of using inline/attachment display.

I was thinking more along the lines of how Evernote can be used to organize data, and the issue of organizational clutter rather than note clutter. Because PDF files contain too much information, things like note links and tags are not terribly useful with a single large clump of information.  Plus they get too many search results that are hard to narrow down in Evernote.

Currently I store all my static data files on Windows folders in my main computer. I usually handle them with Windows 10 search, Google Desktop, and Adobe Acrobat.

Frequently requested features like links within notes would probably not work with PDF files even if implemented, and even though the PDF format itself supports that feature, most large PDF files I download don't appear to take advantage of it.

>>PDFs are one format that have a native encryption feature

I guess I would have to purchase that functionality, I am not seeing it in the free Windows tools I use.

Link to comment
  • Level 5*
20 hours ago, Don Dz said:

Currently I store all my static data files on Windows folders in my main computer. I usually handle them with Windows 10 search, Google Desktop, and Adobe Acrobat.

Moving the storage to a different location doesn't solve the issue.

>>Plus they get too many search results that are hard to narrow down in Evernote.

General text search is a last resort for me.  

I have the critical information in the title, and use intitle searches.

Usually, tag searches are sufficient to retrieve the required notes

Link to comment
  • Level 5*

I like having all my stuff in one place and accessing the non-private stuff on all platforms.  EN helps with this, other than some dysfunction with web/iOS searches.

I don't have any issues with too many results in a search.  My notes with PDFs in them are tagged.  Typically I do a tag search before I do a text search.  For example if I am looking for a technical article, I would do tag:pc.stuff whatevertext.  Also, I don't have too much trouble with too many results based upon a pure text search, and if I do I add tags to the search.  Could be a use case thing.
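A tag-first search like the one described above can also be assembled programmatically. This is only a minimal sketch that builds a query string using Evernote's search operators (`tag:`, `-tag:`, `intitle:`); the function name `build_query`, its parameters, and the example tag `large pdf` are hypothetical and not part of any Evernote API.

```python
def _term(prefix, value):
    """Format one search term, quoting the value if it contains spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{prefix}{value}"


def build_query(text="", tags=(), exclude_tags=(), intitle=()):
    """Assemble a search string: tag filters first, then title terms, then free text."""
    parts = [_term("tag:", t) for t in tags]
    parts += [_term("-tag:", t) for t in exclude_tags]
    parts += [_term("intitle:", w) for w in intitle]
    if text:
        parts.append(text)
    return " ".join(parts)


# The search from the post above: narrow by tag, then by text.
print(build_query("whatevertext", tags=["pc.stuff"]))
# Excluding notes that carry a (hypothetical) tag for bulky PDFs.
print(build_query("warranty", exclude_tags=["large pdf"], intitle=["invoice"]))
```

Putting the tag filters first mirrors the habit described in the thread: narrow by tag, and only then fall back on free text.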

Link to comment
4 hours ago, DTLow said:

Moving the storage to a different location doesn't solve the issue.

I am guessing you mean you tend to download PDF files directly to Evernote?  I generally download them to either my Desktop or Downloads folders, then decide from there. 

>>General text search is a last resort for me.  

Most of my data is reference information, so I rely heavily on general text searches, though I use tags a lot as well as other things.

From what I have read, I believe you rely heavily on dating every note in the title itself; currently I don't do that, since Evernote handles dates well enough.  I am not sure dates would make things any easier with PDF files; even when files come named with dates, I tend to remove the date, since the description is what I need.

Link to comment
4 hours ago, CalS said:

Could be a use case thing.

Probably, I was also thinking it would be the size of the PDF files, the few PDF files I keep on EN are large reference files and some ebooks.  Generally if the content is not too large, I tend to copy the content rather than the file, that way EN will highlight whatever I am searching for.

Link to comment
  • Level 5*
56 minutes ago, Don Dz said:

I am guessing you mean you tend to download PDF files directly to Evernote?  I generally download them to either my Desktop or Downloads folders, then decide from there. 

No.  What I mean is

You've indicated problems with clutter, search and pdfs

I'm not seeing these problems solved by storing the pdfs "on Windows folders in my main computer"

Link to comment
28 minutes ago, DTLow said:

You've identified problems with search and pdfs

I'm not seeing these problems solved by storing the pdfs "on Windows folders in my main computer"

That's why I started this topic: I see other people happy with a solution I am not satisfied with, so I am just trying to understand different perspectives.

Link to comment
  • Level 5*
1 hour ago, Don Dz said:

Probably, I was also thinking it would be the size of the PDF files, the few PDF files I keep on EN are large reference files and some ebooks.  Generally if the content is not too large, I tend to copy the content rather than the file, that way EN will highlight whatever I am searching for.

I have some large PDFs but the majority not so much.  I decided to go paperless using EN some time back, which is one of my use cases.   So just about any document, statement, manual, research article, whatever goes into EN.  20k PDFs at this point, half my notes more or less.

Link to comment
  • Level 5

For me, EN serves as a big database especially to go paperless, and to support some workflows.

From my experience, it is practically impossible to organize this information based on folders in Windows. Folders (as notebooks) are always one-dimensional. The search function in Windows is simply not up to the job.

Example: I am renting out some apartments. If there is an invoice, I may need it a) for the bank payment b) for the calculation of the cost covered by my tenants c) for my tax declaration d) for warranty issues etc. So before EN I ended up creating copies of the original file going into several folders, or stacking paper.

With EN, I use several tags on this single note / document, and will find it fast and reliably whenever I need it.
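The difference between one-dimensional folders and multiple tags on a single note can be sketched as a small inverted index. This is only an illustration of the idea, not how Evernote stores anything; the class `TagIndex`, the note titles, and the tag names are all made up for the example.

```python
from collections import defaultdict


class TagIndex:
    """Map each tag to the set of note titles carrying it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, note, tags):
        """Register one note under every tag it carries (no copies of the note)."""
        for tag in tags:
            self._by_tag[tag].add(note)

    def find(self, *tags):
        """Return notes that carry every one of the given tags, sorted by title."""
        if not tags:
            return []
        sets = [self._by_tag.get(t, set()) for t in tags]
        return sorted(set.intersection(*sets))


index = TagIndex()
# One invoice, stored once, reachable from four different angles.
index.add("Plumber invoice 2019-02", ["bank", "tenant-costs", "tax", "warranty"])
index.add("Electricity bill 2019-01", ["bank", "tenant-costs"])

print(index.find("tax"))
print(index.find("bank", "tenant-costs"))
```

With folders, the same invoice would need four copies; here it is stored once and each tag is just another way in.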

A hint for protecting pdf files: Maybe try the freeware „Freepdf“. It is a GUI for the freeware „Ghostscript“, which needs to be installed first. Development was discontinued in 2017, but it still works under all Windows versions up to Win10.

Another freeware tool is „pdfcreator“, which lets you encrypt files even in the freeware version. Just pick the „Expert“-mode to install, take care to read through the boxes during the installation and click away most options, because they want to install some additional bloatware programs that you may not want on your PC.

Encryption is done by creating a PW-protected profile, and applying it in the print-dialogue that creates the pdf. This means the same PW for all files created with the same profile (See Screenshots).

Finally, some scanners can PW-protect the PDFs they create; for example, the ScanSnap ix500 I am using can do this through a profile that needs to be set up. This is found a bit hidden behind the „options“-button in the „File options“-tab.

For myself I have decided against encrypting the PDFs that go into EN, except those few where I feel an additional level of protection would be good. If encrypted, the search function will not work on these files, so I would have to rely completely on tagging and adding free text to the notes. This goes contrary to my use case of EN.

[Screenshots: pdfcreator.JPG, Printpdf.JPG, OpenWpw.JPG]

Link to comment
18 hours ago, Don Dz said:

>>PDFs are one format that have a native encryption feature

I guess I would have to purchase that functionality, I am not seeing it in the free Windows tools I use.

This powerful free scanner tool https://www.naps2.com/  stores my scanned documents in encrypted pdf (encryption is optional of course).  In my workflow I store directly into the evernote import folder.

edit: I forgot to tell that it can also OCR the pdf document

Link to comment
  • Level 5*

Apologies in advance - this is going to be another plug for my 'smart title' setup.  I agree with @Don Dz that adding PDFs to the mix does pollute the search pool considerably - I have a variety of PDFs in my database including the odd 400-page support manual,  so a general search on more or less any word or phrase gets quite a few (!!) hits.  Most of my searches now are "intitle:<keyword>".

I've also had a late conversion to Filterize where (for instance) I now have a dashboard (and auto-updated table of contents note) labelled 'recipes' which will feature all the notes I have with food preparation details.  It runs off all the notes I have with 'recipe' in the title,  plus all those tagged 'recipe'.  I'll be able to tweak it further if I find that notes are still missing from the list.

(I found the main annoyance with Evernote's search grammar is remembering exactly how I found all those notes last month using advanced syntax.  Having a predefined search running all the time saves me a grey cell or two.)

I'll try to explain all this in detail at some point in a proper post - something to look forward to in these long winter (if you're in this hemisphere) evenings.... 😎

Link to comment
  • Level 5*
22 minutes ago, gazumped said:

 ..remembering exactly how I found all those notes last month using advanced syntax.  Having a predefined search running all the time saves me a grey cell or two.)

Shortcuts and Saved Searches are a solution.  
However, I found these menus to be cluttered.  

I use scripts (Mac) instead of shortcuts.  
I also have the search parameters specified in the note contents; for example Current Tasks  reminderOrder:* -reminderTime:day+1 -reminderDoneTime:*

Link to comment
  • Level 5*
1 minute ago, DTLow said:

I also have the search parameters specified in the note contents;

Two minds - I also show the search parameters in my dashboards,  so I can edit them if/ when necessary...   ☺️

Link to comment

Thanks for the comments so far.  Whether expressed or simply implied, the "consensus" (if I may call it that) appears to be:

1-For many people, search pollution is probably as bad as I think it is with a lot of PDF files, especially large ones; it's just that people have found ways to cope that work for them.

2-People that add many PDF files "probably" add a lot of scanned content from a paperless office approach, which tends not to be too large individually and is much easier to handle.

3-People tend to rely heavily on tag & intitle searches, saved searches, and other types, plus  searching for special text body tags, and are probably resigned to less frequent use of simple text content searches, as they consider this a fair trade for the value of having all their PDF files in one place, and available on multiple devices.

4-Date searches for everything is the mantra of at least a few people, especially on the title (observed in other discussions).

5-A fair number rely heavily on scripting and searches saved in the body of notes.

All the above solutions I already use to different degrees, the only thing is that, using the GTD method, I tend to feel free to organize my notes after I collect them, whereas some discussions mention the importance of tagging, titling and dating everything while you collect them.  This I do sometimes, but I tend to focus on what I am doing rather than organization in the moment.

Link to comment
  • Level 5*
8 minutes ago, Don Dz said:

Whether expressed or simply implied, the consensus appears to be:

Not sure there is a consensus in such a small data set.  😉 

More than anything else folks seem to tailor their process to their use case and whatever suits their eye.  Consensus may be do what feels best to you and here are some lessons learned from others.  Don't think there is a "right" solution in all of this, but some learning for sure.  One of the strengths of EN is the ability to lever its capabilities to so many different use cases.

Link to comment
31 minutes ago, CalS said:

Not sure there is a consensus in such a small data set.  😉 

I am using the word rather loosely of course, just my impression based on reading many discussions, this data set just adds an angle I was less clear about before.

Not the right solution, just how it works for others so I can determine if it can work for me. I gather that where I see a big problem, others see a small inconvenience; I just hoped to learn what I can borrow for my use.

Link to comment
  • Level 5*
10 hours ago, Don Dz said:

just how it works for others so I can determine if it can work for me

My favorite hobby-horse is that there is no 'best' way to do anything - only what (currently) works for you.  The longer you spend looking at alternatives and considering what might work,  the more time you're wasting.  Just pick a method and use it for a while. 

Stick with what works,  or try something different.  Sooner or later you'll find things just ticking along nicely.  After that,  resist the temptation to try anything else. 

If something works,  it's a really REALLY bad idea to go on fixing it...  😎

Link to comment
4 hours ago, gazumped said:

Stick with what works,  or try something different.  Sooner or later you'll find things just ticking along nicely.  After that,  resist the temptation to try anything else. 

There was a point when I really was much better off without computers.  I was also fine without a car.  Then I didn't think I needed a cell phone at all.  Then I was just fine without color screens on any portable device.  I also was fine without getting married.  Did ok for a long while without the need for PDF files. I didn't really need to move away from Palm devices years after the company died and everyone else had moved on.  I was actually a little happier without Evernote or any of my data on the cloud. 🌨️🌩️

Sometimes the Amish lifestyle looks really good to me. 🤣

>>If something works,  it's a really REALLY bad idea to go on fixing it

I much prefer how Evernote behaves with nothing but plain text in it, not even tables.  Same with most programs I have used over the years.

I visited my oldest daughter over Christmas, her apartment was nearly empty.  I honestly was jealous.

I miss my bike, didn't need anything else. 🚲

Link to comment
  • Level 5*

I am definitely in the "lots of PDFs but they're relatively small" camp, and use tags and searching without much issue.

However, I definitely have a lot of manuals, genealogy information, etc. in my database that can be scores of pages inside one PDF. When possible, I actually try to break those out into separate PDFs and separate notes to help chunk it out into better lumps for organizing. I also rely pretty heavily on the annotation summary you can enable when annotating a PDF. It adds little snippets at the top of the PDF showing the edits and annotations you've made and what page they're on. So, when I add PDFs that are huge (like the one below), I make sure to edit and annotate it to show the bits that are important in the PDF at the top as a point of reference.

Example:

[Screenshot: annotation summary at the top of a large PDF]

Link to comment
1 hour ago, chirmer said:

I also rely pretty heavily on the annotation summary you can enable when annotating a PDF. It adds little snippets at the top of the PDF showing the edits and annotations you've made and what page they're on. So, when I add PDFs that are huge (like the one below), I make sure to edit and annotate it to show the bits that are important in the PDF at the top as a point of reference.

Is that a feature of free software like Adobe Reader or similar, on Windows? 

This suggestion is actually new to me.

Link to comment
On 2/13/2019 at 8:34 PM, eric99 said:

This powerful free scanner tool https://www.naps2.com/  stores my scanned documents in encrypted pdf (encryption is optional of course).  In my workflow I store directly into the evernote import folder.

edit: I forgot to tell that it can also OCR the pdf document

I noticed in the latest version, apart from the scanned files, you can import pdf or image files from other sources and combine these with  scanned files into a new pdf.

So you can use NAPS2  to encrypt and / or OCR any pdf or image file.

Link to comment
3 hours ago, eric99 said:

I noticed in the latest version, apart from the scanned files, you can import pdf or image files from other sources and combine these with  scanned files into a new pdf.

So you can use NAPS2  to encrypt and / or OCR any pdf or image file.

I wonder if one could encrypt only part of the PDF, at least on a page basis if not smaller, like EN can do.

Link to comment
  • Level 5*
19 hours ago, eric99 said:

So you can use NAPS2  to encrypt and / or OCR any pdf or image file.

Hmmn.  NAPS2 looked interesting,  so I installed it - turns out the WIA / TWAIN compatibility it looks for in scanners is not part of the ScanSnap DNA,  so it won't talk to my S1500 - although it recognises my MF printer and the device camera.  I did import a bunch of S1500 scanned PDFs into NAPS2 but then couldn't find a way to OCR them after the event,  so it doesn't look like I can use this software for much...  🙁

Link to comment
19 hours ago, gazumped said:

Hmmn.  NAPS2 looked interesting,  so I installed it - turns out the WIA / TWAIN compatibility it looks for in scanners is not part of the ScanSnap DNA,  so it won't talk to my S1500 - although it recognises my MF printer and the device camera.  I did import a bunch of S1500 scanned PDFs into NAPS2 but then couldn't find a way to OCR them after the event,  so it doesn't look like I can use this software for much...  🙁

NAPS2 configuration:

First, configure the OCR settings (enable OCR for pdf) and encryption settings if you like. All these settings are persisted on disk.

From now on,  you can just save the imported files to OCR'ed and / or encrypted pdf files by pressing the PDF button.

Apparently you can automate your NAPS2 workflow by scripting: https://www.naps2.com/doc-command-line.html

I didn't try that myself.

Link to comment
  • Level 5*

Oooh - shiny!  Looks like I can run a script to import scanned PDFs from my desktop folder,  OCR and then output them directly to an Evernote Import Folder to put them into Evernote.  (Assuming that the import folder doesn't try to grab the file before NAPS2 has finished saving it...)  More testing required I think but this looks promising 🧐

(I prefer to scan to folder and OCR later because I get to delete the odd blank page,  add titles and generally dot i's and cross t's in the completed scan - plus OCR later seems to save a few seconds each scan.  Batch OCR does take a few minutes,  but I'm usually getting more coffee by that stage,  or finding something I can bang my head on...)
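One common way to sidestep the "import folder grabs a half-written file" worry is to let the OCR step write into a separate staging folder and only move the finished file into the import folder afterwards. The following is a minimal sketch of that hand-over step only. It assumes the OCR tool has already finished and closed the file, and that both folders sit on the same volume (so the move is a plain rename). The function name `hand_over` and the folder names are hypothetical, and how Evernote's import folder actually treats partially written files is not confirmed here.

```python
import os
import tempfile
from pathlib import Path


def hand_over(staged_file: Path, import_folder: Path) -> Path:
    """Move a completely written file into the import folder in a single step."""
    import_folder.mkdir(parents=True, exist_ok=True)
    target = import_folder / staged_file.name
    # os.replace is a rename: the file appears in the import folder all at once.
    # It raises OSError if the two folders are on different volumes.
    os.replace(staged_file, target)
    return target


# Demonstration with throwaway folders standing in for the real ones.
with tempfile.TemporaryDirectory() as root:
    staging = Path(root) / "staging"
    staging.mkdir()
    finished = staging / "scan.pdf"
    finished.write_bytes(b"%PDF-1.4 placeholder content")

    moved = hand_over(finished, Path(root) / "evernote-import")
    print(moved.name, moved.exists(), finished.exists())
```

The watcher then only ever sees complete files, because the slow part (OCR and saving) happens somewhere it is not looking.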

Link to comment
  • Level 5*
3 hours ago, gazumped said:

Oooh - shiny!  Looks like I can run a script to import scanned PDFs from my desktop folder,  OCR and then output them directly to an Evernote Import Folder to put them into Evernote.  (Assuming that the import folder doesn't try to grab the file before NAPS2 has finished saving it...)  More testing required I think but this looks promising 🧐

(I prefer to scan to folder and OCR later because I get to delete the odd blank page,  add titles and generally dot i's and cross t's in the completed scan - plus OCR later seems to save a few seconds each scan.  Batch OCR does take a few minutes,  but I'm usually getting more coffee by that stage,  or finding something I can bang my head on...)

Any idea how it deals with PDFs containing renderable text?

Link to comment
15 hours ago, CalS said:

Any idea how it deals with PDFs containing renderable text?

Maybe print it to PDF, then OCR again. 

The only problem with this suggestion is that the OCR software that comes with ScanSnap (which I think you use) doesn't like PDFs not produced by that scanner. I'm not sure how it can tell; maybe use different software in that case.

Link to comment
  • Level 5

If you go to the properties of the pdf, it tells how it was created. In my case it states the iX500, the software and the version. So the SW gets the source info and puts it into the pdf file.

Maybe the SW is just meant to work inside of the bundle with the scanner, like it was sold. Some restrictions are quite normal in these bundles - it is the same with the RAW-SW package that came with my camera. It will not work with every RAW format.
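The "how it was created" fields shown in a PDF's properties usually come from the `/Creator` and `/Producer` entries of the file's document information dictionary. As a rough sketch, they can sometimes be read straight from the raw bytes. This assumes an unencrypted file whose info dictionary is stored uncompressed with plain literal strings, which many PDFs do not satisfy; the function name `pdf_origin` and the sample values are invented for the example.

```python
import re

# Matches "/Creator (...)" or "/Producer (...)" with backslash-escaped characters allowed.
_FIELD = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\()])*)\)")


def pdf_origin(raw: bytes) -> dict:
    """Return the /Creator and /Producer literal strings found in raw PDF bytes."""
    found = {}
    for key, value in _FIELD.findall(raw):
        # Undo the escapes \( \) and \\ ; any other escape sequence is left as-is.
        text = re.sub(rb"\\([()\\])", rb"\1", value)
        found[key.decode("ascii")] = text.decode("latin-1")
    return found


# Made-up fragment standing in for the info dictionary of a scanned file.
sample = b"1 0 obj\n<< /Creator (Example Scanner App) /Producer (Example PDF Library \\(v1\\)) >>\nendobj"
print(pdf_origin(sample))
```

For a real file, `pdf_origin(Path("scan.pdf").read_bytes())` would be the equivalent call; an empty result only means the fields were not stored in this simple form, not that they are absent.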

Link to comment
  • Level 5*
37 minutes ago, Don Dz said:

Maybe print it to PDF, then OCR again. 

The only problem with this suggestion is that the OCR software that comes with ScanSnap (which I think you use) doesn't like PDFs not produced by that scanner. I'm not sure how it can tell; maybe use different software in that case.

Yeah, that is what I am trying to avoid, plus printing to PDF doesn’t always work.  There’s a print to XPS which works, but it is multiple steps.  No issues with anything I scan, it is the downloaded stuff.  I use an older version of Adobe which can’t deal with renderable. 

Link to comment
  • Level 5*
5 hours ago, CalS said:

I use an older version of Adobe which can’t deal with renderable.

 

21 hours ago, CalS said:

Any idea how it deals with PDFs containing renderable text?

Hmmn.  Not sure - I just tried OCRing a PDF in NAPS2 that already had renderable text,  and there were no error messages - the saved output file was slightly larger than the original and searches still worked fine...  ???

Link to comment
13 hours ago, PinkElephant said:

If you go to the properties of the pdf, it tells how it was created. In my case it states the iX500, the software and the version. So the SW gets the source info and puts it into the pdf file.

I wonder if there is a way to change a PDF's properties so that it will be accepted by the ScanSnap OCR software.  Preliminary searching came up empty.

Link to comment
  • Level 5

As I see it, the scanner does not send a pdf to the SW on the PC. It is some other format, most likely proprietary. The iX500 is not compatible with the TWAIN standard used by many other scanners.

The pdf is one of the possible output data formats. If the SW that does all the job on the PC is not open source (which I doubt it is), it is probably hard, and not worth the effort, to try to get in between. A quick search on the internet gave a number of available freeware options that will OCR whatever you feed them.

On the professional side there is more, mostly with business process integration, and for a small fee 💰💵💵💴💶💵💰

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
