PDF/A Long-term Archiving Standard


GHall

While going paperless and reading about many wonderful use cases for Evernote, I've also stumbled upon many different projects people are engaged in. One that interests me greatly is scanning books to PDF. I have 600+ books that could be converted to PDF to reduce the storage space they need and to increase portability.

A side note: portability is probably the biggest driving factor for completing a book conversion project like this. Reading about how users such as GrumpyMonkey use the iPad in these forums has spurred my interest in possibly acquiring my own iPad. At the very least I want to be sure as much of my content as possible is ready to be viewed on an iPad, so that I do not have to deal with converting documents when I finally make the leap.

Converting so many books of course is a huge undertaking and perhaps would be considered not very worthwhile by some. I like to tinker. So, this sort of project in my spare time seems fun. I enjoy learning new things. As a result I've delved deeper into the PDF standard and found PDF/A.

Many areas of law and commerce require reliable long-term storage of data, so the PDF/A standard was developed. PDF/A has become a standard for submitting legal documentation to courts. Essentially, PDF/A limits the scope of usable PDF features and requires that everything needed to view the PDF, such as fonts, be embedded in the file itself. For example, links to external fonts are prohibited. With an ordinary PDF, a linked font may be absent from a particular computer (now or in the future), and the formatting changes when the file is opened there. This doesn't happen with PDF/A, because the file uses the fonts embedded within it.
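For anyone who wants to check what a given file claims to be: a PDF/A file announces its level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The following is only a rough sketch in Python, assuming the XMP packet is stored uncompressed so that a plain byte search can find it; the function name `declared_pdfa_level` is made up for illustration. It reports what the file declares, not whether the file actually conforms, which needs a real validator.

```python
import re

# XMP can spell a property as an element (<pdfaid:part>1</pdfaid:part>)
# or as an attribute (pdfaid:part="1"); both forms are matched here.
_PART = re.compile(rb'pdfaid:part(?:>|=["\'])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|=["\'])\s*([A-Za-z])')


def declared_pdfa_level(data: bytes):
    """Return the PDF/A level a file declares (e.g. 'PDF/A-1b'), or None.

    Assumption: the XMP metadata is not compressed, so the raw bytes
    contain the pdfaid properties as readable text.
    """
    part = _PART.search(data)
    if part is None:
        return None  # no PDF/A declaration found in the raw bytes
    level = "PDF/A-" + part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf:
        level += conf.group(1).decode("ascii").lower()
    return level


# Usage (hypothetical file name):
#   with open("scanned-book.pdf", "rb") as f:
#       print(declared_pdfa_level(f.read()))
```

A result of `None` only means no declaration was found by this simple search; it does not prove the file lacks embedded fonts or other PDF/A properties.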

Some of the documents that get scanned into Evernote, such as utility bills, do not typically need to last decades. Perhaps having access to them for up to seven years will suffice, since that would take care of tax needs in the US, for example. In those cases PDF/A would not be needed. But what about keepsakes, such as scanned projects from your kids' early years or your family genealogy research? Perhaps you want to pass them on to future generations?

PDF/A seems to be the way to store information for the long term. Getting back to scanning my book collection: since it's so labor-intensive, I want to take the time to be sure it's done as well as I can manage from the beginning.

For all interested parties, here is a link to the book PDF/A in a Nutshell: Long-Term Archiving with PDF, which is put out by the PDF/A Competence Center of the PDF Association.

http://www.pdfa.org/...-in-a-nutshell/

It contains background information on PDF, compares Microsoft's XPS format with PDF, describes the various levels of PDF/A, and shows specifically how to convert or create PDF/A documents. Screenshots abound. Adobe Acrobat 8 Pro is used (version 9 is similar), along with a few products from other vendors.

Do you use PDF/A?

No. Laziness more than anything. If I remember correctly, PDF/A makes it read-only, so I cannot add bookmarks, annotations, etc. later without saving a separate copy. No big deal, but I haven't seen it to be worth the hassle.

Actually, bookmarks are permitted. So are OCR and accessibility features for persons with sensory issues. Security features are not. Also, PDF/A (at least the original PDF/A-1) cannot use transparency or PDF layers, so watermarks that rely on them cannot be used, for example.

For persons using TWAIN-compliant scanners, Acrobat 9 Pro allows scanning directly to PDF/A. This does not help me with my ScanSnap S1500M. But there are several other easy ways to turn PDFs into PDF/A using Acrobat 9 Pro or some other software titles.
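As one illustration of the "other software" route, Ghostscript's `pdfwrite` device can write PDF/A when given the `-dPDFA` switch. The sketch below only builds the command line in Python; the helper name `ghostscript_pdfa_command` and the file names are made up, and it assumes `gs` is installed and that a `PDFA_def.ps` file (the template shipped with Ghostscript, edited to point at an ICC color profile) is available. Whether the output really validates as PDF/A depends on the input file, so treat this as a starting point rather than a guaranteed recipe.

```python
def ghostscript_pdfa_command(src, dst, part=1, pdfa_def="PDFA_def.ps"):
    """Build a Ghostscript argument list that asks pdfwrite for PDF/A output.

    Assumptions: 'gs' is on the PATH, and pdfa_def points to a PDFA_def.ps
    file that has been edited to reference an ICC profile.
    """
    if part not in (1, 2):
        # Kept to the two levels this sketch is meant to cover.
        raise ValueError("part must be 1 or 2")
    return [
        "gs",
        f"-dPDFA={part}",                 # request PDF/A-1 or PDF/A-2
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",  # convert colors to a single space
        "-dPDFACompatibilityPolicy=1",    # drop features PDF/A does not allow
        f"-sOutputFile={dst}",
        pdfa_def,
        src,
    ]


# Usage (hypothetical file names):
#   import subprocess
#   subprocess.run(ghostscript_pdfa_command("book.pdf", "book-pdfa.pdf"), check=True)
```

Building the list separately from running it makes it easy to print and inspect the command before pointing it at a whole folder of scans.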

A drawback to using PDF/A is file size. There is some overhead for the embedding of fonts, etc.

My understanding was that you cannot add bookmarks AFTER you save as PDF/A. The same goes for annotations and so forth. It's read-only, right?

It's not a big deal, but it is an extra hassle (if I am right). If you can add bookmarks and stuff afterwards (I haven't used the PDF/A file type for a long time, so I don't remember well), then it might be time for me to make the switch.

Bookmarks can be added. They can be added because security features that limit or control the file are not allowed. This ensures that use of the file is not restricted by those types of features.

I wish that my ScanSnap S1500M had the ability to scan directly to PDF/A.

PDF/A would be helpful for someone working on a PhD, for example, who might be scanning original source material. That material would then be available for decades to come. Page 42 of the book linked in this topic talks about data that goes missing in web pages, for example. PDF/A does not allow this kind of thing to happen.

This is interesting stuff.

Link to comment
  • 1 year later...

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...