neal105

paperless PDF Split for smaller file size


I had a 28 MB PDF file that I couldn't load into EN because of the free version's 25 MB size limit. I used a utility to open the PDF and split the document into two sections: pages 1-66 and pages 67-125. I now have two files, labeled part 1 and part 2.

Simple and free.
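
For anyone wanting to work out where to cut before opening a splitting utility, here is a rough sketch of the arithmetic. It is not the utility used above: `plan_split` is a hypothetical helper, and it assumes every page takes up about the same amount of space, which scanned documents only approximately satisfy. It picks an even split, so the ranges differ slightly from the 1-66 / 67-125 cut described in the post.

```python
import math


def plan_split(total_pages: int, file_mb: float, limit_mb: float) -> list[tuple[int, int]]:
    """Return 1-based inclusive page ranges for an even split.

    Assumption: pages are roughly equal in size, so the number of parts
    is simply the file size divided by the limit, rounded up.
    """
    if total_pages < 1 or file_mb <= 0 or limit_mb <= 0:
        raise ValueError("page count and sizes must be positive")

    # Never plan more parts than there are pages.
    parts = min(total_pages, math.ceil(file_mb / limit_mb))
    pages_per_part = math.ceil(total_pages / parts)

    ranges = []
    start = 1
    while start <= total_pages:
        end = min(start + pages_per_part - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges


# The case from the post: 125 pages, 28 MB file, 25 MB limit.
print(plan_split(125, 28, 25))  # [(1, 63), (64, 125)]
```

If the pages vary a lot in size (a few colour photos among text pages, say), check the size of each resulting part and move the cut point by hand.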


For some weird reason if you OCR a PDF file, the resulting searchable PDF is often substantially smaller. I guess because you replace large pictures with standard characters, but I don't specifically know. You can also reduce PDF file sizes in PDF editors by choosing an "optimise file" (or something similar) option from a menu, or by selecting all the pictures in the file, discarding any cropped areas, and resampling them all to 300 DPI, or as much lower as your presentation can stand, down to 72 DPI. Save the file with a new name (even if it's originalfilename2.pdf) and you should have a much slimmer version. There are PDF (and various other) file size reducers on the net - use your search engine of choice to turn up the current options. Or if you're dealing with a published document, see whether there's already a PDF version on the net - that's often smaller than the output from your own scanner.
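
To give a feel for why lowering the DPI helps so much, here is a small sketch of the arithmetic. `pixel_fraction` is a hypothetical helper, and the 600 DPI source resolution is only an assumed example. The pixel count scales with the square of the resolution; the actual file size also depends on compression, so treat the result as a rough guide rather than a prediction.

```python
def pixel_fraction(source_dpi: float, target_dpi: float) -> float:
    """Fraction of the original pixels left after resampling an image.

    Width and height both scale by target_dpi / source_dpi, so the pixel
    count scales by the square of that ratio. Upsampling is treated as
    "no change", since it would not make the file smaller.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    if target_dpi >= source_dpi:
        return 1.0
    return (target_dpi / source_dpi) ** 2


# Assumed example: a scan made at 600 DPI, resampled to 300 and to 72 DPI.
for target in (300, 72):
    kept = pixel_fraction(600, target)
    print(f"600 -> {target} DPI keeps {kept:.2%} of the pixels")
```

Halving the resolution leaves a quarter of the pixels, which is why even a modest drop in DPI can shrink an image-heavy PDF noticeably.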

Or you could just go Premium for a month or two.


For some weird reason if you OCR a PDF file, the resulting searchable PDF is often substantially smaller. I guess because you replace large pictures with standard characters, but I don't specifically know.

I have seen Adobe claim somewhere on their blog that simply loading and saving a file with Acrobat can often reduce its size, simply because it rewrites the PDF more efficiently. When I do this with ScanSnap files, I often see a reduction of 25%. (This happens with or without OCR'ing the document.)


i often optimize my files (adobe acrobat pro) and achieve remarkable reductions, but i think this comes at the expense of images within the file, and it really depends a lot on the content. certainly, black and white makes a big difference.

the premium limit of 50mb is painful, but much preferred to the free 25!

i often optimize my files

Optimising is a specific process whereby you ask your PDF editor of choice to throw away any

  • embedded fonts that aren't in use
  • 'undo' items it has in its cache
  • cropped bits of pictures not on view
  • higher resolution versions of pictures than xxx dots per inch

Some or all of these choices are optional, and some are variable, depending on the version of the PDF editor you're using.

I was reminded of my earlier comments yesterday though when I changed some ScanSnap settings and noticed that for some unspecified time I've been OCRing files with a "first page only" box ticked. Not sure how long that's been selected - I haven't visited this menu often in the past - although that's certainly going to change when I stop swearing. My data paranoia is on full alert now, and I checked to see whether a recent scan-and-OCR file which showed just over 10MB would be further reduced if I manually OCR'd it again after scanning.

Sure enough the file dropped to 8MB with no observable loss of quality. No illustrations - and anyway that setting is on a 600DPI minimum. Since ScanSnap installed Adobe in the first place I've rather assumed that it's Adobe OCR that is used as part of the scanning process; and it's presumably Adobe that saves the file after scanning. So I'm still confused as to why another OCR and save can save 20-25% of the file size. And paranoid that I'm missing something important. (Paranoia isn't obligatory if you work with computers, but it often helps..)

Just to stress - I'm not optimising these files; I'm just belt-and-braces OCRing them and saving them in Adobe.

Guess we'll just have to go with the "saving more efficiently" theory for the time being.
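
For comparing results like these, the percentages are easy to check. The sketch below uses two hypothetical helpers, `percent_saved` and `percent_needed`, with figures taken from this thread (the 10 MB to 8 MB re-save, and the original 28 MB file against the 25 MB free limit). It only does the arithmetic; it says nothing about why the re-save is smaller.

```python
def percent_saved(before_mb: float, after_mb: float) -> float:
    """Percentage by which a file shrank between two saves."""
    if before_mb <= 0:
        raise ValueError("original size must be positive")
    return 100 * (before_mb - after_mb) / before_mb


def percent_needed(size_mb: float, limit_mb: float) -> float:
    """Smallest percentage reduction that brings a file down to the limit."""
    if size_mb <= 0 or limit_mb <= 0:
        raise ValueError("sizes must be positive")
    return max(0.0, 100 * (size_mb - limit_mb) / size_mb)


# The re-OCR'd file in this post: just over 10 MB down to 8 MB.
print(f"saved: {percent_saved(10, 8):.1f}%")

# The file that started the thread: 28 MB against a 25 MB limit.
print(f"needed: {percent_needed(28, 25):.1f}%")
```

On these numbers, a 20-25% saving from a plain re-save would be more than enough to get a 28 MB file under the 25 MB limit without splitting it, assuming that file shrinks by a similar amount.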


I put a high premium on reducing my file sizes, since most of what I scan is bills, statements, invoices, etc., and I don't care too much about the appearance. I use Acrobat's OCR because I can do it at the same time that I optimize. I also adjusted the optimize parameters to "apply adaptive compression", JPEG 2000, lossy, and small file size. I left the optimize filters the same. I then save the file with "reduce file size" as a final step. I select B&W for most scans and have a compression of 3 selected in the ScanSnap settings. A recent scan of 400 pages with these parameters was under 2 MB.


While I see the point of splitting larger PDFs, I personally would try to avoid that solution, for ease of reading, etc. This thread provides a lot of good advice about reducing file size.

