Merging a bunch of PDFs together

A couple of days ago, one of the questions I asked was how to merge a bunch of PDF files together easily (and preferably in a way that can be scripted from the command line). Well, I think I found a way.

MonkeyBread Software makes RealBasic plugins and extensions. I’ll be the first to say that I don’t know jack about RealBasic; however, one of the freely downloadable tools that they provide is Combine PDFs (they even include the RealBasic source). It’s a tiny Carbon app that basically does what I want it to do.

It has an interesting “feature”: it seems to get rid of the “Image Unavailable for Copyright Reasons” watermark when dealing with PDF files generated by NPG. So I just get white blocks with the occasional caption underneath. But hey, it’s free, so who am I to complain?

One of the tricks I use with Combine PDFs is to rename a bunch of PDFs into a numerically ordered list, something like:

$ grep pdf index.html | sed <regular expression or three go here to result in file list> |
 nl -v100 | awk '{print "mv "$2" "$1".pdf"}' | sh

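where I basically use nl(1) to label the lines, starting at 100 and counting onwards. Starting at 100 rather than 1 keeps every name three digits long, so alphabetical order and numeric order agree (for up to 900 files).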
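The awk-and-sh step builds the mv commands as text, so it trips over file names with spaces in them. Here is a minimal sketch of the same renaming done as a shell function instead. The name number_pdfs is something I made up for illustration, and it assumes the file names arrive on standard input, one per line, in the order you want them merged.

```shell
# number_pdfs DIR  (hypothetical helper, not part of any tool)
# Reads file names from stdin, one per line, and renames DIR/<name>
# to DIR/100.pdf, DIR/101.pdf, ... in the order the names arrive.
number_pdfs() {
    dir=$1
    n=100                      # same starting point as "nl -v100"
    while IFS= read -r name; do
        # Skip blank lines and names that are not files in DIR,
        # so the numbering stays contiguous.
        [ -n "$name" ] && [ -f "$dir/$name" ] || continue
        mv -- "$dir/$name" "$dir/$n.pdf"
        n=$((n + 1))
    done
}
```

For example, printf '%s\n' 435713a.pdf 435713b.pdf | number_pdfs pdf would rename pdf/435713a.pdf to pdf/100.pdf and pdf/435713b.pdf to pdf/101.pdf. Unlike the nl version, a name that is missing from the directory does not use up a number.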

Then inside Combine PDFs I can just tell it to order files in alphabetical order, and off I go.

Here is what a real run would look like:

stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:56 PM]$ 
cat index.html | grep  pdf | sed 's/^.*href.................................//g' | 
sed 's/......$//g' | nl -v100  | head
   100  435713a.pdf
   101  435713b.pdf
   102  435714a.pdf
   103  435716a.pdf
   104  435718a.pdf
   105  435718b.pdf
   106  435720a.pdf
   107  435720b.pdf
   108  435723a.pdf
   109  435723b.pdf
stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:56 PM]$ 
cat index.html | grep  pdf |  sed 's/^.*href.................................//g' | 
sed 's/......$//g' | nl -v100 | awk '{print "mv pdf/"$2" pdf/"$1".pdf"}' | head
mv pdf/435713a.pdf pdf/100.pdf
mv pdf/435713b.pdf pdf/101.pdf
mv pdf/435714a.pdf pdf/102.pdf
mv pdf/435716a.pdf pdf/103.pdf
mv pdf/435718a.pdf pdf/104.pdf
mv pdf/435718b.pdf pdf/105.pdf
mv pdf/435720a.pdf pdf/106.pdf
mv pdf/435720b.pdf pdf/107.pdf
mv pdf/435723a.pdf pdf/108.pdf
mv pdf/435723b.pdf pdf/109.pdf
stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:57 PM]$ 

You get the idea.
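The sed expressions in that run strip a fixed number of characters, so they only work while every link in the page has the same length of path in front of the file name. A rough sketch of a less brittle way to pull out the names follows. list_pdf_names is a made-up helper, it assumes the links are written as href="..." with double quotes, and it relies on grep -o, which GNU and BSD grep both have but POSIX does not require.

```shell
# list_pdf_names FILE  (hypothetical helper)
# Prints the base name of every href="..." target in FILE that ends in .pdf,
# one per line, in the order the links appear.
list_pdf_names() {
    # grep -o prints only the matching part, one match per line.
    grep -o 'href="[^"]*\.pdf"' "$1" |
        sed -e 's/^href="//' -e 's/"$//' -e 's|.*/||'
    # sed: drop the leading href=", drop the closing quote,
    # then drop everything up to the last slash.
}
```

Something like list_pdf_names index.html | nl -v100 should then give the same kind of numbered list as the run above, assuming the page really does quote its links that way.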

Then it’s just drag and drop.

I’ve still not found a free way to delete duplicate pages; however, PDFpen looks reasonably good. (It can’t show the large page preview and the thumbnails of the rest of the pages in the file at the same time, and the interface for deleting pages is not obvious, but maybe I should contact the authors.) It is $50 USD for the basic version (and I don’t need form creation either), which is much better than full Acrobat from Adobe.

I should contact the authors, and see if they will add the features I would like, and if they do, register the software. Hrm….

As my Spanish teacher used to say: necesito ganar dinero (“I need to earn money”).