A couple of days ago one of the questions I asked was for an easy (and preferably command line scriptable) way to merge a bunch of PDF files together. Well, I think I found a way.
MonkeyBread Software makes RealBasic plugins and extensions. I’ll be the first one to say that I don’t know jack about RealBasic; however, one of the freely downloadable tools that they provide is Combine PDFs (they even include the RealBasic source). It’s a tiny Carbon app that basically does what I want it to do.
It has an interesting “feature”: it seems to get rid of the “Image Unavailable for Copyright Reasons” watermark when dealing with PDF files generated by NPG. So I just get white blocks with the occasional caption under the text. But hey, it’s free, so who am I to complain?
One of the tricks I use with Combine PDFs is to rename a bunch of PDFs into a numerically ordered list, something like:
$ grep pdf index.html | sed '<a regular expression or three go here to produce the file list>' | nl -v100 | awk '{print "mv "$2" "$1".pdf"}' | sh
where I use nl(1) to number the lines, starting at 100 and counting upwards.
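The renaming step can be sketched as a small shell function. This is only an illustration of the nl/awk trick: the name rename_cmds is made up here, and it assumes file names contain no whitespace or shell metacharacters. It only prints the mv commands, so they can be inspected before being piped to sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool mentioned in this post):
# read file names on stdin, one per line, and print the mv commands that
# would rename them to 100.pdf, 101.pdf, ... in input order.
# Nothing is renamed until the output is piped to sh.
# Assumption: names contain no whitespace or shell metacharacters.
rename_cmds() {
    start=${1:-100}    # first number to hand out, 100 unless overridden
    # nl prefixes each line with its number; awk then sees the number
    # as $1 and the file name as $2.
    nl -v"$start" | awk '{print "mv " $2 " " $1 ".pdf"}'
}

# Preview:      ls *.pdf | rename_cmds
# Do it:        ls *.pdf | rename_cmds | sh
```

Starting at 100 rather than 1 presumably keeps every name three digits wide, so an alphabetical sort agrees with the numerical order for up to 900 files.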
Then inside Combine PDFs I can just tell it to order files in alphabetical order, and off I go.
Here is what a real run would look like:
stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:56 PM]$ cat index.html | grep pdf | sed 's/^.*href.................................//g' | sed 's/......$//g' | nl -v100 | head
100 435713a.pdf
101 435713b.pdf
102 435714a.pdf
103 435716a.pdf
104 435718a.pdf
105 435718b.pdf
106 435720a.pdf
107 435720b.pdf
108 435723a.pdf
109 435723b.pdf
stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:56 PM]$ cat index.html | grep pdf | sed 's/^.*href.................................//g' | sed 's/......$//g' | nl -v100 | awk '{print "mv pdf/"$2" pdf/"$1".pdf"}' | head
mv pdf/435713a.pdf pdf/100.pdf
mv pdf/435713b.pdf pdf/101.pdf
mv pdf/435714a.pdf pdf/102.pdf
mv pdf/435716a.pdf pdf/103.pdf
mv pdf/435718a.pdf pdf/104.pdf
mv pdf/435718b.pdf pdf/105.pdf
mv pdf/435720a.pdf pdf/106.pdf
mv pdf/435720b.pdf pdf/107.pdf
mv pdf/435723a.pdf pdf/108.pdf
mv pdf/435723b.pdf pdf/109.pdf
stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:57 PM]$
You get the idea.
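The sed patterns in that run chop off a fixed number of characters, so they only work while every link in index.html has exactly the same length. A less brittle way to pull out the file names might look like the sketch below. It assumes the links have the form href="/some/path/name.pdf", which is a guess about the page markup, and pdf_names is a made-up name. It also relies on grep -o, which GNU and BSD grep both provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper: print the base name of every .pdf link target found
# on stdin, in order of first appearance, skipping repeats.
# Assumption: links look like href="/some/path/435713a.pdf".
pdf_names() {
    grep -o 'href="[^"]*\.pdf"' |                          # one match per line
        sed -e 's/^href="//' -e 's/"$//' -e 's|.*/||' |    # strip href=", closing quote, and directories
        awk '!seen[$0]++'                                  # keep first occurrence only
}

# Example: pdf_names < index.html | nl -v100
```

The output is one file name per line, which is the shape nl and awk expect in the pipeline shown earlier.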
Then it’s just drag and drop.
I’ve still not found a free way to delete duplicate pages; however, PDFpen looks reasonably good (it can’t show the large page preview and the thumbnails of the rest of the pages in the file at the same time, and the interface for deleting pages is not obvious, but maybe I should contact the authors). It is $50 USD for the basic version (and I don’t need form creation either), which is much better than full Acrobat from Adobe.
I should contact the authors, and see if they will add the features I would like, and if they do, register the software. Hrm….
As my Spanish teacher used to say: necesito ganar dinero.