Merging a bunch of PDFs together

A couple of days ago one of the questions I asked was for an easy (and preferably command line scriptable) way to merge a bunch of PDF files together. Well, I think I found a way.

MonkeyBread Software makes RealBasic plugins and extensions. I’ll be the first one to say that I don’t know jack about RealBasic, but one of the freely downloadable tools that they provide is Combine PDFs (they even include the RealBasic source). It’s a tiny Carbon app that basically does what I want it to do.

It has an interesting “feature” – it seems to get rid of the “Image Unavailable for Copyright Reasons” watermark when dealing with PDF files generated by NPG. So I just get white blocks with an occasional caption underneath. But hey, it’s free, so who am I to complain?

One of the tricks I use with Combine PDFs is to rename a bunch of PDFs into a numerically ordered list, something like:

$ grep pdf index.html | sed regular expression or three go here to result in file list \
 | nl -v100 | awk '{print "mv "$2" "$1".pdf"}' | sh

where I basically use nl(1) to number the lines, starting at 100 and counting upwards. Starting at 100 keeps every name three digits long, so alphabetical order matches numerical order.

Then inside Combine PDFs I can just tell it to order files in alphabetical order, and off I go.

Here is what a real run would look like:

stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:56 PM]$ 
cat index.html | grep  pdf | sed 's/^.*href.................................//g' | 
sed 's/......$//g' | nl -v100  | head
   100  435713a.pdf
   101  435713b.pdf
   102  435714a.pdf
   103  435716a.pdf
   104  435718a.pdf
   105  435718b.pdf
   106  435720a.pdf
   107  435720b.pdf
   108  435723a.pdf
   109  435723b.pdf
stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:56 PM]$ 
cat index.html | grep  pdf |  sed 's/^.*href.................................//g' | 
sed 's/......$//g' | nl -v100 | awk '{print "mv pdf/"$2" pdf/"$1".pdf"}' | head
mv pdf/435713a.pdf pdf/100.pdf
mv pdf/435713b.pdf pdf/101.pdf
mv pdf/435714a.pdf pdf/102.pdf
mv pdf/435716a.pdf pdf/103.pdf
mv pdf/435718a.pdf pdf/104.pdf
mv pdf/435718b.pdf pdf/105.pdf
mv pdf/435720a.pdf pdf/106.pdf
mv pdf/435720b.pdf pdf/107.pdf
mv pdf/435723a.pdf pdf/108.pdf
mv pdf/435723b.pdf pdf/109.pdf
stany@gilva:~/nature/www.nature.com/nature/journal/v435/n7043[06:57 PM]$ 

You get the idea.

Then it’s just drag and drop.
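The sed calls in the run above count characters, so they only fit that one issue's index.html. Here is a slightly more general sketch of the same trick. It assumes the links look like href="....pdf" with double quotes, which I have not checked against every Nature index page. list_pdfs and number_moves are made-up names, and number_moves only prints the mv commands, so they can be eyeballed before being piped to sh.

```shell
# list_pdfs: read HTML on stdin, print the basename of every
# double-quoted href target that ends in .pdf, one per line.
list_pdfs() {
    grep -o 'href="[^"]*\.pdf"' |
    sed -e 's/^href="//' -e 's/"$//' -e 's|.*/||'
}

# number_moves DIR: read file names on stdin and print (not run) one
# mv command per name, numbering from 100 the way nl -v100 does.
number_moves() {
    awk -v dir="$1" '{
        printf "mv \"%s/%s\" \"%s/%d.pdf\"\n", dir, $0, dir, NR + 99
    }'
}
```

Usage would be something like list_pdfs < index.html | number_moves pdf, with | sh tacked on the end once the output looks right.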

I’ve still not found a free way to delete duplicate pages, however PDFpen looks reasonably good (it cannot preview the large page and the thumbnails of the rest of the pages in the file at the same time, and the interface for deleting pages is not obvious, but maybe I should contact the authors). It is $50 USD for the basic version (and I don’t need form creation either), which is much better than full Acrobat from Adobe.

I should contact the authors, and see if they will add the features I would like, and if they do, register the software. Hrm….

As my Spanish teacher used to say: necesito ganar dinero.

“Image unavailable for copyright reasons”

Background

For the last few years I’ve been subscribing to Nature Publishing Group’s Nature magazine. It is a scientific, peer-reviewed weekly magazine that publishes some of the best and most groundbreaking papers in the hard sciences (and when I say “science”, I don’t mean “English Literature”) – biology, chemistry, material sciences, physics, occasional psychology and mathematics.

It is a great magazine, and probably the only real competition it has in the field is Science, published by AAAS (American Association for the Advancement of Science). AAAS’s Science is also offered as a digital download (and in fact they offer the subscription with a significant discount), however it is only offered in Zinio format. Zinio, on the other hand, is infamous for having DRM in the downloaded files, giving the original content providers the ability to expire the document after a certain time. So in reality I don’t own the content, like I do when I buy a magazine, but rent it, and they can pull the plug at any moment.

As an aside, I am seeing that at least some flavors of PDF can be configured not to open after a certain date, however I wonder how widespread the practice is, and how “portable” such documents are, as, presumably, one would need an extension to open such a PDF. Maybe Zinio is in fact an overgrown extension to PDF documents, with a fancy page turning animation? Here is a thought. The linked PPT presentation says on page 10, regarding the benefits of PDF: “Security. Allows multiple security settings from fully editable to print only access. Files can be set to expire (cannot open past expire date).”

One of the benefits of a personal subscription to Nature is the ability to download any article that they publish as a PDF file once you authenticate and get a cookie loaded into the browser (if you are a student, you can probably see if your university has a site license for Nature and Science, and if you can access their web sites through your university’s web proxy). As Nature is a weekly publication, 52 issues a year, each 7 – 8 mm thick, start eating up space on the bookshelf rather rapidly, and weigh a good deal too. As a result, back when I had more time on my hands, I was actually downloading all the PDF pieces for each issue I’d receive, gluing them together into a single “issue” about 10 megs big, and dropping it as a single file on my hard drive. Eventually I stopped doing it, because it was taking a fair bit of my time, and attempts to convince my little brother to do it for me (I was offering up to $2 per issue) were not getting anywhere.

I still keep up my subscription, and occasionally do have time to both read Nature Science Update online and leaf through the actual magazine.

Image unavailable for copyright reasons

So I went and looked at the PDFs on the Nature site today (first time in a few months), and I’ve noticed something that is new to me. On the PDF versions of some pages, more noticeably on the feature articles (which are 2 – 3 page articles that describe in depth a particular aspect of science), some images have been replaced with boxes saying “IMAGE UNAVAILABLE FOR COPYRIGHT REASONS”.

Before, the main difference between the PDF and printed versions was the lack of advertisements (well, most of the time; sometimes they would goof, and you’d get to see a half page ad somewhere), as NPG was a firm believer that in order to advertise in the PDF version the advertiser should pay. Occasionally Nature would retract an article, and then its PDF would be removed, and on occasion other pages, where the article began and ended, were censored too.

Now some images are missing. Here is an example. (480K, sorry, I didn’t feel like cropping it).

Note that this is new – I went and double-checked the archives, and older stories don’t have this “feature”. Of course this can change – maybe they haven’t yet had enough manpower to go and look through the back archives.

Why? Why would they do something like that?

Maybe they got nailed by their stock image library. Maybe some photographer took them to court. It’s the larger images that seem to be unavailable, so maybe some kid took the image from some Nature issue, and used it in a high school project.

This kind of crap upsets me.

Spotlight and PDFs

Now, while talking about Spotlight, I started thinking about possibilities…. Indexing the entire hard drive is evil (or, in my case, eats too many CPU/IO cycles), so for now I’ve disabled Spotlight in my /etc/hostconfig. However, if I were to create a single subdirectory (or partition, or, heck, external drive) for documents, turn off indexing for the boot volume, turn off indexing for this new volume/subdirectory/partition (from now on: PDF repository), then, once I’ve added/modified the PDF content sufficiently, tell it to index the contents once (but not continuously) using mdimport(1), I’d get all the benefits of Spotlight/mdfind(1) for the documents in the repository (that I presumably want indexed) with none of the slowdowns. So I could have my cake and eat it too.

So I started looking at various options. Dave’s pointer about using wget with cookies seems like the right step forward. I’ll have to make sure that I tell wget to send the same User-Agent string as my browser does, as I recall from older days that folks at Nature actually keep track of that.

I’ve not tried it yet, but I’ll be seeing what I can do.
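For the record, here is roughly what that wget call might look like. This is an untested sketch: fetch_issue is a made-up name, the cookie file is assumed to be an export of the browser's cookies in the Netscape cookies.txt format that wget reads, and the recursion depth and two second wait are guesses at polite defaults rather than anything Nature asked for.

```shell
# fetch_issue COOKIES USER_AGENT URL
# Pull the PDFs linked from one index page, presenting the browser's
# cookies and User-Agent string.
fetch_issue() {
    cookies=$1 agent=$2 url=$3
    # -r -l 1 : follow links one level down from the index page
    # -np     : never climb to the parent directory
    # -A pdf  : keep only files ending in pdf
    # -w 2    : wait two seconds between requests
    wget --load-cookies "$cookies" --user-agent="$agent" \
         -r -l 1 -np -A pdf -w 2 "$url"
}
```

The User-Agent argument should be copied verbatim from whatever the browser that earned the cookie sends.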

So here are questions that I don’t know answers for:

What is the easy way to join a bunch of PDFs into a single PDF? Bonus points if it’s something that can be done from command line, maybe as a batch.

Also, is there an easy way to screen out duplicate pages in a PDF, preferably not involving human interaction? Under Acrobat 4 (or 3) it was rather simple – I’d just generate previews for each page, and then click on whichever looked similar, and kill them, but that required at least glancing through the PDF document.

Zinio reader uses its own format, that is heavily DRMed, and, as far as I can tell, might actually be based on PDF, as they licensed the PDF library from Adobe. So, the question I have is: Is there a tool that will strip the DRM, and generate a normal PDF out of a Zinio file? The best solution that I have found so far was to print each page to PDF individually, and then basically merge the pieces together, but this is ugly as sin.

Lastly, any better suggestions on how to deal with spidering sites like Nature’s, and pulling down certain types of content? Maybe I shouldn’t try to re-invent the wheel.

As usual, feel free to post comments (all 3 people and 100 comment spam bots that occasionally look at the site) 🙂

Dual Layer DVD burners in PowerMac G5s

Andy called my “employer” today, and asked us to find out for him what dual layer burners are in PM G5s. So of course the question percolated down to me, without the associated name attached to the question.

Apple ships different burners in different batches of systems, depending on which manufacturer gives Apple a better deal. So new PM G5s can come with either SONY DW-Q28A or Pioneer DVR-A09 (Which is just an Apple branded version of Pioneer DVR-109, and has no functional or firmware differences).

While I can understand why someone might want an Apple Shipped/Apple Supported DVD burner, the benefits of such support are in reality rather slim. Apple will support CD burning on either an Apple Shipped or an Unsupported DVD burner, as licensing is limited to DVD support. Ditto with booting (booting is actually something that starts regardless of the OS, as it’s triggered by OpenFirmware. Thus as long as the device supports the standard ATAPI command set, it can be used for booting). So in reality all one loses is DVD burning from Disk Utility, iTunes and things like iDVD.

What I recommend is buying whatever is the cheapest dual layer burner you can find that has patched firmware available for download from rpc1.org, and then using Patchburn to install a profile, turning the device into “Vendor Supported”, and re-enabling burning from iTunes, Disk Utility and iDVD. That, coupled with RPC1 firmware and ripping lock removal (which removes the restriction built into most new DVD drives to slow down reading of disks to 2x if a VIDEO_TS directory is detected on the disk), makes the drive into a rather useful piece of equipment that the OWNER controls.

So you might think that something free, like Patchburn would be slow to release updates for Tiger. You’d be wrong, however, as support for Tiger existed on the day Tiger was released. We will of course see what happens when Leopard comes out.

Patchburn might sound like an inconvenience. One has to go to a German web site, download software, click… So let me ask you a question: how often do you burn DVDs using Apple Disk Utility, while waiting for it to create an 8.5 gig dmg file? Right. You burn your DVDs using Roxio Toast, don’t you? And your Roxio Toast supports “Unsupported” drives as well as it does “Apple Shipped”, right? So I don’t see a problem, but please leave a comment and let me know if you don’t agree.

Here is some basic economics: I bought an LG HL-DT-ST GSA-4160b dual layer DVD burner at Best Buy on Boxing Day 2004 for 120 CAD, with a 40 CAD mail-in rebate (that I received). So in reality after taxes I spent 98 CAD on it. At that time a Pioneer DVR-A09 was selling for 150-170 CAD plus taxes. With the money I saved I bought an external enclosure for it, making it mobile.

Don’t get me wrong, the Pioneer DVR-109 is a great drive, and I see that Compunation is listing it for just a shade over 100 CAD at the time of this writing, but then again, LG burners are ~65 CAD now too. Lasers in CD/DVD burners burn out after about the same number of writes, so is paying $40 extra worth it?

Lastly, I have a DVR-105 at work. I’ve upgraded it to the latest firmware, and tried burning with it. It chokes on cheap silver-only DVD-R media (no idea what kind, probably rebranded Ritek, or something equally cheap), creating corrupted burns on every try (I learned the lesson after the 3rd attempt to burn). The cheap LG and BenQ burners I have here don’t have an issue with this media at all, writing on it at 8x, and passing all the verifications afterwards (generally it’s a good idea to do verification, just to prevent frustration later). So go figure, the cheaper drive reliably burns on cheap media too, so you don’t need to buy expensive Apple branded blanks. I wonder…..

BTW, I am still wondering how to turn MATSHITA CD-RW CW-8123 (Combo drive that shipped with iBook G4) into a region-free drive – I don’t believe that firmware updates for it exist.

OpenSolaris: Sun releases Solaris 10 source

Sun released Solaris source code as part of their OpenSolaris initiative today.

Seems like some things are still binary only (although fewer than last time Sun showed outsiders their source code, back with Solaris 8), and I didn’t notice the X drivers, but with the source for the basic OS (which is what Sun made available), gcc, OpenMotif and X.org’s drivers it’s probably possible to roll your own Solaris, and the only bit that will be missing will be CDE (OK, OpenLook would be missing too. But is there anyone out there who actually likes OpenLook, especially since it was deprecated starting with Solaris 9?). Oh, and the Display PostScript extension for X would be missing.

*sigh* As weird as it sounds, I miss CDE.

P.S. A mirror of the source code is at: http://www.genunix.org/mirror/index.html plus torrents are available at http://dlc.sun.com/torrents/

Tiger: Disabling dashboard

Adam e-mailed me this, so I am preserving it here for posterity.

Since I've not actually found a use for Dashboard:

$ defaults write com.apple.dashboard mcx-disabled -boolean YES

You need to restart the Dock.app (I just killed the process and it came right back.)

Once this is done, you can poof the dashboard app off your dock, as it now does nothing.

Note that this is a per-user setting, however I am happy, as Dashboard widgets wanted 35 or so megs of real RAM in the default configuration.
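In case I ever want Dashboard back, here is the same tip wrapped so it works both ways. It is a sketch: set_dashboard is a made-up name, and I am assuming that writing NO to the same key undoes the change and that killall Dock is an acceptable way to restart the Dock (it comes right back, as noted above).

```shell
# set_dashboard on|off
# Flip the per-user mcx-disabled key, then restart the Dock so the
# change takes effect.
set_dashboard() {
    case $1 in
        off) flag=YES ;;   # mcx-disabled = YES turns Dashboard off
        on)  flag=NO ;;
        *)   echo "usage: set_dashboard on|off" >&2; return 1 ;;
    esac
    defaults write com.apple.dashboard mcx-disabled -boolean "$flag" &&
    killall Dock
}
```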

As an aside, the only widget I was actually using was the weather, and it was talking to an American weather site that was giving me incorrect information most of the time.

Tiger: Disabling Spotlight

Spotlight introduces a fairly large performance hit on the system, especially if the files you are working with are both large and have a Spotlight plugin, and thus can be indexed. The performance hit might be less noticeable on a desktop system with fast drives, however on my laptop with a 4200 rpm drive, constantly dealing with megabytes of source code and compilations, Spotlight was less of a benefit and more of a hindrance.

So, without further ado, in order to disable Spotlight, one has to edit /private/etc/hostconfig, find the line that reads SPOTLIGHT=-YES-, change it to SPOTLIGHT=-NO-, and reboot.
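For those who would rather not open an editor as root, the same edit can be sketched with sed. spotlight_off is a made-up name, and the sketch assumes the line appears exactly as SPOTLIGHT=-YES- at the start of a line. It leaves the original next to the file as a .bak copy.

```shell
# spotlight_off FILE
# Rewrite SPOTLIGHT=-YES- to SPOTLIGHT=-NO- in FILE, keeping FILE.bak.
spotlight_off() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    # Read from the backup and write over the original, so the file
    # keeps its owner and permissions.
    sed 's/^SPOTLIGHT=-YES-$/SPOTLIGHT=-NO-/' "$file.bak" > "$file"
}
```

It would be run as root against /private/etc/hostconfig; trying it on a copy of the file first costs nothing.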

This will prevent the MetaData Service, /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Support/mds, from starting at boot time.

Note that this will not disable file change notifications in the kernel, as can be checked using Amit Singh’s fslogger. On the same page there is some more in depth information on the kernel notification service that Spotlight (and fslogger) subscribe to.

A perty GUI called Spotless was written by someone, but I am not sure I’d trust a GUI to parse and edit a text file.

If you want to get rid of the looking glass icon in the top right hand corner as well, you might want to either remove (preferably just move out of place) or chmod -R 0000 /System/Library/CoreServices/Search.bundle, which is the key file. The actual parts of Spotlight are:

  • /Library/Spotlight
  • /System/Library/Spotlight
  • /System/Library/CoreServices/Search.bundle
  • /System/Library/PreferencePanes/Spotlight.prefPane
  • /System/Library/Services/Spotlight.service
  • /System/Library/Contextual Menu Items/SpotlightCM.plugin
  • /System/Library/StartupItems/Metadata

plus /usr/bin/md*, although I’d argue that the metadata tools in /usr/bin/md* are actually useful.
Changing permissions means that if at some point you want to undo the changes, you can always repair permissions. In any case, little looking glass in the corner doesn’t bother me much.

Technically one can probably selectively start and stop Spotlight by killing or starting mds and mdimport, however the way Apple recommends is using mdutil -i off / to turn off indexing of the boot volume (i.e. existing databases would be preserved and accessible through Spotlight).

If you ever want to blow away your Spotlight database and force reindexing (assuming mds/mdimport run), you can do mdutil -i off /, then mdutil -E /, then mdutil -i on /.
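The three mdutil calls, chained so that a failure stops the sequence, look like this. reindex_volume is a made-up name, the volume defaults to / as in the text, and it needs root.

```shell
# reindex_volume [VOLUME]
# Stop indexing, erase the Spotlight store, start indexing again.
# Each step runs only if the previous one succeeded.
reindex_volume() {
    vol=${1:-/}
    mdutil -i off "$vol" &&
    mdutil -E "$vol" &&
    mdutil -i on "$vol"
}
```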

Note: Apparently killing Spotlight interferes with find in Finder and in Mail.app. As I never use either (locate or find . -name "*foo*" -print on the command line is much more powerful, plus gives me an -exec stuff {} \; option), it doesn’t bother me, however ocdinsomniac has some nice additional information and a script that purports to revert Finder’s find to the Panther style behavior.

Creating dynamic libraries under Mac OS X

So you are compiling some piece of C code, possibly ancient, possibly written for Linux, and you get to the place where a library is about to be created, and you get something like:

stany@gilva:~/src/socks/socks5-v1.0r11/shlib[05:16 AM]$ gcc -o libsocks5_sh.so  
-shared msg.o protocol.o log.o hostname.o confutil.o buffer.o cache.o wrap.o 
wrap_tcp.o wrap_udp.o conf.o libproto.o select.o rld.o null.o addr.o 
upwd.o gss.o   -ldl  
gcc: unrecognized option `-shared'
ld: warning multiple definitions of symbol _gethostbyname2
hostname.o definition of _gethostbyname2 in section (__TEXT,__text)
[....]
ld: Undefined symbols:
_main
stany@gilva:~/src/socks/socks5-v1.0r11/shlib[05:16 AM]$ 

And you get all confuzzled.

Well, the unrecognized option `-shared' message is printed by the gcc driver, which on Darwin does not know the flag. With the flag ignored, ld, the link editor that is amongst other things in charge of creating dynamic libraries, tries to build an ordinary executable and complains about the missing _main. Of course, dynamic libraries are just collections of functions, and do not need main(). Under Linux, and most other unices (including Solaris), -shared is what gcc wants in order to create a dynamic library. However, Darwin is different, and expects -dynamiclib instead.

So:

stany@gilva:~/src/socks/socks5-v1.0r11/shlib[05:16 AM]$ gcc -o libsocks5_sh.so 
-dynamiclib msg.o protocol.o log.o hostname.o confutil.o buffer.o 
cache.o wrap.o wrap_tcp.o wrap_udp.o conf.o libproto.o select.o  rld.o 
null.o addr.o upwd.o gss.o   -ldl  
ld: warning multiple definitions of symbol _gethostbyname2
hostname.o definition of _gethostbyname2 in section (__TEXT,__text)
[...]
stany@gilva:~/src/socks/socks5-v1.0r11/shlib[05:19 AM]$ file libsocks5_sh.so 
libsocks5_sh.so: Mach-O dynamically linked shared library ppc
stany@gilva:~/src/socks/socks5-v1.0r11/shlib[05:20 AM]$ 

However, in spite of its magical properties, it doesn’t fix the function name clashes. 🙂
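If the same build has to run on both Darwin and Linux, the flag can be picked from uname -s instead of being edited by hand. A small sketch, with shlib_flag being a made-up helper and everything that is not Darwin assumed to take -shared:

```shell
# shlib_flag [OS]
# Print the gcc flag that asks for a shared library on OS
# (default: whatever uname -s reports).
shlib_flag() {
    case ${1:-$(uname -s)} in
        Darwin) printf '%s\n' "-dynamiclib" ;;
        *)      printf '%s\n' "-shared" ;;
    esac
}
```

The link line then becomes gcc -o libsocks5_sh.so $(shlib_flag) followed by the object files.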

Capturing RTSP streams to file

Introduction

RTSP is the Real Time Streaming Protocol, documented in RFC 2326. Some reasonably good background information is available on the webpage of Henning Schulzrinne. Multicast people also created a specification for a really simple RTSP client with a reasonably good description of the client/server interaction.

My /etc/services lists the following ports as registered with IANA:

stany@gilva:~[01:18 AM]$ grep rtsp /etc/services 
rtsps           322/udp     # RTSPS
rtsps           322/tcp     # RTSPS
rtsp            554/udp     # Real Time Stream Control Protocol
rtsp            554/tcp     # Real Time Stream Control Protocol
rtsp-alt        8554/udp     # RTSP Alternate (see port 554)
rtsp-alt        8554/tcp     # RTSP Alternate (see port 554)
stany@gilva:~[01:18 AM]$ 

RTSP is used by a number of applications, on mbone (where one can watch all sorts of streams, including NASA TV), and on normal IPv4 networks. The vast majority of mbone streams I saw myself encapsulate MPEG1 or MP3 data, however MPEG2 support is showing up more and more commonly. People who are not connected to mbone see rtsp URLs as parts of RealAudio/RealVideo streams (provided, for example, by BBC), and QuickTime streams.

Why would one want to “record” these streams?

There are many cases when this might be desirable – for example one might not have a speedy connection to the internet, yet want to watch media content provided in high resolution. Occasional high speed connectivity to the internet is a possibility too – for example I can stop by a coffee shop that has Wi-Fi, and have full high speed internet connectivity. So a question arises: how does one capture a stream’s contents to a file, to watch offline? Sadly, the clients that are provided to view/listen to RTSP streams either don’t support capture of data at all (Real Player), or require a “Pro” version of the software to save the stream to a file, which can also be disabled by the server (QuickTime Player).

RealMedia Streams

One of the options might be rtspget, which is an add-on to xine. I’ve compiled xine under 10.4.1 with just ./configure --prefix=/opt/gnu --with-included-gettext --with-x, however if this is the first time you are building xine, you might need to actually “make install”, contrary to the instructions on Ian Collier’s page. In order to build, rtspget wants a whole bunch of xine includes that are spread all over the source tree, yet get dropped into {PREFIX}/include/xine/ by the install script (in other words you can’t just give an -I{path} option to gcc, as the location of the includes in the tree is different than in the installed location rtspget expects. And yes, you can edit the source, if you want to). Anyway, the assumption here is that you know what you are doing, and given some time, and maybe a choice word or three, xine-lib would eventually build, and rtspget would eventually compile.

In my tests, rtspget was able to negotiate the connection with BBC’s streaming servers, which are Real Media based, and download the streams. The resulting downloads were identified by the file(1) command as indeed being Real media files. No go with the QuickTime streams at all, and the downloaded files were not showing anything in Real Player 10. On this page (and don’t ask me why engineering at Purdue University insists on the HTTPS protocol, as I don’t know either) I saw a reference to this, that seemed to indicate the file still plays normally in xine. So maybe there is something wonky somewhere, and it warrants further testing.

Besides this “small” problem with rtspget, a couple of others are present. It sends fixed User-Agent, GUID and StartTime to the servers, and that might cause problems in the future, as servers might be modified to recognize that signature, and deny rtspget access to the media:

stany@gilva:~/src/rtsp[01:59 AM]$ strings rtspget
[...]
rtsp: bad mrl: %s
User-Agent: RealMedia Player Version 6.0.9.1235 (linux-2.0-libc6-i386-gcc2.95)
rtsp: failed to connect to '%s'
CSeq: 1
ClientChallenge: 9e26d33f2984236010ef6253fb1887f7
OPTIONS
PlayerStarttime: [28/03/2003:22:50:23 00:00]
CompanyID: KnKV4M4I/B2FjJ1TToLycw==
GUID: 00000000-0000-0000-0000-000000000000
RegionData: 0
ClientID: Linux_2.4_6.0.9.1235_play32_RN01_EN_586
[...]
stany@gilva:~/src/rtsp[02:00 AM]$

Something to keep in mind. This user agent string is not in the actual rtspget source code but in librtsp in xine source code (from ./xine-lib-1.0.1/src/input/librtsp/rtsp.c):

[...]
  if (user_agent)
    s->user_agent=strdup(user_agent);
  else
    s->user_agent=strdup("User-Agent: RealMedia Player Version 6.0.9.1235 (linux-2.0-libc6-i386-gcc2.95)");
[...]
  s->server_state=RTSP_CONNECTED;

  /* now lets send an options request. */
  rtsp_schedule_field(s, "CSeq: 1");
  rtsp_schedule_field(s, s->user_agent);
  rtsp_schedule_field(s, "ClientChallenge: 9e26d33f2984236010ef6253fb1887f7");
  rtsp_schedule_field(s, "PlayerStarttime: [28/03/2003:22:50:23 00:00]");
  rtsp_schedule_field(s, "CompanyID: KnKV4M4I/B2FjJ1TToLycw==");
  rtsp_schedule_field(s, "GUID: 00000000-0000-0000-0000-000000000000");
  rtsp_schedule_field(s, "RegionData: 0");
  rtsp_schedule_field(s, "ClientID: Linux_2.4_6.0.9.1235_play32_RN01_EN_586");
  /*rtsp_schedule_field(s, "Pragma: initiate-session");*/
  rtsp_request_options(s, NULL);

  return s;
[...]

This leads me to believe that anything linked against xine’s librtsp would have that user agent, etc. combination (that someone just captured on the wire while investigating the way RealPlayer establishes a session). Note to myself: investigate #define RTSP_RECORDING 16 and the rest of the server capabilities in xine-lib-1.0.1/src/input/librtsp/rtsp.c.

QuickTime Streams

Another option seems to be vlc. You would want the latest version of it (vlc-0.8.2test2 at the time of writing), and you’d want to keep in mind that vlc doesn’t deal with H.264 files properly yet.

Here are the instructions:

  • Locate the stream you want to download. You can generally start watching in QuickTime Player, get info, make sure that the stream you are getting is not H.264, but MPEG4, etc., and, if you have a license key for the QuickTime Pro version, save the file to disk. If you are lucky, and the content owner didn’t restrict saving, QuickTime Pro would do just that. However, if the content owner restricted the saving feature, QT would offer to save the file for you anyway. This would be a bit of a misnomer, as that action will generate a small file with an embedded rtsp URL, not actually save the actual file, and you would need an internet connection in order to watch it again.

    At this point you can use strings on the file to extract the rtsp string. If you don’t have pro license, getting info on the file you are streaming would give you rtsp path as well, however occasionally it seems like it is a path to a container file that contains the actual path, not to the media itself. Warrants further investigation.

  • Open VLC
  • File -> Open Network.
  • Select HTTP/FTP/MMS/RTSP
  • Set URL to the rtsp URL you obtained already in the first step
  • Check Advanced output
  • Click Settings
  • Click Browse, choose a name to save the file as
  • Set Encapsulation method to QuickTime. Encapsulation method seems to make a minor difference – when set to quicktime as opposed to the default mpeg_ts, it seemingly started to pixellate less. I don’t know if anything changes in the output of the file.
  • Check Play locally if you want to watch while it downloads. Note that MPEG4 files as played in VLC would appear with “blocky” pixellation. If you manage to save the file, open it in QuickTime, and all that blockiness would go away, as QT does a better job rendering MPEG4
  • Click OK
  • Click OK
  • Wait 🙂
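The steps above can probably be driven from the command line as well. What follows is a sketch I have not run against 0.8.2test2: rtsp_urls and save_stream are made-up names, vlc is assumed to be on the PATH (on Mac OS X the binary lives inside VLC.app), and the stream output chain with mux=mov is my reading of what the Encapsulation method setting does behind the GUI.

```shell
# rtsp_urls FILE
# Print every rtsp:// URL embedded in FILE, e.g. in the small
# reference movie QuickTime saves. -a makes grep treat the binary
# file as text, -o prints only the matching part.
rtsp_urls() {
    grep -a -o 'rtsp://[[:graph:]]*' "$1"
}

# save_stream URL OUTFILE
# Ask VLC to write the stream to OUTFILE in a QuickTime container
# instead of displaying it.
save_stream() {
    vlc "$1" --sout "#standard{access=file,mux=mov,dst=$2}"
}
```

Something like save_stream "$(rtsp_urls saved.mov | head -1)" output.mov would then chain the two, where saved.mov is the file QuickTime offered to save.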

As a test I’ve used a link to Shakira’s full video from Sony’s web site (As Joshie used to say, “Boobies!”).
rtsp://qt.sony-global.speedera.net/qt.sony-global/Epic/Shakira/Shakira_LaTorturaVidFull_300.mov

End result:

stany@gilva:~/tmp[02:19 AM]$ file vlc-output.ts
vlc-output.ts: Apple QuickTime movie file (moov)

Renamed the above file to .mov, opened and played it in QT 7 with no problems, and QT reported the file as mpeg4 video, AAC audio.

Windows Media Files

Lastly, for the sake of completeness, Windows Media files. They don’t use RTSP:// protocol, but something called MMS.
SDP Multimedia has more information and a download client for Windows.
mmsclient deals with files that are not continuous (i.e. movie clips), however in my tests against the CBC radio streams it doesn’t quite work. And yes, I, too, have no idea what “dbl” is, as it’s not a command on either Solaris or Mac OS X. (As an aside, while linking under Solaris, you want to tell gcc to link against libnsl and libsocket: gcc -g -O2 -Wall -o mmsclient client.o -lnsl -lsocket. Mac OS X Just Works.) The CBC radio stream for Ottawa, high quality, is at mms://wm05.nm.cbc.ca/cbcr1-ottawa at the time of this writing. Also note that the version mmsclient broadcasts is hardcoded to Media Player 7 in client.c:

  sprintf (str, "3403NSPlayer/7.0.0.1956; {33715801-BAB3-9D85-24E9-03B90328270A}; Host: %s",
           host);

This might limit what media it can save, as nowadays pretty much everything in the Windows world depends on WMP 9.0 and up, as it has DRM support.

MP3/Shoutcast Streams

For that you just want streamripper.

Avenues of further investigation

Darwin Streaming Server, freely available as both source and binary from Apple, has Darwin Streaming Proxy, which, after a fair bit of ignoring wrong documentation, it is possible to get to work. DSS-5.0.3 doesn’t compile out of the box under Tiger, and seems to be missing -lcrypt somewhere. However, the build process uses Xcode (which I abhor), and I have no idea how to convince it to work, so I gave in and downloaded the binary. I am not sure that it’s written in ANSI C, which is also a problem. The idea would be to convince it to fopen() a file and save the data to it as it passes through the proxy.

Also, more reading of the simplified RTSP client specification is probably warranted.

stany@iskra:~[03:11am]$  telnet qt.sony-global.speedera.net 554
Trying 209.133.111.201...
Connected to qt.sony-global.speedera.net.
Escape character is '^]'.
DESCRIBE rtsp://qt.sony-global.speedera.net/qt.sony-global/Epic/Shakira/Shakira_LaTorturaVidFull_300.mov RTSP/1.0
Cseq: 1

RTSP/1.0 302 Found
Server: DSS/5.0.3.2 (Build/452.22.3; Platform/Linux; Release/Panther; Update/3GPP; )
Cseq: 1
Location: rtsp://qt.sony-global.speedera.net.central.speedera.net/qt.sony-global/Epic/Shakira/Shakira_LaTorturaVidFull_300.mov

Connection to qt.sony-global.speedera.net closed by foreign host.
stany@iskra:~[03:12am]$ 
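Typing the request into telnet gets old quickly, so here is the same DESCRIBE as a shell sketch. describe_request is a made-up name. RTSP wants lines terminated with CRLF and a blank line at the end of the request, which is what the printf format spells out.

```shell
# describe_request URL
# Print a minimal RTSP DESCRIBE request for URL: request line,
# CSeq header, and the blank line that ends the request, each
# terminated with CRLF.
describe_request() {
    printf 'DESCRIBE %s RTSP/1.0\r\nCSeq: 1\r\n\r\n' "$1"
}
```

Piping it into nc, as in describe_request rtsp://host/path | nc host 554, should get the same kind of reply as the telnet session above, though some nc flavours may need a flag to wait for the answer after stdin closes.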

Getting Rawr-Endezvous to work with recent Growl framework versions

20050618 Edit: Initially when I wrote this, I was referring to Rawr-Endezvous v0.6b3. In the latest version (0.6b4) the problem I was referring to in this note got fixed. Thank you very much, Jerome. I am impressed that my voice was heard. –stany

I’ve cleanly installed Tiger on my iBook, and then installed the latest Growl framework. Sadly Rawr-endezvous stopped working. As Adium, etc. was happily doing Growl notifications, I figured that the problem was in Rawr-endezvous, not in Growl.

It seems that Jeremy Knope basically disappeared off the face of the universe, and didn’t leave a forwarding address, so I did a bit of digging, and noticed the following:

stany@gilva:/Applications/extras/Rawr-endezvous.app/Contents/MacOS[06:08 PM]$ otool -L Rawr-endezvous
Rawr-endezvous:
        /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa (compatibility version 1.0.0, current version 8.0.0)
        /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 218.0.0) 
        /Users/jerome/Library/Frameworks/GrowlAppBridge.framework/Versions/A/GrowlAppBridge (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 63.0.0)

Next I checked if GrowlAppBridge existed on my system, and noticed that it is now part of Growl.framework, and not part of GrowlAppBridge.framework:

stany@gilva:/Applications/extras/Rawr-endezvous.app/Contents/MacOS[06:10 PM]$ locate GrowlAppBridge
/Library/Frameworks/Growl.framework/GrowlAppBridge
/Library/Frameworks/Growl.framework/Versions/A/GrowlAppBridge
/Library/Frameworks/Growl.framework/Versions/A/Headers/GrowlAppBridge-Carbon.h
/Library/Frameworks/Growl.framework/Versions/A/Headers/GrowlAppBridge.h

So as an interim fix the following works:

stany@gilva:/Library/Frameworks[06:10 PM]$ mkdir -p GrowlAppBridge.framework/Versions/A/
stany@gilva:/Library/Frameworks[06:11 PM]$ cd GrowlAppBridge.framework/Versions/A/
stany@gilva:/Library/Frameworks/GrowlAppBridge.framework/Versions/A[06:11 PM]$ ln -s ../../../Growl.framework/GrowlAppBridge GrowlAppBridge

followed by restarting rawr-endezvous
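The same digging can be done a bit more mechanically: feed the otool -L output to a loop that checks whether each listed path exists. missing_libs is a made-up name, and the sketch assumes the output format shown above (binary name on the first line, then one path per line followed by the version info in parentheses). Entries that start with @executable_path would be reported as missing too, since the shell cannot resolve them.

```shell
# missing_libs
# Read `otool -L` output on stdin and print the library paths that do
# not exist on this system.
missing_libs() {
    # Drop the first line (the binary's own name), strip leading
    # whitespace and the trailing "(compatibility version ...)" part.
    sed -e 1d -e 's/^[[:space:]]*//' -e 's/ (compatibility version.*$//' |
    while IFS= read -r lib; do
        [ -e "$lib" ] || printf '%s\n' "$lib"
    done
}
```

With the binary above, otool -L Rawr-endezvous | missing_libs would be expected to point at the GrowlAppBridge path under /Users/jerome.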

The Tipping Point

Apple announced that within the next 2 years they will have completed the transition from using the PowerPC to machines based on Intel’s CPUs.

I never thought I would see the day… now I am extremely curious as to what the changes *inside* the box are going to be.

“The soul of a Mac is its operating system.” Steve Jobs, WWDC, June 6th 2005