Already Seen? or a rant about emerging file types

Yesterday I encountered a rather interesting problem. I was visiting Alex, and in conversation he complained about a new phenomenon in book and magazine distribution – the LizardTech djvu file format. He had downloaded an issue of “Penthouse”, and wanted to extract a particular page out of it and print it onto a sticker that he could apply to the face of a burned CD. However, the file he had was in djvu format, and the stand-alone viewer he was using didn’t let him print. I agreed to take a look at the problem for him, and whipped out my trusty iBook….

In any case, Alex was viewing djvu files using the OpenDjVu viewer for Windows. That viewer is stand-alone (and one can wonder about the purpose of “easy navigation with one hand from keyboard” considering that magazines like “Penthouse” and “Playboy” are distributed in this format), but one of its shortcomings is the lack of print support. And of course one can’t export the djvu file as anything.

After looking at the LizardTech web site, I noticed that the only Mac OS X implementation that they support is a Safari plugin. For reference, it installs into /Library/Internet Plug-Ins:

stany@fiona:/Library/Internet Plug-Ins[01:53 PM]$ ls -lad NPD*
-rwxr-xr-x  1 stany  admin  8966  2 Nov 01:51 NPDJVU
drwxr-xr-x  3 stany  admin   102  2 Nov 01:51 NPDjVu.plugin
stany@fiona:/Library/Internet Plug-Ins[01:53 PM]$ 

I subscribe to the original Mac software distribution model, where all the bits and pieces needed by a particular program should be available inside a single directory tree, making upgrading and pruning the file system simple (getting rid of a program should be rm -rf /opt/program_name or rm -rf "/Applications/Program Name.app". This method was field tested over and over again on literally tens of Solaris systems I’ve administered, allowing me to compile once and copy over everywhere, and while somewhat wasteful of disk space, it seems to be rather effective), so I was reluctant to install the plug-in.

No problem, there is a djvulibre implementation that is designed to run under X. Downloaded that, looked through the INSTALL file. It depends on qt-X11 and libtiff. Compiled libtiff. Remembered that there is such a thing now as qt-mac, and downloaded that. Attempted to follow the installation instructions for qt-mac (uncompress into /Developer/, rename qt-mac-3.x.x.x to /Developer/qt/, run configure, followed by make). The build took over an hour and generated tons of object files, with no end in sight. Realized that I don’t really need the fancy graphic utilities that are part of djvulibre, and all I need is djvups. Compiled that.

Ran it on a 5.7 meg .djvu file, redirecting the output into a new file that was supposed to be PostScript. It ran for 15 minutes and wrote over a gigabyte before it ran out of disk space. Cursed a lot. Copied the .djvu file to a Solaris system with much more RAM and disk space, and let it run there for a while. Half an hour later I had a 1.6 gigabyte PS file. Fed the resulting PS file into Adobe Distiller 4.0 that came with Adobe FrameMaker 6.0. It ran for over 2 hours on a 400 MHz USII CPU, and used up over 450 megabytes of real RAM. The end result was a 7.5 megabyte PDF file.

I opened the resulting file and attempted to zoom in on some text, only to find that the result was not readable. At this point I suspect that the fault is with Adobe Distiller, and that I should have specified that I wanted print resolution, and not the default 72dpi.

This is what the 7 meg PDF output looked like at 200% magnification (note the pixelation)

By this point it’s 2 in the morning, and Alex and I are both cursing a lot.

So I eventually gave in, installed the Safari plugin, and restarted Safari. In its README there is a little note that says that viewing local djvu files is not supported. No problem, copy the files into the Apache directory, access them over HTTP. No workie, Safari is showing a bunch of binary garbage. More cursing. At this point I go into Help -> Installed Plug-ins, and notice that while the djvu plugin is installed and enabled, it is set as a handler for only a number of particular MIME types (Subject of another RANT – Who still uses MIME types that are essentially extension based instead of using something similar to the file(1) database to identify what kind of file one is handling regardless of extension, resource fork, etc? Get with the program, folks). So after a quick edit of httpd.conf to add AddType image/x-dejavu .djvu, and a quick apachectl restart, the plugin was displaying djvu files.
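For anyone who wants to script the httpd.conf edit instead of doing it by hand, here is a minimal sketch. The add_mime_type helper is a name I made up for illustration, and the sketch assumes your Apache build reads AddType lines from a plain text config file that you are allowed to append to. It only adds the line if an identical one is not already there, so it is safe to run twice.

```shell
# Hypothetical helper (not part of Apache): append "AddType <mime> <ext>"
# to the given config file unless that exact line already exists.
add_mime_type() {
  conf="$1"
  mime="$2"
  ext="$3"
  line="AddType $mime $ext"
  # -x matches whole lines only, -F treats the pattern as a fixed string.
  if ! grep -qxF -e "$line" "$conf"; then
    printf '%s\n' "$line" >> "$conf"
  fi
}

# Intended use (run as root, then restart Apache; left commented out here):
# add_mime_type /path/to/httpd.conf image/x-dejavu .djvu
# apachectl restart
```

The path to httpd.conf is deliberately left as a placeholder, since it differs between installs.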

At this point printing was working. So I just printed the thing to PDF using normal Panther PDF support. Another 10 minutes later, I had a 215 meg file. Opening it, and zooming in was actually showing the text in a readable way, so I started up Windows File Sharing, and copied it over to his system.

This is what the 215 meg PDF output generated from Panther looked like at 200% magnification

So a good 4 hours later Alex had his 5.7 meg djvu file as a 215 meg PDF file that he could actually use.

Here are the file sizes:

-rwxr--r--   1 stany    staff    5616972 Nov  2 00:37 penthouse_11.djvu
Original
-rw-r--r--   1 stany    staff    1608552487 Nov  2 00:59 penthouse_11.ps 
Generated with djvups with above as input
-rw-r--r--   1 stany    staff    7686172 Nov  2 03:11 penthouse_11.pdf
Generated by Adobe Distiller
-r-xr-xr-x  1 root      wheel  221127535  1 Nov 21:22 penthouse_11.pdf
Output of Safari djvu viewer printed to PDF
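
To put those numbers in perspective, here is a quick sketch that computes how many times larger each output is than the original djvu file. The ratio helper is my own naming, not part of any tool mentioned here; the byte counts are copied from the listing above.

```shell
# Hypothetical helper: print how many times larger $1 bytes is than $2 bytes.
ratio() {
  awk -v out="$1" -v orig="$2" 'BEGIN { printf "%.1f\n", out / orig }'
}

orig=5616972               # penthouse_11.djvu
ratio 1608552487 "$orig"   # PostScript from djvups
ratio 7686172 "$orig"      # PDF from Adobe Distiller
ratio 221127535 "$orig"    # PDF printed from the Safari plugin
```

That works out to roughly 286 times the original size for the PostScript, 1.4 times for the unreadable Distiller PDF, and 39 times for the usable PDF.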

So here is the rant part:

If you create a new file format, no matter how cool you think it is, and how new and advanced it is, please consider that there might be people who would want to actually do things to the data in the file that you never anticipated. So even if you are 100% sure that your file format is the best thing since chocolate and everyone on the planet should adopt it, please keep in mind that there might be some specialized applications written to solve a particular problem, and that those applications can do things that you never anticipated, and support file formats that are existing standards, and not emergent standards. Thus, if you want your file format to be successful, do take care to write and make publicly available filters that convert not only to your file format, but from it as well. This way people will not perceive that you are attempting to lock them into your design, but instead that your idea has genuine merit and technological innovation.

It seems that djvu has some benefits, but primarily folks prefer it for distribution of images as it generates smaller files than PDF. However, that is not exactly true. After some research on the internet, I believe that djvu generates smaller files only if the source material is a series of images. The moment you run any sort of OCR on the images and distill the result, you will get a significantly smaller PDF file.

So djvu is a file format for folks who either need to distribute high quality image galleries as single files, or who do not have access to OCR software (or do not want to proofread). Without an embedded text layer, it is also not searchable or indexable.

Obviously neither the folks at LizardTech nor the numerous open, free and libre implementers of djvu support, or even think about, such a novel and advanced concept as export to JPG page by page. The resolution stored in a djvu file is obviously high enough that the generated images can be successfully OCRed using commercial off-the-shelf software. However, right now in order to do something like that, one has to first dump djvu to PDF and then convert PDF to JPG page by page, with noise and compression artifacts introduced at the extra step. The lack of generally available tools to modify the files also seems to hinder djvu adoption as a mainstream file format.

It is well known that VHS won the format war largely due to the efforts of pornographers, who chose the simpler if technologically inferior VHS over Beta. It seems that LizardTech is betting on a similar vector, as currently it is the warez community that seems to utilize the djvu file format, due to its smaller size, and thus lighter load on their servers. While I am not 100% certain that this is an ideal business plan, there of course might be some merit to it.

To quote MasaManiA.com, “You must agree my logical thinking and natural fear” (Note: potentially NSFW but generally hilarious nevertheless).

rentzsch.com: fs_usage Intro

[excerpts from his blog entry follow]
fs_usage is a command line tool that displays file system activity.

That’s a little better, but still a firehose. You can cut it down by grepping out the CACHE_HIT lines and grep’s own reads:

$ sudo fs_usage -e -f filesystem|grep -v CACHE_HIT|grep -v grep

Now you have a solid base. It’s still a lot of information delivered pretty quickly, but now it’s realistic to — say — start the recording, do your thing, stop the recording and comb through the resulting logs.

Of course, you can further focus the output. For example, discover what files are being opened, as they’re being opened:

$ sudo fs_usage -e -f filesystem|grep -v CACHE_HIT|grep -v grep|grep open

Or, who’s writing to your disk:

$ sudo fs_usage -e -f filesystem|grep -v CACHE_HIT|grep -v grep|grep write
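
If you end up typing those pipelines a lot, the filtering half can be wrapped in a small shell function. This is only a sketch: fs_filter is a name I made up, and it assumes the fs_usage output looks the way the excerpt describes (CACHE_HIT lines and lines from grep itself are the noise to drop).

```shell
# Hypothetical wrapper around the filters quoted above.
# Reads fs_usage output on stdin and keeps only lines that mention $1,
# dropping CACHE_HIT lines and grep's own activity in a single pass.
fs_filter() {
  grep -v -e CACHE_HIT -e grep | grep -e "$1"
}

# Intended use (needs root on Mac OS X, so left commented out here):
# sudo fs_usage -e -f filesystem | fs_filter open
# sudo fs_usage -e -f filesystem | fs_filter write
```

Folding the two grep -v stages into one just saves a process; the result is the same lines as the longer pipelines.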

MacInTouch – Comments about FireWire/USB Enclosures

Yesterday there was a post on MacInTouch asking about experiences with Oxford and Prolific chipsets in FireWire/USB enclosures:

[Ed Fortmiller] I need to purchase a drive enclosure that supports both FireWire (1394) and USB 2.0. For FireWire, some enclosures use the Oxford 911 chip whereas others use the Prolific PL-3507 chip.
  Is one chip preferable over the other? Are there known problems with either of these chips? Suggestions for a good reliable (dual FW/USB) drive enclosure?

Here is my response, as well as several others’ comments:

A number of people responded to yesterday’s query about FireWire/USB drive enclosures:
[James Ehrler] I purchased a Plumax enclosure from Dealsonic about 4 months ago that had an Oxford chip and Firewire/USB. Works great.
  Needed another so I purchased the same case (same part number and also from Dealsonic) but it had the Prolific chip. Didn’t work for beans.
  I had to get an RMA and swap it for an Oxford-based case (FireWire only) from Dealsonic.

[Jason Froikin] Any time you need a hard drive enclosure that supports both FireWire and USB, the best source I know of is Firewire Depot. Most of their enclosures use Oxford chipsets. I’m not sure if it’s superior, but it does have native Mac OS X support dating back to the public beta.

[David Rostenne] I do Mac consulting, and several of my clients have recently needed Firewire/USB enclosures. We originally got some with Prolific PL-3507 chipsets and had a lot of problems with them. When we switched the same drives over to enclosures with Oxford chipsets the problems went away. Make sure to upgrade the firmware of the Oxford chipsets before using the enclosures, as there are several bug fixes..
  Also make sure to read the specs carefully as we also have run across enclosures that have only a single firewire port, and not enough airflow to keep the drives cool. For 2.5 inch enclosures make sure that both sets of ports can be bus-powered, and use normal sized connectors, instead of the mini-plugs.. less cables to carry around!
  Currently we are recommending the Macally enclosures, for 3.5 and 2.5 inch drives. We have also had no problems with the Maxtor Onetouch series, they come with drives and are firewire/USB and also come with a copy of Retrospect.

[Paul Kneipp] Regarding Ed Fortmiller’s request for USB/1394 enclosures: I can highly recommend anything made by MacPower. I have owned a lot of these type of devices, but I am completely in love with my Clearlight 2.5″ model. Elegant, simple design and very strong. MacPower have won a heap of awards for their products. Better to pay a little more for a good housing – it’s worth it, especially on the day it accidentally slides out of your briefcase as you open the car door . . .

[Richard Barrett] I just bought a couple of drive cases and here is my view: Drive cases are pretty much the equal. They use different chips and most work well.
Just about anything with the Oxford 911 works great at Firewire 400. Usually no USB.
The Oxford chip 922 gives you USB 2.0 also.
The Prolific chips work well. But, frequently have a single input so you can’t daisy-chain drives.
Make sure you check for large drive support or the case won’t work with drives bigger than 120 GB
A case with a quiet fan can extend your drive life.
Case stories (with my Dual G5):

Generic Firewire 400 two ports Oxford 911 chip, no USB ($40). Works great. No problems. [Plumax PM-350F2-POS]
SanMax Firewire 400 / USB 2.0 single port prolific Chip set ($70). One little problem with two drives with the prolific chipset on two firewire ports … only one drive appears. One Oxford and one Prolific work fine, too. One on Firewire and one on USB 2.0 works fine. Nice Mobile Disk small case with external power supply and fan.
ADS Firewire 800 / USB 2.0 ($109). Chipset reports as “ADS Tech.” When I installed a new 250 GB Maxtor drive in the case, I couldn’t initialize the drive for the Mac with Apple’s disk utility or the software provided by ADS. I put the drive in a SanMax case, initialized it, put it back in the ADS case and it has worked flawlessly since then. FW800 is very fast. Measured throughput is about twice the FW400 for about $30 or $40 more. Internal power supply.
ADS Firewire 400, no USB Oxford 911 chipset ($70). I installed a CD recorder and used PatchBurn. Works great.
If you have a computer that can use it, get Firewire 800.

MacInTouch does not use permanent links, so this URL was only valid for that day… MacInTouch Homepage 10.20.2004

Disabling and Editing OS X’s Built in Firewall

Here is how to do it without using ipfw. I wanted to have Personal Web Sharing on, but block access to it at the firewall. When I tried to change it in the Firewall tab in System Preferences I got a sheet saying, “You cannot change the firewall settings for this service.” So I went off and found that the settings are stored in
/Library/Preferences/com.apple.sharing.firewall.plist

Once I opened it I discovered I could change the ‘editable’ value:

<key>Personal Web Sharing</key>
<dict>
<key>editable</key>
<integer>0</integer>
<key>enable</key>
<integer>1</integer>
<key>port</key>
<array>
<string>80</string>

I set the editable value to 1, saved the file, and re-opened the Firewall tab in System Preferences. I could now disable access to port 80 through the firewall, while leaving me free to continue fiddling with my internal website.
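
The same change can be scripted, roughly like this. It is a sketch under a couple of assumptions: make_editable is a helper name I invented, and it expects the plist to be plain XML text with one element per line, as in the excerpt above. It reads the plist on stdin and writes the modified copy to stdout, so the original is never touched directly.

```shell
# Hypothetical helper: set the "editable" flag to 1 for one named service.
# Assumes a text (XML) plist with each <key> and <integer> on its own line.
make_editable() {
  awk -v svc="$1" '
    # Note when we reach the <key> line naming the requested service.
    index($0, "<key>" svc "</key>") { in_svc = 1 }
    # Inside that service, the line after <key>editable</key> holds the flag.
    in_svc && /<key>editable<\/key>/ { print; want = 1; next }
    # Rewrite that one <integer> line, then stop looking.
    want && /<integer>/ {
      sub(/<integer>[0-9]+<\/integer>/, "<integer>1</integer>")
      want = 0
      in_svc = 0
    }
    { print }
  '
}

# Intended use: work on a copy, inspect it, then put it back as root.
# make_editable "Personal Web Sharing" < com.apple.sharing.firewall.plist > edited.plist
```

Working on a copy first means a typo cannot leave System Preferences with a broken firewall configuration.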