Minor CSS changes for code blocks

I made some minor changes to the default stylesheet so that blocks of code no longer overflow the edge of the page.

Changes were based on the stylesheet I found at
http://www.bigbold.com/snippets/stylesheets/posts.css.

I added in the following:

pre {
        border: 1px solid #333;
        padding: 6px;
        overflow: auto;
        margin: 0px 1%;
        width: 95%;
}

Thanks to the Snippets public code repository.

Changes and suggestions welcome!

The Case for PowerPC based Amiga Hardware

My friend Michael got his article published …

It gives his perspective on why the new Amiga hardware is based on PowerPC chips instead of Intel.

“Synopsis:
The economic well being of the Amiga market demands a system that is not in direct competition with the Wintel world. Running on Intel hardware will jeopardise the viability of OS4 by placing it into direct competition with Windows. Intel derived hardware raises issues of support costs and financial return for the vendor.”

The Case for PowerPC

This message will self-destruct in five….

Many years ago (well, 1998, to be exact), when the science of computer forensics was basically in its infancy, there were already rumors of various TLAs being able to read data that had been overwritten on a hard drive. Some folks told of tunneling microscopes and other high-tech gizmos that could recover data from up to 7 overwrites ago. I don’t know how true that is, but I am inclined to believe it. Nowadays, products such as EnCase can do wonders, and Linux-based and occasionally open source tools are close behind the commercial vendors. The forensics field has turned into a science, with an entire industry to support it – hardware write blockers, special court proceedings, expert witnesses, data recovery software, etc. One of these days…

This time I’ll just address a simple question: what media should one store data on if one expects to need to destroy that data some time in the future, and an adversary with great financial and technical resources would be interested in reconstructing it?

For something like this, I’d recommend rewritable CDs and DVDs. The primary reason is ease of data disposal – if one has 30 seconds to get rid of incriminating evidence, all one really has to do is drop a styrofoam cup half-full of water with a CD on it into a microwave and tell it to reheat.

While I don’t know about recent models of microwaves, older ones would generate satisfying arcing and destroy the media in about 10 seconds. The media will not melt to a slug, but the data layer will be covered by a spider web of cracks that pretty much renders it unusable. Here is an example. A Google search for “microwave CD” should provide plenty more links to images.

Now, to the physicist in me this looks like a rather complete way of getting rid of unwanted data irrecoverably.

However, here is the reason why I suggested CD-RW and DVD-RW in the first place:

As with CD-Rs, the read laser does not have enough power to change the state of the material in the recording layer — it’s a lot weaker than the write laser. The erase laser falls somewhere in between: While it isn’t strong enough to melt the material, it does have the necessary intensity to heat the material to the crystallization point. By holding the material at this temperature, the erase laser restores the compound to its crystalline state, effectively erasing the encoded 0. This clears the disc so new data can be encoded.

So my advice to dissidents world-wide – first erase the DVD-RWs and CD-RWs, and then microwave them. After that, toss them out and don’t toss and turn while in bed 😛

Edit1: Note! When I talk about erasing a CD-RW or DVD-RW, I mean a full erase, which takes ~15 minutes, NOT a quick erase. A quick erase generally just zeros out the first megabyte of data on the disk, including the TOC, so it looks like a clean disk, yet all of the previously recorded data is still there!

Edit2: I wonder if the crystalline properties of the layer change from being melted and re-cooled during the erasure process. In other words, is it still possible to detect where data was based on the different structure of the “re-flowed” layer after erasure? Any material scientists around? 😛

That’s why I recommend microwaving the disk as well: to make reconstruction of the data that much harder.

“Image unavailable for copyright reasons”

Background

For the last few years I’ve been subscribing to Nature Publishing Group’s Nature magazine. It is a scientific, peer-reviewed weekly magazine that publishes some of the best and most groundbreaking papers in the hard sciences (and when I say “science”, I don’t mean “English Literature”) – biology, chemistry, materials science, physics, and the occasional psychology and mathematics.

It is a great magazine, and probably the only real competition it has in the field is Science, published by AAAS (American Association for the Advancement of Science). AAAS’s Science is also offered as a digital download (and in fact they offer a subscription at a significant discount), however it is only offered in the Zinio format. Zinio, in turn, is infamous for having DRM in the downloaded files, giving the original content providers the ability to expire the document after a certain time. So in reality I don’t own the content, like I do when I buy a magazine, but rent it, and they can pull the plug at any moment.

As an aside, I am seeing that at least some flavors of PDF can be configured not to open after a certain date; however, I wonder how widespread the practice is, and how “portable” such documents are, as, presumably, one would need an extension to open such a PDF. Maybe Zinio is in fact an overgrown extension to PDF documents, with a fancy page-turning animation? Here is a thought: the linked PPT presentation says on page 10, regarding the benefits of PDF: “Security. Allows multiple security settings from fully editable to print only access. Files can be set to expire (cannot open past expire date).”

One of the benefits of a personal subscription to Nature is the ability to download any article they publish as a PDF file once you authenticate and get a cookie loaded into the browser (if you are a student, you can probably check whether your university has a site license for Nature and Science, and whether you can access their web sites through your university’s web proxy). As Nature is a weekly publication, 52 issues a year, each 7 – 8 mm thick, start eating up space on the bookshelf rather rapidly, and weigh a good deal too. As a result, back when I had more time on my hands, I was actually downloading all the PDF pieces for each issue I’d receive, gluing them together into a single “issue” about 10 megs big, and dropping it as a single file on my hard drive. Eventually I stopped doing it, because it was taking a fair bit of my time, and attempts to convince my little brother to do it for me (I was offering up to $2 per issue) were not getting anywhere.

I still keep up my subscription, and occasionally do have time to both read Nature Science Update online and leaf through the actual magazine.

Image unavailable for copyright reasons

So I went and looked at the PDFs on the Nature site today (for the first time in a few months), and I noticed something that is new to me. In the PDF versions of some pages, most noticeably in the feature articles (2 – 3 page articles that describe a particular aspect of science in depth), some images have been replaced with boxes saying “IMAGE UNAVAILABLE FOR COPYRIGHT REASONS”.

Before, the main difference between the PDF and the printed version was the lack of advertisements (well, most of the time; sometimes they would goof, and you’d get to see a half-page ad somewhere), as NPG was a firm believer that in order to advertise in the PDF version an advertiser should pay. Occasionally Nature would retract an article, and then its PDF would be removed, and on occasion the other pages where the article began and ended would be censored too.

Now some images are missing. Here is an example (480K, sorry, I didn’t feel like cropping it).

Note that this is new – I went and double-checked the archives, and older stories don’t have this “feature”. Of course this can change – maybe they haven’t yet had enough manpower to go and look through the back archives.

Why? Why would they do something like that?

Maybe they got nailed by their stock image library. Maybe some photographer took them to court. It’s the larger images that seem to be unavailable, so maybe some kid took the image from some Nature issue, and used it in a high school project.

This kind of crap upsets me.

Spotlight and PDFs

Now, while talking about Spotlight, I started thinking about possibilities…. Indexing the entire hard drive is evil (or, in my case, eats too many CPU/IO cycles), so for now I’ve disabled Spotlight in my /etc/hostconfig. However, suppose I were to create a single subdirectory (or partition, or, heck, external drive) for documents (from now on: the PDF repository), turn off indexing for the boot volume, and turn off continuous indexing for the repository as well. Then, once I’ve added or modified enough PDF content, I could tell Spotlight to index the contents once (but not continuously) using mdimport(1). I’d get all the benefits of Spotlight/mdfind(1) for the documents in the repository (which I presumably want indexed) with none of the slowdowns. So I could have my cake and eat it too.

So I started looking at various options. Dave’s pointer about using wget with cookies seems like a step in the right direction. I’ll have to make sure that I tell wget to send the same User-Agent string as my browser does, as I recall from the old days that the folks at Nature actually keep track of that.

I’ve not tried it yet, but I’ll be seeing what I can do.
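For reference, here is a rough, untested-against-Nature sketch of the same idea as wget with cookies, written in Python with only the standard library. It assumes the browser cookies have been exported to a Netscape-style cookies.txt (the format wget's --load-cookies option reads); the file name, the User-Agent string, and the helper names make_opener and fetch are all placeholders I made up for illustration.

```python
import http.cookiejar
import urllib.request

# Hypothetical values: substitute the cookie file exported from your browser
# and the exact User-Agent string your browser sends.
COOKIE_FILE = "cookies.txt"
USER_AGENT = "Mozilla/5.0 (Macintosh; PPC Mac OS X) Example/1.0"


def make_opener(cookie_file=None, user_agent=USER_AGENT):
    """Return an opener that sends stored cookies and a browser-like User-Agent."""
    jar = http.cookiejar.MozillaCookieJar()
    if cookie_file is not None:
        # Netscape/Mozilla cookies.txt format; keep session cookies too.
        jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" header with the browser's string.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch(opener, url, destination):
    """Download one URL to a local file using the prepared opener."""
    with opener.open(url) as response, open(destination, "wb") as out:
        out.write(response.read())
```

Usage would be something like fetch(make_opener(COOKIE_FILE), some_pdf_url, "article.pdf"), where some_pdf_url is whatever article link the site hands out once you are logged in.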

So here are the questions that I don’t know the answers to:

What is an easy way to join a bunch of PDFs into a single PDF? Bonus points if it’s something that can be done from the command line, maybe as a batch.
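One candidate I have not tried on real issues yet: Ghostscript's pdfwrite device can concatenate PDFs from the command line. The sketch below just builds and runs that command from Python; it assumes gs is installed and on the PATH, and the helper names merge_command and merge_pdfs are mine. Note that Ghostscript rewrites the files rather than gluing them byte for byte, so some extras (bookmarks, for instance) may not survive.

```python
import shutil
import subprocess


def merge_command(inputs, output):
    """Build the Ghostscript command line that concatenates PDFs in order."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output}",
        *inputs,
    ]


def merge_pdfs(inputs, output):
    """Run Ghostscript; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(merge_command(inputs, output), check=True)
```

For a batch, merge_pdfs(sorted(list_of_article_pdfs), "issue.pdf") would produce one file per issue, assuming the file names sort into page order.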

Also, is there an easy way to screen out duplicate pages in a PDF, preferably not involving human interaction? Under Acrobat 4 (or 3) it was rather simple – I’d just generate previews for each page, click on whichever ones looked similar, and kill them, but that required at least glancing through the PDF document.
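A possible starting point, and only a sketch: hash each page and keep the first copy of every distinct hash. This assumes some other tool has already extracted each page as bytes (a rendered image or the page's content, say), and that duplicate pages come out byte-identical, which may well not hold for pages that merely look the same. The function name unique_page_indices is made up for this example.

```python
import hashlib


def unique_page_indices(pages):
    """Return indices of pages whose content has not been seen earlier.

    `pages` is any iterable of bytes objects, one per page, in page order.
    """
    seen = set()
    keep = []
    for index, content in enumerate(pages):
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            keep.append(index)
    return keep
```

The returned indices are the pages to keep; everything else is an exact repeat of an earlier page.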

The Zinio reader uses its own format, which is heavily DRMed and, as far as I can tell, might actually be based on PDF, as they licensed the PDF library from Adobe. So the question I have is: is there a tool that will strip the DRM and generate a normal PDF out of a Zinio file? The best solution I have found so far is to print each page to PDF individually and then basically merge the pieces together, but this is ugly as sin.

Lastly, any better suggestions on how to deal with spidering sites like Nature’s and pulling down certain types of content? Maybe I shouldn’t try to re-invent the wheel.

As usual, feel free to post comments (all 3 people and 100 comment spam bots that occasionally look at the site) 🙂

The Tipping Point

Apple announced that within the next 2 years they will have completed the transition from the PowerPC to machines based on Intel’s CPUs.

I never thought I would see the day… now I am extremely curious as to what the changes *inside* the box are going to be.

“The soul of a Mac is its operating system.” Steve Jobs, WWDC, June 6th 2005