ZFS (Part 1)

Over the last year I'd been getting more and more curious/excited about OpenSolaris. Specifically, I got interested in ZFS – Sun's new filesystem/volume manager.

So I finally got my act together and gave it a whirl.

Test system: Pentium 4, 3.0GHz, in an MSI P4N SLI motherboard. Three Seagate ST3300831A ATA hard drives and one Maxtor 6L300R0 ATA drive (all nominally 300 gigs – see the previous post on slight capacity differences). One Western Digital WDC WD800JD-60LU SATA 80 gig hard drive. Solaris Express Community Release (SXCR) build 51.

Originally I started this project running SXCR 41, but back then I only had three 300 gig drives, and that was interfering with my plans for RAID 5 greatness. In the end the wait was worth it, as ZFS has been revved since.

A bit about the MSI motherboard: I like it. For a PC system, I like it a lot. It has two PCI slots, two full-length PCIe slots (16x), and one PCIe 1x slot. Technically it supports dual-card setups – two ATI CrossFire or Nvidia SLI capable cards – however in that case both full-length slots will run at 8x; a single card will run at 16x. Two dual-channel IDE connectors, four SATA connectors, built-in high end audio with S/PDIF, a built-in GigE NIC based on a Marvell chipset/PHY, serial, parallel, and built-in IEEE 1394 (iLink/FireWire) with 3 ports (one on the back of the board, two more can be brought out). Plenty of USB 2.0 connectors (4 brought out on the back of the board, 6 more can be brought out from connector banks on the motherboard). Overall, pretty shiny.

My setup consists of the four IDE hard drives on the IDE buses, and the 80 gig WD on the SATA bus for the OS. The motherboard BIOS allowed me to specify that I want to boot from the SATA drive first, so I took advantage of the offer.

Installation of SXCR was from an IDE DVD drive (a pair of hard drives was unplugged for the duration).
SXCR recognized pretty much everything in the system, except the built-in Marvell GigE NIC. Shit happens; I tossed in a PCI 3Com 3c509C NIC that I had kicking around, and restarted. There was a bit of a hold-up with the SATA drive – Solaris didn't recognize it, and wanted the geometry (number of heads, cylinders and sectors) so that it could create an appropriate volume label. Luckily WD made an identical drive in an IDE configuration, for which it actually provided the heads/cylinders/sectors information, so I plugged those numbers in, and format and fdisk cheered up.

Other than that, a normal Solaris install. I did the console/text install just because I am a lot more familiar with it; however, the Radeon Sapphire X550 PCIe video card was recognized, and the system happily boots into OpenWindows/CDE if you want it to.

So I proceeded to create a ZFS pool.
The first thing I wanted to check is how portable ZFS is. Specifically, Sun claims that it's endianness neutral (i.e. I can connect the same drives to a little endian PC or a big endian SPARC system, and as long as both run an OS that recognizes ZFS, things will work). I also wondered how it deals with device numbers. Traditionally Solaris is very picky about device IDs, and changing things like controllers or SCSI IDs on a system can be tricky.

Here I wanted to know if I could create, say, a "travelling ZFS pool": an external enclosure with a few SATA drives plus an internal PCI SATA controller card, so that if things went wrong on a particular system, I could always unplug the drives, move them to a different system, and things would work. In other words, I wanted to find out if ZFS can deal with changes in device IDs.

ZFS can use individual slices, but to work best it wants whole drives. Given a whole drive, it writes an EFI disk label onto it, with a unique identifier. Note that certain PC motherboards choke on EFI disk labels, and refuse to boot. Luckily, most of the time this is fixable with a BIOS update.

root@dara:/[03:00 AM]# uname -a
SunOS dara.NotBSD.org 5.11 snv_51 i86pc i386 i86pc
root@dara:/[03:00 AM]# zpool create raid1 raidz c0d0 c0d1 c1d0 c1d1
root@dara:/[03:01 AM]# zpool status
  pool: raid1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0d0    ONLINE       0     0     0
            c0d1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: No known data errors
root@dara:/[03:02 AM]# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
raid1                  1.09T    238K   1.09T     0%  ONLINE     -
root@dara:/[03:02 AM]# df -h /raid1 
Filesystem             size   used  avail capacity  Mounted on
raid1                  822G    37K   822G     1%    /raid1
root@dara:/[03:02 AM]# 

Here I created a raidz1 pool – the ZFS equivalent of RAID 5 with one parity disk, giving me (N-1)*[capacity of the smallest drive] and surviving the death of one hard drive. A pool can also be created with the raidz2 keyword, the equivalent of RAID 6 with two parity disks; such a configuration can survive the death of two disks.
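
For comparison, double parity over the same four drives would have been created like this (a sketch I didn't run here; usable space drops to (N-2) drives' worth, i.e. about half the raw capacity):

zpool create raid1 raidz2 c0d0 c0d1 c1d0 c1d1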

Note the difference in capacity that zpool list and df report. zpool list shows the raw pool capacity, without subtracting parity; df shows the more traditional usable disk space. Using df will likely cause less confusion in normal operation.
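
Back-of-the-envelope (my arithmetic, using the ~279 formatted gigs per drive from the hardware list above):

4 drives x ~279G ~= 1.09T raw     (what zpool list reports)
3/4 of 1.09T     ~=  837G usable  (df's 822G, once ZFS takes its cut for metadata reservations)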

So far so good.

Then I proceeded to create a large file on the ZFS pool:

root@dara:/raid1[03:04 AM]# time mkfile 10g reely_beeg_file

real    2m8.943s
user    0m0.062s
sys     0m5.460s
root@dara:/raid1[03:06 AM]# ls -la /raid1/reely_beeg_file 
-rw------T   1 root     root     10737418240 Nov 10 03:06 /raid1/reely_beeg_file
root@dara:/raid1[03:06 AM]#

While this was running, I had zpool iostat -v raid1 10 going in a different window.

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
raid1        211M  1.09T      0    187      0  18.7M
  raidz1     211M  1.09T      0    187      0  18.7M
    c1d0        -      -      0    110      0  6.26M
    c1d1        -      -      0    110      0  6.27M
    c0d0        -      -      0    110      0  6.25M
    c0d1        -      -      0     94      0  6.23M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
raid1       1014M  1.09T      0    601      0  59.5M
  raidz1    1014M  1.09T      0    601      0  59.5M
    c1d0        -      -      0    364      0  20.0M
    c1d1        -      -      0    363      0  20.0M
    c0d0        -      -      0    355      0  19.9M
    c0d1        -      -      0    301      0  19.9M
----------  -----  -----  -----  -----  -----  -----

[...]
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
raid1       8.78G  1.08T      0    778    363  91.1M
  raidz1    8.78G  1.08T      0    778    363  91.1M
    c1d0        -      -      0    412      0  30.4M
    c1d1        -      -      0    411  5.68K  30.4M
    c0d0        -      -      0    411  5.68K  30.4M
    c0d1        -      -      0    383  5.68K  30.4M
----------  -----  -----  -----  -----  -----  -----

10 gigabytes written in just under 129 seconds – about 80 megabytes per second of sustained writes. I think I can live with that.

Next I wanted to take md5 digests of some files on /raid1, export the pool, shut the system down, switch around the IDE cables, boot back up, reimport the pool, and re-run the md5 digests. This would simulate moving a disk pool to a different system, screwing up the disk ordering in the process.

root@dara:/[12:20 PM]# digest -a md5 /raid1/*
(/raid1/reely_beeg_file) = 2dd26c4d4799ebd29fa31e48d49e8e53
(/raid1/sunstudio11-ii-20060829-sol-x86.tar.gz) = e7585f12317f95caecf8cfcf93d71b3e
root@dara:/[12:23 PM]# zpool status
  pool: raid1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0d0    ONLINE       0     0     0
            c0d1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: No known data errors
root@dara:/[12:23 PM]# zpool export raid1
root@dara:/[12:23 PM]# zpool status
no pools available
root@dara:/[12:23 PM]#

The system was shut down, the IDE cables were switched around, and the system was rebooted.

root@dara:/[02:09 PM]# zpool status
no pools available
root@dara:/[02:09 PM]# zpool import raid1
root@dara:/[02:11 PM]# zpool status
  pool: raid1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
            c0d0    ONLINE       0     0     0
            c0d1    ONLINE       0     0     0

errors: No known data errors
root@dara:/[02:11 PM]# 

Notice that the order of the drives changed: it was c0d0 c0d1 c1d0 c1d1, and now it's c1d0 c1d1 c0d0 c0d1.

root@dara:/[02:22 PM]# digest -a md5 /raid1/*
(/raid1/reely_beeg_file) = 2dd26c4d4799ebd29fa31e48d49e8e53
(/raid1/sunstudio11-ii-20060829-sol-x86.tar.gz) = e7585f12317f95caecf8cfcf93d71b3e
root@dara:/[02:25 PM]#

Same digests.

Oh, and a very neat feature… want to know what was happening with your disk pools?

root@dara:/[02:12 PM]# zpool history raid1
History for 'raid1':
2006-11-10.03:01:56 zpool create raid1 raidz c0d0 c0d1 c1d0 c1d1
2006-11-10.12:19:47 zpool export raid1
2006-11-10.12:20:07 zpool import raid1
2006-11-10.12:39:49 zpool export raid1
2006-11-10.12:46:14 zpool import raid1
2006-11-10.14:09:54 zpool export raid1
2006-11-10.14:11:00 zpool import raid1

Yes, ZFS logs the last bunch of commands onto the zpool devices themselves. So even if you move the pool to a different system, the command history will still be with you.

Lastly, some versioning history for ZFS:

root@dara:/[02:19 PM]# zpool upgrade raid1 
This system is currently running ZFS version 3.

Pool 'raid1' is already formatted using the current version.
root@dara:/[02:19 PM]# zpool upgrade -v
This system is currently running ZFS version 3.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z

For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.
root@dara:/[02:19 PM]# 
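
Had the pool been created under an older build, that same zpool upgrade raid1 would have actually bumped the on-disk version. And to upgrade every pool on a system at once (a sketch; not needed here, since raid1 was already at version 3):

zpool upgrade -a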

Power consumption and hard drives

Some numbers about power consumption of hard drives….

Maxtor DiamondMax 10 6L300R0, 7200 RPM, 300 gig (279.48GB formatted) ATA hard drive has the following power consumption: +5V 740 mA, +12V 1500 mA.

Seagate Barracuda ST3300831A, 7200 RPM, 300 gig (279.45GB formatted) ATA hard drive has the following power consumption: +5V 460 mA, +12V 560 mA.

Seagate's tech spec sheet claims that their 'cudas also take 2.8 amps on +12V to spin up. Maxtor doesn't have a useful spec sheet for their product.
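
Multiplying the rails out gives a rough steady-state draw (my arithmetic, ignoring spin-up):

Maxtor 6L300R0:      5V x 0.74A + 12V x 1.50A ~= 21.7W
Seagate ST3300831A:  5V x 0.46A + 12V x 0.56A ~=  9.0W

More than twice the draw for the same nominal capacity.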

Observations: Seagate has a 5 year warranty on their drives. Lower power consumption means lower heat dissipation, and thus a cooler system. It also means that you can get away with a smaller power supply (or more drives in a system), and thus reduce your electricity costs (more of an issue in a 24/7 environment) and air conditioning/cooling costs.

Conclusions: one should spec hard drives not only from the point of view of cost (WD is cheap, but in my experience dies like a butterfly under a cold spell), but also from the point of view of warranty and power consumption. Sadly, vendors do not provide power consumption information in their spec sheets, so the only way to find it out is by going to a computer store, asking to look at an OEM drive, and reading the numbers off the label.

Merging Keychains?

Does anyone know how to merge multiple Keychains in Mac OS X?

I know I can copy items from one keychain to another, but that involves authenticating twice.

I tried going in and adding those other keychains to my keychain list, but they don't stay. Frustrating.

Why am I doing this? I replaced my computer, and was not able to transfer my account at setup time, so I ended up with some old keychains that got copied over.

Suggestions, comments, rants?

All are welcome!

Dave

Pelican Case Guarantee: unconditional? Not!

A photographer friend of mine, in the midst of the "controlled chaos" of his daughter's 3rd birthday party, somehow got onto the topic of Pelican cases. He has been using them, and abusing them, for many years, and will do so for many more. But he warned us that even though they are guaranteed against almost anything, they are not covered against damage from toddlers! Being curious, I decided to check… and lo and behold, kids under five are put into the same category as bear attacks and shark bites, which their guarantee also does not cover.

Pelican™ Products Unconditional Lifetime Guarantee of Excellence

Let’s put it into perspective though! Go and read a few of the Survival Stories and then wonder just how destructive a toddler can be.

Stoopid WordPress

Just a quick rant.

I [censored] hate WordPress' "I am smarter than the stoopid user" attitude, especially its attempts to close what it thinks is an unclosed HTML tag. #include <foo.h> in between <pre> and </pre> does NOT mean that it needs to change the code to #include <foo .h> (see the space?), and add </foo> at the end of the post for me. Aaaaarrggh!

Why the [censored] does software assume that the user is stupider than it is? Why isn't there a way to turn off input sanitization? Case in point with HTML: the <p> tag doesn't need a </p> at the end of the paragraph. Never did. And in my world never will. I just want a paragraph break, dammit!!! Same idea with the <br> tag. I just want a newline, not a </br> to close it off (and WTF is </br> anyway?)

Oh, and if I add &lt;s in the body, next time around editing they get replaced by <s, which upon the next save end up tripping WordPress' input sanitization insanity. Why, why, oh why?

Aaaaarrgggh! /me bangs head on the wall

We obviously overengineered. Maybe it’s time to EMP every computer on earth and start all over again.

Mac OS X/mach: Identifying architecture and CPU type

Platform-independent endianness check:

#include <stdio.h>

/* Write a known integer into memory, then read it back byte by byte;
   the order of the bytes in memory reveals the endianness. */
union foo
{
  char p[4];
  int k;
};

int main()
{
  int j;
  union foo bar;
  printf("$Id: endianness.c,v 1.1 2006/07/09 17:48:14 stany Exp stany $\n"
         "Checks endianness of your platform\n");
  printf("Bigendian platform (ie Mac OS X PPC) would return \"abcd\"\n");
  printf("Littleendian platform (ie Linux x86) would return \"dcba\"\n");
  printf("Your platform returned ");
  bar.k = 0x61626364;  /* the bytes 'a' 'b' 'c' 'd' */
  for(j=0; j<4 ; j++)
  {
    printf("%c", bar.p[j]);
  }

  printf("\n");
  return 0;
}
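
To build and run it (a hypothetical session; the last line of output is what a big endian PPC Mac would print):

bash$ gcc -o endianness endianness.c
bash$ ./endianness
[...]
Your platform returned abcd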

Platform-dependent tell-me-everything check:

/*
 * $Id: cpuid.c,v 1.2 2002/08/03 23:38:39 stany Exp stany $
 */

#include <mach-o/arch.h>
#include <stdio.h>

const char *byte_order_strings[]  = {
        "Unknown",
        "Little Endian",
        "Big Endian",
};

int main() {

  /* Ask Mach for the architecture info of the machine we are running on */
  const NXArchInfo *p = NXGetLocalArchInfo();
  printf("$Id: cpuid.c,v 1.2 2002/08/03 23:38:39 stany Exp stany $\n");
  printf("Identifies Darwin CPU type\n");
  printf("Name: %s\n", p->name);
  printf("Description: %s\n", p->description);
  printf("ByteOrder: %s\n", byte_order_strings[p->byteorder]);
  printf("CPUtype: %d\n", p->cputype);
  printf("CPUSubtype: %d\n\n", p->cpusubtype);
  printf("\nFor scary explanation of what CPUSubtype and CPUtype stand for,\n"
         "look into /usr/include/mach/machine.h\n\n"
         "ppc750\t-\tG3\n"
         "ppc7400\t-\tslower G4\n"
         "ppc7450\t-\tfaster G4\n"
         "ppc970\t-\tG5\n");

  return 0;
}
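
Building it is the same deal (a hypothetical session; it needs the Developer Tools for mach-o/arch.h, and the output shown is what I'd expect from a faster G4):

bash$ gcc -o cpuid cpuid.c
bash$ ./cpuid
[...]
Name: ppc7450
ByteOrder: Big Endian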

Mac OS X: Getting things to run on platforms that are not supported

Purposefully oblique description, I know.

Basically there are two ways of not supporting a platform.

One way is to not support the architecture. If I compile something as ppc64, no one on a G3 or G4 CPU will be able to run it natively, nor will x86 folks be able to run it under Rosetta. I can try to be cute and compile something for the x86 arch, cutting off all the PPC folks. I can compile something optimized for the PPC7400 CPU (G4); G5 and G4 systems will run it, and G3s will not (this is exactly what Apple did with iMovie and iDVD in iLife '06). Lastly, I can compile something in one of the "deprecated" formats, potentially for Classic, and cut off the x86 folks while annoying all the PPC folks, who would now have to start Classic to run my creation. Oh, the choices.
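
For reference, this is roughly what those build choices look like with Apple's gcc (a sketch; the file names are made up, and the flags assume an Xcode toolchain with the cross-compilers installed):

bash$ gcc -arch ppc -arch i386 -o fat_app app.c            # universal: native on PPC and x86
bash$ gcc -arch ppc64 -o g5_app app.c                      # 64-bit PPC only: needs a G5
bash$ gcc -arch ppc -mcpu=7450 -faltivec -o g4_app app.c   # tuned for G4; AltiVec code leaves G3s behind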

The other way is to restrict things by the configuration, and check during runtime.

Procedure for checking that the architecture you are using is supported by the application:

Step 1) Check what format the binary is in, and which architectures it contains:

bash$ cd Example_App.app/Contents/MacOS
bash$ file Example_App
Example_App: Mach-O fat file with 2 architectures
Example_App (for architecture ppc):  Mach-O executable ppc
Example_App (for architecture i386): Mach-O executable i386

or

bash$ cd Other_Example/Contents/MacOS
bash$ file Other_Example
Other_Example: header for PowerPC PEF executable

Step 2a) If the application is Mach-O, you can use lipo to see whether it's compiled as a generic (fat) binary or as a platform-specific one:

bash$ lipo -detailed_info Example_App
Fat header in: Example_App
fat_magic 0xcafebabe
nfat_arch 2
architecture ppc
    cputype CPU_TYPE_POWERPC
    cpusubtype CPU_SUBTYPE_POWERPC_ALL
    offset 4096
    size 23388
    align 2^12 (4096)
architecture i386
    cputype CPU_TYPE_I386
    cpusubtype CPU_SUBTYPE_I386_ALL
    offset 28672
    size 26976
    align 2^12 (4096)

If you see CPU_SUBTYPE_POWERPC_ALL, the application is compiled for all PowerPC platforms, from G3 to G5.

What you do not want to see on a G3 or G4 system is:

bash$ lipo -detailed_info Example_App
Fat header in: Example_App
fat_magic 0xcafebabe
nfat_arch 1
architecture ppc64
    cputype CPU_TYPE_POWERPC64
    cpusubtype CPU_SUBTYPE_POWERPC_ALL
    offset 28672
    size 8488
    align 2^12 (4096)

For that you need a 64 bit platform, which amounts to a G5 of various speeds.

It is possible that the application is in Mach-O format, but not in fat format.
otool -h -v will decode the Mach header, and tell you what CPU is required:
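
Hypothetical output for a plain (non-fat) PPC executable might look something like this (exact columns vary with the cctools version):

bash$ otool -h -v Example_App
Example_App:
Mach header
      magic cputype cpusubtype filetype ncmds sizeofcmds      flags
   MH_MAGIC     PPC        ALL  EXECUTE    10       1940   NOUNDEFS DYLDLINK TWOLEVEL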


Step 2b) If the application is PEF (Preferred Executable Format) or CFM (Code Fragment Manager), things might be harder. I've not yet encountered a CFM or PEF app that would not run on a PPC platform in one way or another, so this section needs further expansion.



In the case of a runtime check, most commonly it is the platform architecture that is checked.

Some Apple professional software has something like this in AppleSampleProApp.app/Contents/Info.plist:

<key>AELMinimumOSVersion</key>
<string>10.4.4</string>
<key>AELMinimumProKitVersion</key>
<string>576</string>
<key>AELMinimumQuickTimeVersion</key>
<string>7.0.4</string>
<key>ALEPlatform_PPC</key>
<dict>
        <key>AELRequiredCPUType</key>
        <string>G4</string>
</dict>
<key>CFBundleDevelopmentRegion</key>
<string>English</string>

Getting rid of

<key>ALEPlatform_PPC</key>
<dict>
        <key>AELRequiredCPUType</key>
        <string>G4</string>
</dict>

tends to get the app to run on a G3.
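
One way to pull that block out from the command line is PlistBuddy, if your system ships it (a sketch – any text editor works too, and back up Info.plist first):

bash$ cd AppleSampleProApp.app/Contents
bash$ cp Info.plist Info.plist.orig
bash$ /usr/libexec/PlistBuddy -c "Delete :ALEPlatform_PPC" Info.plist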

Lastly, if the application says something similar to "Platform POWERBOOK4,1 Unsupported", maybe running strings on SampleApplication/Contents/MacOS/SampleApplication combined with grep -i powerbook can reveal something.

bash$ strings SampleApplication | grep POWER
POWERBOOK5
POWERBOOK6
-
 POWERBOOK6,3
POWERMAC7
POWERMAC9,1
POWERMAC3,6
           POWERMAC11,2
-
 POWERMAC11,1

So if you want to run this application on a 500MHz iBook G3 for some reason (hi, dAVE), it might make sense to fire up a hex editor and change one of the "allowed" arches to match yours.

For example, to this:

bash$ strings SampleApplication | grep POWER
POWERBOOK4
POWERBOOK6
-
 POWERBOOK6,3
POWERMAC7
POWERMAC9,1
POWERMAC3,6
           POWERMAC11,2
-
 POWERMAC11,1

But don’t mind me. I am just rambling.

Video: DVD Studio Pro – I love you!

Spent a good chunk of today fighting with DVD creation. My previous attempt used iMovie HD 6 and iDVD 6. Now, I have some issues with iDVD:

Suppose your projects, both iMovie and iDVD, are saved on an external drive. You'd think that when you are rendering the final DVD, it would keep all the bits and pieces on the external drive, where the data lives. Oh, no. iDVD is different. iDVD is smarter than the user, and will try to save the intermediate audio track (before the muxing) in /tmp if you have a PCM audio track (and all of my previous projects did, because I didn't know any better. Me Ogg. Ogg stoopid, remember?). Now imagine that you are running out of disk space on your internal drive as it is, and then out of nowhere another 1.5 – 2 gigs of stuff show up in /tmp. Of course iDVD would die at this point.

iDVD 6 is also temperamental about DVD mastering, and was refusing to even think about creating a dual layer DVD (i.e. anything larger than 4.7 gigs) if I didn't have a dual layer burner plugged into the iBook. Fair enough; I gave it the drive.

Then, after running all night, it would die with a numerical error code (I googled for it; no one had seen it before). I tried three times, as originally I thought that I might have exceeded the "TV safe" area on the menu with the DVD title, or somesuch (another famous way of getting iDVD to die). But no, it just wouldn't work.

So I got access to a machine with Final Cut Studio installed on it.

Oh, what a joy.

The software actually uses the location you tell it to use, without arbitrarily using what it should not. It tells you what it thinks you should do, but lets you overrule it if you think you know better. Sane software.

I plugged in my external hard drive, and imported into DVD Studio Pro the DV streams which I had used in iMovie and iDVD without much success before. It happily dealt with them.

Quickly I created a timeline. I had everything pretty much pre-rendered, so it was as simple as setting a bunch of chapter breaks, creating a menu, and linking the buttons to actions (i.e. Play chapter 1).

The system I was using is an elderly 1.5GHz G4, which was kind of skipping frames when dealing with large streams, and thus creating chapters was a bit of an exercise in patience. I opened the iMovie project and looked at the places where I had placed chapter breaks before. In DVD Studio I created a bunch of chapter breaks arbitrarily, and then adjusted the times so they would match more or less what I had in iMovie.

Worked as it was supposed to. Beautiful.

DVD Studio was telling me that my project would compress down to 5.1 gigs. At this point I figured I should just go ahead, then run the result through DVD2OneX or somesuch to shrink it down to 4.7 gigs, so I told it to do just that. It happily rendered to the hard drive (it also asked me where I wanted to set the layer break for the dual layer disc, which was really nice too).

Eventually I realized that there is such a thing as Compressor, which can take a component of a multiplexed stream and convert it to a different format.

After taking the two 12 gig DV streams and running them through Compressor, I converted the audio tracks on both streams from PCM audio to Dolby 2.0 AC3.

Once I imported the AC3 streams into the DVD Studio Pro project, deleted the PCM audio from the timeline, and added in the AC3 audio, the projected project size dropped from 5.1 gigs to 4.1 gigs, and the actual project size (once the assets were rendered) dropped from 4.9 gigs to 3.6 gigs (I used crappy video as the DV source, from video tapes that had been sitting in storage for god knows how long, so it compressed a fair bit).

So overall, I am really, really happy with DVD Studio, although I've not used even a tenth of its capabilities. It can create HD DVDs. It can embed web links in mpeg files. It can edit existing menus. Now I need to save up my shekels to buy it (899 CAD for a student license for Final Cut Studio).