diglib Archive
Date: Mon Oct 17 11:00:39 2005

Re: diglib: Issue from August 15 DCC meeting - Mac OS Files



This is a good technical discussion issue. A bit more background:

1/ the information in a resource fork may in some cases be non-critical, but in other cases it is totally essential to the file. For example, if one is preserving a MacOS 9 (classic mode) program, the resource fork contains the actual code. In many cases, though, the only information in the resource fork is metadata, and we all know that metadata can be dispensed with (that's a :-) of course).

2/ in the modern MacOS X world resource forks are a bit less critically important, since an application (program) is now implemented as what appears to be a single file but is actually a directory containing a large number of files.

3/ MacOS is not the only system that supports multiple forks. Windows files can also have multiple forks (more than 2, in fact). Luckily, multiple forks on NTFS file systems are very rarely used except by hackers who are trying to hide information or by Windows servers that are supporting Mac clients, so most of the time on Windows you can pretend that a file is a named finite-length single byte stream.

4/ there are many semi-standard ways to encode a Mac file as a single bitstream (hence an easy candidate for storing in a file on a linux or FAT file system). One common approach is to use MacBinary II or a similar encoding, which basically packages the two forks into a single stream with some syntax to allow a parser to unpack them again. Another is to use the StuffIt archive format, which also allows multiple MacOS files to be packed into a single OS file. Corey describes a different packing convention that uses two single-stream files on the hard disk. That's of particular interest because it's the convention that MacOS itself uses when storing files on filesystems that don't support multiple-fork semantics internally (e.g. a Mac hard disk formatted as unix style rather than Mac HFS+).

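5/ Two vital pieces of information that traditionally travel with a Mac file but can be stored in other ways as well are the file type and the preferred program to open it (the "creator"). Strictly speaking these are kept in the file system's Finder information for the file rather than in the resource fork itself, but they are lost in the same way when the file is copied to a system that doesn't understand them. Each of these is a 4-character string (with reasonable authority control). It was traditionally common, for instance, to have a Mac file name that was arbitrary, and to store the information that this was a JPEG file in that metadata rather than as part of the file name.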

6/ resource fork metadata is one type of filesystem metadata that may need to be preserved, but it's not the only one. File names may be important. File modes/protections may be important. Ad nauseam.
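
To make the two-file convention in item 4 a little more concrete, here is a rough Python sketch of how one might audit a directory listing for it. It assumes the convention in question is the one where the extra information for a file "name" is kept in a companion file called "._name" in the same directory; that is how MacOS X behaves on non-HFS volumes, but I am assuming, not asserting, that this is exactly what Corey described. The function name pair_sidecars is made up for illustration.

```python
def pair_sidecars(names):
    """Group a flat directory listing into data files and their "._" companions.

    Assumption: the companion for "name" is called "._name" and sits in the
    same directory. Returns (pairs, orphans), where pairs maps each data file
    to its companion (or None), and orphans lists companions whose data file
    is missing from the listing.
    """
    present = set(names)
    pairs = {}
    orphans = []
    for name in sorted(names):
        if name.startswith("._"):
            # A companion file: only interesting on its own if its data file is gone.
            if name[2:] not in present:
                orphans.append(name)
            continue
        companion = "._" + name
        pairs[name] = companion if companion in present else None
    return pairs, orphans


listing = ["photo.jpg", "._photo.jpg", "notes.txt", "._gone"]
pairs, orphans = pair_sidecars(listing)
print(pairs)
print(orphans)
```

A check like this only tells you whether the two halves are still travelling together. It says nothing about whether the contents of the companion file matter for a given object.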

I think the issue of file names and in particular leading periods is a separable problem, but with its own swamps. File names beginning with periods are very common on unix and linux systems as well as on Macs. There are other problems connected with file names, e.g. the Unix restriction that file names not contain "/" or null, and the widespread restriction that filenames have a fairly short length and contain only US-ASCII characters (hence no Unicode).
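
As a sketch of what checking for these file name problems could look like, the Python below flags the cases just mentioned. The function name filename_problems and the 255-byte default limit are my own assumptions for illustration; real limits vary by file system and by whatever repository software is in use.

```python
def filename_problems(name, max_bytes=255):
    """Return a list of reasons a file name may not survive on another system.

    max_bytes is an assumed limit, not a universal one.
    """
    problems = []
    if "/" in name or "\x00" in name:
        problems.append("contains '/' or NUL")
    if name.startswith("."):
        problems.append("leading period")
    if not name.isascii():
        problems.append("non-ASCII characters")
    if len(name.encode("utf-8")) > max_bytes:
        problems.append("too long")
    return problems


for candidate in ["report.txt", "._photo.jpg", "r\u00e9sum\u00e9.doc"]:
    print(candidate, filename_problems(candidate))
```

Whether any of these should be treated as an error or just recorded is a policy question, not a technical one.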

I don't have an opinion as to whether the resource forks on the Art on File files need to be preserved, but I'm absolutely sure that we WILL have occasions in the future where the content of resource forks is critical to the meaning of the intellectual resource and needs to be preserved.

JQ