Re: Data transfer methods

From: Patrycjusz R. Łogiewa (silverdr_at_inet.com.pl)
Date: 2005-05-11 13:59:10

On 2005-05-11, at 13:21, Marko Mäkelä wrote:

> On Wed, May 11, 2005 at 12:57:19PM +0200, Patrycjusz R. Łogiewa wrote:
>>> (Oh well, newer file systems have problems with file names as well.
>>> Take the case-insensitive Apple HFS+, for example.  If I have
>>> understood the technical notes correctly, the case-folding will
>>> effectively treat strings consisting of non-Latin letters as empty
>>> strings.  This would mean that e.g., Greek, Russian or Japanese users
>>> would have to give Latin names to their documents.)
>>
>> Disagreed.
>>
>> This kind of case-insensitivity as used in HFS+ has its own name,
>> which I forgot, but it certainly doesn't affect using non-Latin
>> filenames.
>
> I was referring to this technical note:
> http://developer.apple.com/technotes/tn/tn1150.html
>
> It seems that I confused HFS+ with HFS:

Might be, but even in the old days of MacOS 7.1 I used a Chinese
version, which let me use file names in Chinese characters. It was
more or less all about the "scripts" they introduced somewhere around
that time.

>
> "The problem with using non-Roman scripts in an HFS file name is that
> HFS compares file names in a case-insensitive fashion. The
> case-insensitive comparison algorithm assumes a MacRoman encoding.
> When presented with non-Roman text, this algorithm fails in strange
> ways. The upshot is that HFS decides that certain non-Roman file names
> are duplicates of other file names, even though they are not
> duplicates in the source encoding."

Yeah, the remnants of the "scripts" are still biting us today. I have
to deal with them on a daily basis, as my native language also happens
to be a "non-Roman" one. Anyway, at least the filesystem-related
problems are mostly gone today.
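
To make the duplicate-name effect from the quoted technote concrete,
here is a rough sketch in Python. It is only an illustration under
stated assumptions: Python's str.upper() stands in for the real HFS
comparison table, which I am not reproducing here, and the function
names and the two Cyrillic sample names are made up for the example.
The point is just that folding bytes as if they were MacRoman can make
two different MacCyrillic names look identical.

```python
# Illustration only: str.upper() is a stand-in for the real HFS
# comparison table, and both helper names are hypothetical.

def macroman_fold(raw_name: bytes) -> str:
    """Fold a raw file name as if every byte were MacRoman text."""
    return raw_name.decode("mac_roman").upper()


def looks_like_duplicate(a: bytes, b: bytes) -> bool:
    """Compare two raw names the way a MacRoman-only fold would."""
    return macroman_fold(a) == macroman_fold(b)


# Two different Cyrillic names, stored as MacCyrillic bytes.
first = "КОТ".encode("mac_cyrillic")
second = "АГТ".encode("mac_cyrillic")

print(first != second)                      # the stored bytes differ
print(looks_like_duplicate(first, second))  # the MacRoman fold still matches
```

The collision happens because the bytes for the Cyrillic capitals К
and О fall on MacRoman lowercase letters, while А and Г fall on the
corresponding MacRoman capitals.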

> Anyway, I think that it is a bad idea to disallow a large group of
> characters or strings in a file system.

Perfectly agreed.

> Unix-like file systems do it nicely: the only disallowed characters
> are the directory separator and NUL, the end-of-string marker in C
> library routines.  Well, Commodore takes this further, treating the
> file name as an arbitrary binary string.

Which is quite logical IMHO - yet Unix-alikes have to deal with the
directory hierarchy somehow, hence the restriction on the separator.
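
For what it's worth, the Unix-style rule quoted above fits in a few
lines of Python. This is a minimal sketch, not any real library's
check: the function name is hypothetical, and it deliberately ignores
the special names "." and ".." as well as any per-filesystem length
limit. It treats a single path component as a byte string and rejects
only the separator and NUL.

```python
def is_valid_unix_component(name: bytes) -> bool:
    """True if name is non-empty and has neither '/' nor NUL in it."""
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


# Arbitrary bytes are fine, including newlines, wildcards and bytes
# that are not valid UTF-8.
print(is_valid_unix_component(b"caf\xe9 \n*?"))

# The directory separator and NUL are the only bytes rejected.
print(is_valid_unix_component(b"a/b"))
print(is_valid_unix_component(b"a\x00b"))
```

Dropping the separator check as well would give the Commodore
behaviour mentioned in the quote, where the name is simply an
arbitrary binary string.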

-- 
As we all know, Linux is only free if your time has no value - Jamie
Zawinski



       Message was sent through the cbm-hackers mailing list
