[fpc-pascal] Unicode file routines proposal

Luca Olivetti luca at ventoso.org
Tue Jul 1 09:35:35 CEST 2008


Marco van de Voort wrote:
>>> They have a UTF-16/UCS-2 internal representation, same as MSEgui which works 
>>> very well and is fast and handy BTW.
>> And len, slicing, etc. work as expected.
>> Note that if you need characters beyond $ffff you have to compile it
>> with wide unicode support, and in that case every character will use 4
>> bytes.
>>
> That's IMHO a faulty system. It requires you to choose between an incomplete
> solution or making strings a horrible memory hog.

OTOH, using variable-length characters makes string operations 
expensive: you can't just multiply the index by 2 or 4 but have to 
scan the string from the beginning, and the length in bytes isn't 
the same as the length in characters.
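
To make the cost concrete, here is a rough sketch of indexing into a 
variable-length encoding versus a fixed-width one. It uses UTF-8 as the 
variable-length case and assumes the input is already valid UTF-8; the 
function names are made up for illustration and are not from any 
library.

```python
def utf8_char_count(data: bytes) -> int:
    """Count characters by skipping continuation bytes (bit pattern 10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_offset_of(data: bytes, index: int) -> int:
    """Return the byte offset of the character at `index` (0-based).

    The offset cannot be computed directly: the scan starts at byte 0
    and steps over every earlier character, so the cost grows with index.
    Assumes `data` is valid UTF-8 (hypothetical helper, no validation).
    """
    if index < 0:
        raise IndexError("negative index")
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # ASCII or lead byte: a new character starts here
            if seen == index:
                return offset
            seen += 1
    raise IndexError("character index out of range")


def fixed_width_offset_of(index: int, unit_size: int) -> int:
    """With fixed-width units (2 or 4 bytes) the offset is one multiplication."""
    return index * unit_size


text = "na\u00efve caf\u00e9"        # 10 characters, two of them non-ASCII
encoded = text.encode("utf-8")

byte_length = len(encoded)            # length in bytes
char_length = utf8_char_count(encoded)  # length in characters, needs a full scan
offset_of_v = utf8_offset_of(encoded, 3)
```

For this string the byte length is 12 while the character length is 10, 
and the fourth character sits at byte offset 4 rather than 3. With 
2-byte units the same lookup would simply be 3 * 2.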

> But maybe that doesn't
> matter for mere scripting languages (though I wonder then why they didn't
> choose UTF-32 directly)
> 
> Surrogates are not nice, but they were invented for a reason.

Well, yes, they're a trade-off between performance and memory 
consumption, but I fear we're losing one of the advantages that Pascal 
has over C: fast and simple string handling.
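
For reference, the trade-off looks roughly like this: a character up to 
$ffff takes one 16-bit unit, anything beyond it takes two (a surrogate 
pair), while UTF-32 always takes 4 bytes. The sketch below follows the 
standard UTF-16 arithmetic; the function name is only illustrative.

```python
def to_utf16_units(code_point: int) -> list:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        return [code_point]              # fits in a single 16-bit unit
    v = code_point - 0x10000             # 20 bits left to distribute
    high = 0xD800 | (v >> 10)            # top 10 bits go in the high surrogate
    low = 0xDC00 | (v & 0x3FF)           # bottom 10 bits go in the low surrogate
    return [high, low]


bmp_units = to_utf16_units(0x0041)       # 'A'
astral_units = to_utf16_units(0x1D11E)   # musical G clef, beyond $ffff

# Bytes needed per character: 2 per UTF-16 unit, always 4 in UTF-32.
utf16_bytes = [2 * len(bmp_units), 2 * len(astral_units)]
utf32_bytes = [4, 4]
```

So 'A' costs 2 bytes in UTF-16 against 4 in UTF-32, and the character 
beyond $ffff costs 4 bytes either way, but any code that indexes by 
16-bit unit has to notice the pair.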

Bye
-- 
Luca


