[fpc-pascal] Unicode file routines proposal
Luca Olivetti
luca at ventoso.org
Tue Jul 1 09:35:35 CEST 2008
Marco van de Voort wrote:
>>> They have a UTF-16/UCS-2 internal representation, same as MSEgui which works
>>> very well and is fast and handy BTW.
>> And len, slicing, etc. work as expected.
>> Note that if you need characters beyond $ffff you have to compile it
>> with wide unicode support, and in that case every character will use 4
>> bytes.
>>
> That's IMHO a faulty system. It requires you to choose between an incomplete
> solution or making strings a horrible memory hog.
OTOH, using variable-length characters makes string operations
expensive: you can't find a character by just multiplying the index
by 2 or 4, you have to scan the string from the beginning, and the
length in bytes isn't the same as the length in characters.
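
To illustrate the indexing cost, here is a minimal sketch in Python
(the helper names utf8_char_offset and utf32_char_offset are made up
for this example and are not part of any library). It assumes valid
UTF-8 input and compares locating the n-th character in a
variable-length encoding with locating it in a fixed-width one:

```python
def utf8_char_offset(data: bytes, index: int) -> int:
    """Return the byte offset of the index-th code point (0-based).

    Hypothetical helper: it has to walk the bytes from the start,
    so the cost grows with the position of the character.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


def utf32_char_offset(index: int) -> int:
    """Fixed-width case: the offset is a single multiplication."""
    return index * 4


text = "na\u00efve \u20acuro"        # 10 characters
utf8 = text.encode("utf-8")          # variable width, 1 to 4 bytes each
utf32 = text.encode("utf-32-le")     # fixed width, 4 bytes each, no BOM

euro_utf8 = utf8_char_offset(utf8, 6)
euro_utf32 = utf32_char_offset(6)

print(len(text), len(utf8), len(utf32))
print(euro_utf8, euro_utf32)
```

The same scan is needed to count characters in the variable-length
case, which is why the byte length and the character length differ
there, while the fixed-width string pays for its simple arithmetic
with 4 bytes per character.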
> But maybe that doesn't
> matter for mere scripting languages (though I wonder then why they didn't
> choose UTF-32 directly)
>
> Surrogates are not nice, but they were invented for a reason.
Well, yes, they're a trade-off between performance and memory
consumption, but I fear we're losing one of the advantages that Pascal
has over C: fast and simple string handling.
Bye
--
Luca