[fpc-devel] Unicode and UTF8String
Martin Friebe
fpc at mfriebe.de
Mon Dec 1 15:30:28 CET 2008
Marco van de Voort wrote:
> In our previous episode, Martin Friebe said:
>
>> I agree, using RTLString will probably help fpc to optimize your exe for
>> each OS.
>>
>> But, using RTLString means you do not know, if you have UTF8 or not.
>>
> Correct.
>
>> Because UTF8 behaves slightly differently from other strings, many
>> operations cannot be performed on RTLString
>>
>> foo[1], copy, pos ... simply because you do not know, if the result is a
>> char, a codepoint or a subcodepoint (single utf8 byte)
>>
> You don't know that about UTF-16 either. Even though that is no problem in
>
True, good point
> most cases, it is slowly time to abandon too simplistic thinking about
> strings. The best solution is to minimize editing, and localize them in
> certain parts of the code, keeping most of the code encoding agnostic.
>
True, too. But we are talking about Pascal, not some other language.
string[index], copy, pos and length have always been part of Pascal.
Of course they are still there, to be used in the few parts of your
code where you specialize on whatever string type you deal with.
But otherwise, using RTLString IMHO means abandoning this part of Pascal
syntax. A function whose result changes meaning depending on how the
code is compiled cannot be used safely (or we will have buffer
overflows, code injection and more ...).
I admit that the problem (which has been discussed more than enough)
starts with UTF8String (yes, even with a UTF-16 string). But in that
case those functions took on a new, but predictable, meaning. I can do
utf8string[1], and I can use the result. I only have to be aware of
what it means.
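As a minimal sketch of what I mean (my own illustration, assuming FPC's
objfpc mode and a UTF8String that holds raw UTF-8 bytes), indexing
yields a single byte and length counts bytes:

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}
var
  s: UTF8String;
begin
  // Build the UTF-8 encoding of "é" followed by "a" byte by byte, so the
  // example does not depend on the source file's encoding.
  SetLength(s, 3);
  s[1] := Chr($C3);  // lead byte of the two-byte sequence for U+00E9
  s[2] := Chr($A9);  // continuation byte
  s[3] := 'a';
  WriteLn(Length(s));   // counts bytes, not codepoints: prints 3
  WriteLn(Ord(s[1]));   // one UTF-8 byte, not the whole "é": prints 195
end.
```

The result is not a character, but it is predictable, so code can be
written around it. With the proposed RTLString the same two lines would
still compile, but what they mean would depend on which string type the
alias resolves to on the target.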
I can *not* do rtlString[1], because at the time of writing the code I
cannot know what it means. That is only decided at compilation time.
IFDEFs won't help either, because they can only cope with the set of
string types known at the time the code is written. This breaks each
time FPC is extended.
> and localize them in
> certain parts of the code, keeping most of the code encoding agnostic.
Sorry, I can't help taking that in another direction (which has also
been discussed before). The above quote sounds like a sentence from an
introduction to "object orientation". Sure, it is the right thing.
It is right for OO, so it should be right for strings as well.
But again, it will simply be a new language, with a string object,
instead of Pascal.
> And yes, if you are lazy, you lose performance due to automatic conversions. It
> has always been that way (also when mixing short and ansistring)
>
In other words, write Pascal code, just do not use some of the (IMHO)
most common elements of Pascal syntax?
I acknowledge that a language is a living thing and needs to be adjusted
to new things that come up over time. I only ask whether this is the
best way.
> This is not just a good thing for OS interfacing code, but a good thing in
> general.
>