[fpc-devel] Unicode support (yet again)
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Thu Sep 15 19:09:08 CEST 2011
Graeme Geldenhuys wrote:
> On 14/09/2011 17:02, Hans-Peter Diettrich wrote:
>> Many users still want simple string handling, with direct mapping
>> between logical and physical chars (SBCS). This is not possible at all
>> with UTF-8, while UTF-16 works fine with the BMP, at least.
>
> What rubbish! The only "utf-8 limit" is that the current FPC and Delphi
> RTL's don't cater for it due to the legacy ANSI support that came
> before.
What data type would you use to store a UTF-8 character?
And how would you access the n-th character in a UTF-8 string?
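A minimal sketch of the two questions above, written in Python rather than Pascal purely for illustration. The helper name utf8_char_at is hypothetical. It assumes valid UTF-8 input and treats "character" as "code point". Because a code point occupies one to four bytes, it does not fit a one-byte char type, and finding the n-th one requires a scan from the start of the string:

```python
def utf8_char_at(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of valid UTF-8 bytes by scanning."""
    if n < 0:
        raise IndexError("code point index out of range")
    index = -1      # index of the code point currently being scanned
    start = None    # byte offset where the n-th code point begins
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # every other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            index += 1
            if index == n:
                start = pos
            elif index == n + 1:
                # The next code point starts here, so the n-th one ends here.
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    # The n-th code point is the last one in the buffer.
    return data[start:].decode("utf-8")


sample = "a\u00e4b\u20ac".encode("utf-8")  # 'a', a-umlaut, 'b', euro sign
second = utf8_char_at(sample, 1)           # needs a scan, not sample[1]
```

Here sample[1] would yield only the first byte of the two-byte sequence, which is why plain byte indexing is not character indexing in UTF-8.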
...
>> (platform dependent) RTL conventions, but it affects the standard
>> components (string lists...) in the FCL, and the other components in
>> the LCL.
>
> Please give a concrete example where using platform dependent encodings
> (eg: UnicodeString = UTF-8 on Linux, but UTF-16 on Windows) will
> cause problems? I really cannot see any issues here, only positives
> like better performance for each platform due to no need for
> auto-conversions.
As already pointed out, string encoding conversions between application
and widgets are rare; consequently, performance depends more on string
handling in application code. The new Delphi string types, with
automatic conversion when required, can cause a slowdown. In FPC,
character-based access to strings can also cause a slowdown (iterators...).
When a multi-platform application must be aware of possible UTF-8
strings, depending on the platform, the code must be MBCS aware. That
again means complicated string handling where immediate indexed
access would otherwise be possible :-(
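The indexed-access point can be sketched as follows, again in Python for illustration only; the helper names utf16_units and is_surrogate are hypothetical. Within the BMP, the n-th 16-bit unit of a UTF-16 string is the n-th character, so direct indexing works. Outside the BMP, a character becomes a surrogate pair, and the same indexing lands on half a character:

```python
def utf16_units(text: str) -> list:
    """Split text into its UTF-16 code units (16-bit integers)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_surrogate(unit: int) -> bool:
    """True if the 16-bit unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


bmp_units = utf16_units("h\u00e4\u00df")        # BMP only: one unit per character
astral_units = utf16_units("a\U0001F600b")      # U+1F600 lies outside the BMP

# For BMP text, unit n is character n.
bmp_indexing_ok = bmp_units[1] == ord("\u00e4")

# For non-BMP text, unit 1 is only a high surrogate, not a whole character.
hit_half_character = is_surrogate(astral_units[1])
```

So the "immediate indexed access" holds for UTF-16 only as long as the text stays within the BMP.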
>> Here again the average user will prefer UTF-16 component libraries,
>> compatible with his own code, while more experienced users may be
>> happier with the current UTF-8 libraries.
>
> What the hell has "experience" got to do with the preference between
> UTF-8 and UTF-16? To the developer (and more so to the end-user) a
> Unicode string should act like any other Unicode string. What encoding
> is used to represent "hello world" shouldn't even come into question.
This applies only to constant string literals, where the user never has
to care about string encoding and conversion.
>> English (ASCII) users also may prefer UTF-8, as long as they do not
>> have to (or want to) deal with strings in foreign languages.
>
> Rubbish once again! Our applications use UTF-8, I have no problems
> writing applications that support multiple foreign languages - as long as
> those languages are left-to-right (I don't understand RTL languages,
> so can't comment).
You had better understand ;-)
RTL is merely a *display* feature; the chars are still stored from first
to last. More important is the SBCS/MBCS difference, which must be
reflected in user code. Even if *you* have no problems with MBCS (like
UTF-8), other users do.
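A small Python sketch of the storage-order point; the helper name bidi_classes is hypothetical, and the Hebrew sample word is an assumption chosen for illustration. The string holds its characters in logical (reading) order. Right-to-left layout is derived at display time from each character's bidirectional class, which the standard unicodedata module exposes:

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return the Unicode bidirectional class of each code point, in storage order."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Hebrew "shalom", spelled in logical order: shin, lamed, vav, final mem.
word = "\u05e9\u05dc\u05d5\u05dd"

first_stored = unicodedata.name(word[0])   # the first letter read is stored first
classes = bidi_classes("a" + word)         # class "R" marks right-to-left letters
```

Indexing word[0] returns the first letter of the word as it is read, regardless of where a renderer places it on screen.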
DoDi