[fpc-devel] UTF-8 string literals
Martok
listbox at martoks-place.de
Fri May 5 15:20:00 CEST 2017
> You should weigh the advantages you outline here against the disadvantages of
> no longer knowing how string literals will be encoded.
As a programmer, either I don't want to know (a const declared without an
explicit type), or I do, and in that case I declared the type explicitly:
{$codepage utf8}
var u: UTF8String = 'äöüالعَرَبِيَّة';
-> UTF8String containing the characters I entered in the source file (in this
case(!!) just a 1:1 copy).
{$codepage utf8}
var u: UCS4String= 'äöü';
-> UCS4-encoded version, either 000000e4 000000f6 000000fc or the equivalent
with combining characters
There should probably be an error if the characters I typed can't be represented
in the declared type (an emoji in a UCS2String), but otherwise, there's no good
reason why that shouldn't "just work".
> It means e.g. the resource string tables will have entries that are UTF16 encoded
> or entries that are UTF8 encoded, depending on the unit they come from.
> This is highly undesirable.
Always convert from the "unit CP" to UTF8 (or UTF16 if some binary compatibility
is required), and you're done. Aren't they just internal anyway?
> By forcing everything UTF16 we ensure delphi compatibility (yes it does matter)
> and we also ensure a uniform set of string tables.
If that is what actually happens, ok. But from the error message Matthias listed
as (1), I would assume that the actual string type is UCS2String, at least at
some point in the process.
Just my 2 cents...
Martok
PS: adding to the discussion over on the Lazarus ML: I just found a fourth wiki
page describing a slightly different Unicode support. This is getting ridiculous.