[fpc-devel] UTF-8 string literals

Fri May 5 15:20:00 CEST 2017

> You should weigh the advantages you outline here against the disadvantages of
> no longer knowing how string literals will be encoded.
As a programmer, either I don't want to know (declared const without giving
explicit type) or I do, then I did declare it correctly:

{$codepage utf8}
var u: UTF8String = 'äöüالعَرَبِيَّة';
  -> UTF8String containing the characters I entered in the source file (in this
case(!!) just 1:1 copy).

{$codepage utf8}
var u: UCS4String= 'äöü';
  -> UCS4 encoded Version, either 000000e4 000000f6 000000fc or the equivalent
with combining characters

There should probably be an error if the characters I typed don't actually exist
in the declared type (emoji in an UCS2String), but otherwise, there's no good
reason why that shouldn't "just work".

> It means e.g. the resource string tables will have entries that are UTF16 encoded
> or entries that are UTF8 encoded, depending on the unit they come from. 
> This is highly undesirable.
Always convert from "unit CP" to UTF8 (or UTF16 if some binary compat is
required), done. Aren't they just internal anyway?

> By forcing everything UTF16 we ensure delphi compatibility (yes it does matter) 
> and we also ensure a uniform set of string tables.
If that was what happened, ok. But from the error message Matthias listed as (1)
I would assume that the actual string type is UCS2String, at least at some point
in the process.

Just my 2 cents...

Martok

PS: adding to the discussion over on the Lazarus ML: I just found a fourth wiki
page describing a slightly different Unicode support. This is getting ridiculous.