[fpc-devel] simple UTF tests
Marco van de Voort
marcov at stack.nl
Thu Jan 5 12:32:45 CET 2012
In our previous episode, Michael Schnell said:
> With Lazarus on Linux, I did some simple tests with UTF strings.
> I found that the length of an "AnsiString(CP_UTF16)" is given in terms
> of bytes and not of Words. Is this like it should ?
Yes. AFAIK that is not a sane combination, but it is Delphi compatible.
> I found that pchar(s8) with an UTF-8 string works as expected, giving a
> pointer to the UTF-8 encoded byte array.
> Anyway: is it obvious, that the encoding of pchar is UTF-8 ? Is this
> portable ?
pchar should give access to the raw data of the default string type (be it
still 8-bit as in FPC, or 16-bit as in Delphi).
> p16 = pchar(s16) with an UTF-16 gives a pointer to the first byte of the
> word array, so (with ASCII text), the second byte is zero, thus a
> C-String length 1. Is this like it should ?
Yes. This is not sane code (even if you want e.g. the lower byte, this is
not endian safe), since s16 is currently not the default string type.
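To illustrate the endianness point, here is a minimal sketch, assuming a UnicodeString stands in for the UTF-16 string from the test (the program name and variables are made up for illustration). It prints the first two bytes behind the string data. With ASCII text the zero byte comes second on a little-endian machine and first on a big-endian one, so a byte-wise C-string view sees length 1 or 0 depending on the platform.

```pascal
program Utf16Bytes;
{$mode objfpc}{$H+}
var
  s16: UnicodeString;
  pb: PByte;
begin
  s16 := 'Ab';
  // View the UTF-16 payload as raw bytes (s16 is non-empty, so the pointer is not nil).
  pb := PByte(Pointer(s16));
  // Little-endian targets: 65 then 0. Big-endian targets: 0 then 65.
  WriteLn('byte 0: ', pb[0]);
  WriteLn('byte 1: ', pb[1]);
end.
```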
> Of course re-assigning p16 to an UTF-16 string does not reproduce the
> original string.
> What encoding is to be supposed for a pchar ?
pchars provide access to memory with the granularity
of the default string type. Whatever that is, 8-bit or 16-bit, and in
whatever encoding it is stored.
When converted to something else, the default system encoding for the
corresponding default string is probably used.
To force 8 or 16 bits one should use pansichar or pwidechar. This is Delphi
compatible.
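A rough sketch of using the explicitly sized pointer types, so the granularity does not depend on whatever the default string type happens to be. Units8 and Units16 are hypothetical helpers written for this example, not RTL routines; they just count code units up to the terminating zero.

```pascal
program TypedPointers;
{$mode objfpc}{$H+}

// Hypothetical helper: count 8-bit units up to the terminating #0.
function Units8(p: PAnsiChar): SizeInt;
begin
  Result := 0;
  while p[Result] <> #0 do
    Inc(Result);
end;

// Hypothetical helper: count 16-bit units up to the terminating #0.
function Units16(p: PWideChar): SizeInt;
begin
  Result := 0;
  while p[Result] <> #0 do
    Inc(Result);
end;

var
  s8: AnsiString;
  s16: UnicodeString;
begin
  s8 := 'abc';
  s16 := 'abc';
  WriteLn(Units8(PAnsiChar(s8)));    // 3 bytes
  WriteLn(Units16(PWideChar(s16)));  // 3 words
end.
```

Each pointer type walks the data in the unit size of the string it came from, so both calls see the full text.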
> The Debugger does not show UTF-16-Strings correctly (it shows the same
> result as pchar() ). Is this just a Lazarus problem, or does FPC need to
> provide additional support for this ?
No idea. Both are possible.