[fpc-devel] UTF-8 string literals

Michael Van Canneyt michael at freepascal.org
Fri May 5 16:22:18 CEST 2017



On Fri, 5 May 2017, Sven Barth via fpc-devel wrote:

> On 05.05.2017 15:55, "Michael Van Canneyt" <michael at freepascal.org> wrote:
>>
>>
>>
>> On Fri, 5 May 2017, Mattias Gaertner wrote:
>>
>>> On Fri, 5 May 2017 14:30:32 +0200 (CEST)
>>> Michael Van Canneyt <michael at freepascal.org> wrote:
>>>
>>>> [...]
>>>>> AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
>>>>> instead of UTF8String. Please correct me if I'm wrong.
>>
>>
>> To make sure I was presenting correct facts, I did some tests.
>>
>> As a result of the tests, I think the above statement is wrong.
>
> In all three cases you are either explicitly or implicitly forcing the
> compiler to convert it to Ansi/UTF-8 and since it's a constant it takes a
> compiletime shortcut.

That was on purpose because Mattias' example on the Lazarus list required
this. The point was that PChar() is not usable on string literals.

See also his initial mail, which contains the statement:

"3. PChar on a string literal does not work as expected. You get the
bytes of a widestring instead."

So I did a typecast (even though I think it is horrible code).
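To make the quoted complaint concrete, here is a minimal sketch of why a byte pointer into wide-string data "does not work as expected". It is written in Python only to show the raw bytes; it does not reproduce FPC's code generation, and the helper name as_c_string is an assumption made up for this illustration.

```python
def as_c_string(buf: bytes) -> bytes:
    """Return the bytes a NUL-terminated (PChar-style) reader would see."""
    end = buf.find(b"\x00")
    return buf if end < 0 else buf[:end]


# 'a' followed by 'ä' (U+00E4), a character above $7F
literal = "a\u00e4"

utf8_bytes = literal.encode("utf-8")       # what a UTF8String would hold
utf16_bytes = literal.encode("utf-16-le")  # what a wide string would hold

print(utf8_bytes.hex(" "))       # 61 c3 a4
print(utf16_bytes.hex(" "))      # 61 00 e4 00
print(as_c_string(utf8_bytes))   # b'a\xc3\xa4'
print(as_c_string(utf16_bytes))  # b'a'
```

Read byte by byte, the UTF-16 data ends at the first zero byte, so a reader expecting UTF-8 sees only "a".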

> If you'd do a Writeln without the typecast then it will be a UTF-16
> constant that is stored in the binary *if* the string contains a character
> greater than $7F.
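The condition quoted above can be sketched as a simple check. This only restates the rule as worded in the quote; it is not the compiler's actual logic, and the function name needs_wide_storage is hypothetical.

```python
def needs_wide_storage(literal: str) -> bool:
    """Hypothetical helper: True if any character is above $7F."""
    return any(ord(ch) > 0x7F for ch in literal)


print(needs_wide_storage("hello"))       # False: pure 7-bit ASCII
print(needs_wide_storage("h\u00e4llo"))  # True: 'ä' is U+00E4
```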

Well, at least now I understand very well why people find it confusing :-)

I think we'll need a comprehensive table in the documentation.
Can this be produced somehow?

Michael.


