[fpc-devel] UTF-8 string literals

Michael Van Canneyt michael at freepascal.org
Fri May 5 14:30:32 CEST 2017



On Fri, 5 May 2017, Mattias Gaertner wrote:

> Hi,
>
> AFAIK FPC stores UTF-8 string literals (-Fcutf8) as widestrings
> instead of UTF8String. Please correct me if I'm wrong.
>
> This has several side effects:
>
> 1. When using a character outside the BMP, FPC stops with:
> Error: UTF-8 code greater than 65535 found
> For example:
> const Eyes = '👀';
>
> 2. Assigning a UTF-8 literal to a UTF8String requires a
> widestringmanager.
> For example, non-ISO-8859-1 chars are mangled:
> var u: UTF8String = 'äöüالعَرَبِيَّة';

I assume you mean a UTF-16 literal?

>
> 3. PChar on a string literal does not work as expected. You get the
> bytes of a widestring instead.
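
The three side effects above come down to how the same text looks in UTF-8
versus UTF-16. The sketch below is only an illustration in Python, not FPC
code and not a description of the compiler's internals: it shows that U+1F440
lies above 65535 (so UTF-16 needs a surrogate pair for it) and that a pointer
to UTF-16 data sees different bytes than a pointer to UTF-8 data. The helper
name `utf16_units` is hypothetical.

```python
# Illustration only: Python stands in for the encodings, not for FPC itself.

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def hex_bytes(data):
    """Format a bytes object as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

eyes = "\U0001F440"   # the character from point 1
a_umlaut = "\u00e4"   # first character of the literal in point 2

# Point 1: the code point does not fit in a single 16-bit unit.
eyes_code_point = ord(eyes)
eyes_units = utf16_units(eyes)
print(hex(eyes_code_point), eyes_code_point > 0xFFFF)
print([hex(u) for u in eyes_units])

# Points 2 and 3: the byte sequences differ between the two encodings, so
# converting between them needs a transcoding step, and a byte pointer
# into UTF-16 data does not see UTF-8 bytes.
eyes_utf8 = eyes.encode("utf-8")
umlaut_utf8 = a_umlaut.encode("utf-8")
umlaut_utf16 = a_umlaut.encode("utf-16-le")
print(hex_bytes(eyes_utf8))
print(hex_bytes(umlaut_utf8), "vs", hex_bytes(umlaut_utf16))
```

In UTF-16 the character from point 1 becomes the surrogate pair D83D DC40,
while in UTF-8 it is the four bytes F0 9F 91 80.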

You should weigh the advantages you outline here against the disadvantages of
no longer knowing how string literals will be encoded.

It means, for example, that the resource string tables will have entries that
are UTF-16 encoded or entries that are UTF-8 encoded, depending on the unit
they come from. This is highly undesirable.

By forcing everything to UTF-16 we ensure Delphi compatibility (yes, it does
matter) and we also ensure a uniform set of string tables.

Michael.

