[fpc-pascal] Unicode chars losing information

Martin Frb lazarus at mfriebe.de
Mon Mar 8 21:36:26 CET 2021


On 08/03/2021 20:49, Jonas Maebe via fpc-pascal wrote:
> On 08/03/2021 19:16, Ryan Joseph via fpc-pascal wrote:
>> I agree it would be nice to have some warning that indexing the unicodeString wouldn't work as expected.
> Then the compiler would have to give a warning for any indexing of
> unicodestring. That would render it useless, because everyone would just
> turn it off. It's not possible to safely use unicodestring without
> knowing how 16bit unicode works. The compiler can't solve that.
>
>

Indexed access to a string is different from an implicitly inserted call
to an encoding conversion.

In the example, the indexed access should have returned a single code unit,
which was known to be a complete code point.
As far as I understand, the unexpected part was that the unicode string
did not contain the content of the string constant, because the
assignment had caused an encoding conversion to be inserted.
That conversion is what created the need for a widestring manager.
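
To make the distinction concrete, here is a minimal sketch. It assumes no
{$codepage} directive and no widestring manager unit (such as cwstring on
Unix); the program name and variable names are made up for illustration,
and what it prints depends on the source file encoding and the platform.

```pascal
program IndexVsConversion;
{$mode objfpc}{$H+}
// Assumption: no {$codepage} directive and no widestring manager unit
// are used here. Both would change what ends up in U.

var
  U: UnicodeString;
  C: WideChar;
begin
  // The literal has no explicitly declared encoding. Assigning it to a
  // UnicodeString may make the compiler insert an encoding conversion.
  U := 'é';

  WriteLn('Length in code units: ', Length(U));

  if Length(U) > 0 then
  begin
    // The indexing itself is plain: it returns one 16-bit code unit.
    C := U[1];
    // If the conversion above lost information, it shows up here,
    // although the indexed access did nothing wrong.
    WriteLn('First code unit: $', HexStr(Ord(C), 4));
  end;
end.
```

If the printed code unit is not the expected one, the place to look is the
assignment, not the U[1].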

Maybe, to help work out when, where and for what notes/warnings
should/could be produced, those implicit conversions can be broken down
into groups.
I can think of 2 groups already.
1) Conversion due to explicitly declared different encodings.
    AnAnsiString := SomeWideString;
    AnAsciiString := AnUtf8String; // declared as
                                   // "type AnsiString(CP_ASCII);" and
                                   // "type AnsiString(CP_UTF8);"
2) Conversion where at least one string is not explicitly declared for a
certain codepage.
    This should include indirection via $codepage
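
The two groups could look like this in a small program. This is only a
sketch: the type names TAsciiString and TUtf8String are hypothetical, and
the literals are kept to plain ASCII so that the result does not depend on
the source encoding or on a widestring manager.

```pascal
program ConversionGroups;
{$mode objfpc}{$H+}

type
  // Hypothetical type names; the declarations follow the ones quoted above.
  TAsciiString = type AnsiString(CP_ASCII);
  TUtf8String  = type AnsiString(CP_UTF8);

var
  AnAnsiString: AnsiString;
  SomeWideString: WideString;
  AnAsciiString: TAsciiString;
  AnUtf8String: TUtf8String;
  U: UnicodeString;
begin
  SomeWideString := 'abc';
  AnUtf8String := 'abc';

  // Group 1: both sides have an explicitly declared, different encoding.
  AnAnsiString := SomeWideString;
  AnAsciiString := AnUtf8String;

  // Group 2: the literal has no explicitly declared encoding of its own.
  // How it is interpreted depends on the source file and on $codepage.
  U := 'abc';

  WriteLn(AnAnsiString, ' ', AnAsciiString, ' ', U);
end.
```

In group 1 the programmer has written both encodings down, so the
conversion is at least visible in the declarations. In group 2 nothing in
the statement itself shows that a conversion may happen.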


Then maybe, as a first step, a note/warning could be given if a constant
string is assigned to a variable and a change of encoding is needed for
this.
- "constant string" here would be any string that does not have a direct,
explicitly declared encoding.
- This could be given, even if the presence/absence of a widestring 
manager is not known. Because



Obviously, knowing the presence/absence of a widestring manager would
allow the warnings to be refined.
But I guess that comes at a higher price, as each unit, when compiled,
could only set flags in the ppu (including forwarding flags from used
units).
Compiling the final program would then read which warning flags are
present, and whether any unit flagged the inclusion of a widestring manager.





