[fpc-pascal] Unicode chars losing information
Martin Frb
lazarus at mfriebe.de
Mon Mar 8 21:36:26 CET 2021
On 08/03/2021 20:49, Jonas Maebe via fpc-pascal wrote:
> On 08/03/2021 19:16, Ryan Joseph via fpc-pascal wrote:
>> I agree it would be nice to have some warning that indexing the unicodeString wouldn't work as expected.
> Then the compiler would have to give a warning for any indexing of
> unicodestring. That would render it useless, because everyone would just
> turn it off. It's not possible to safely use unicodestring without
> knowing how 16bit unicode works. The compiler can't solve that.
>
>
Indexed access to a string is different from an implicitly inserted call
to an encoding conversion.
In the example the index access should have returned a single codeunit,
which was known to be a complete codepoint.
As far as I understand, the unexpected part was that the unicode string
did not contain the content of the string constant, because the
assignment had caused an encoding conversion to be inserted.
That conversion in turn required a widestring manager.
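A minimal sketch of that situation, assuming the source file is saved as UTF-8 and has no {$codepage} directive. The program and variable names are made up for illustration, and what actually ends up in the string depends on the platform and on whether a widestring manager is installed:

```pascal
program ImplicitConv;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
// Assumption: this file is saved as UTF-8 and contains no
// {$codepage utf8} directive, so the compiler is not told that
// the literal below is UTF-8.

var
  U: UnicodeString;
begin
  // The literal has no explicitly declared encoding. Assigning it
  // to a UnicodeString needs a change of encoding, so a conversion
  // is inserted implicitly.
  U := 'é';

  // The index access itself behaves as documented: U[1] is a single
  // UTF-16 code unit. Whether it is the code unit for 'é' depends on
  // how the conversion above was carried out.
  WriteLn(Ord(U[1]));
end.
```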
To help work out when, where, and for what purpose notes/warnings
should or could be produced, those implicit conversions could be broken down
into groups.
I can think of 2 groups already.
1) Conversion due to explicitly declared different encodings.
    AnAnsiString := SomeWideString;
    AnAsciiString := AnUtf8String;
    // declared as "type AnsiString(CP_ASCII);" and "type AnsiString(CP_UTF8);"
2) Conversion where at least one string is not explicitly declared with a
certain codepage.
This should include indirection via $codepage.
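The two groups could be sketched as follows. The type names TAsciiString and TUtf8String and the program name are hypothetical; only the "type AnsiString(...)" pattern and the variable names come from the text above:

```pascal
program ConvGroups;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

type
  // Hypothetical type names following the declaration pattern
  // quoted in group 1.
  TAsciiString = type AnsiString(CP_ASCII);
  TUtf8String  = type AnsiString(CP_UTF8);

var
  AnAnsiString: AnsiString;
  SomeWideString: WideString;
  AnAsciiString: TAsciiString;
  AnUtf8String: TUtf8String;
begin
  SomeWideString := 'abc';
  AnUtf8String := 'abc';

  // Group 1: the encodings on both sides follow from the
  // declarations, so the conversion is visible from the types alone.
  AnAnsiString := SomeWideString;
  AnAsciiString := AnUtf8String;

  // Group 2: the literal has no explicitly declared code page; how
  // it is interpreted depends on the {$codepage} setting (or its
  // absence) for this source file.
  AnUtf8String := 'é';
end.
```

The compiler may already emit its existing implicit-conversion warnings for some of these assignments; the sketch only shows which assignment falls into which group.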
Then, as a first step, a note/warning could be given if a constant
string is assigned to a variable and a change of encoding is needed for
this.
- "constant string" here would be any string that does not have a direct,
explicitly declared encoding.
- This could be given, even if the presence/absence of a widestring
manager is not known. Because
Obviously, knowing the presence/absence of a widestring manager would allow
the warnings to be refined.
But I guess that comes at a higher price, as each unit, when compiled,
could only set flags in the ppu (including forwarding flags from used
units).
Compiling the final program would then read which warning flags are
present, and whether any unit flagged the inclusion of a widestring manager.