[fpc-pascal] Unicode chars losing information
Martin Frb
lazarus at mfriebe.de
Tue Mar 9 02:07:49 CET 2021
On 08/03/2021 23:26, Tomas Hajny via fpc-pascal wrote:
> On 2021-03-08 21:36, Martin Frb via fpc-pascal wrote:
>
>>
>> I can think of 2 groups already.
>> 1) Conversion due to explicit declared different encoding.
>> AnAnsiString := SomeWideString;
>> AnAsciiString := AnUtf8String; // declared as "type
>> AnsiString(CP_ASCII);" and "type AnsiString(CP_UTF8);"
>
> Do you mean a compile-time warning? The trouble is that the compiler
> wouldn't know whether a real widestring manager would get included in
> the final binary when such conversions are encountered. And remember
> that the final binary may be compiled at a different time from the
> moment when the unit containing such conversions is compiled. In other
> words, compile-time warnings would be rather difficult to implement.
Yes, I mean a compile time warning.
But, not in the above case. In the above case the users could kind of
reasonably be expected to know a widestring manager is needed.
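To make case 1 concrete, here is a minimal sketch. The type names
TAsciiString and TUtf8String and the program name are mine, made up for
illustration. I am assuming FPC 3.0 or later (code page aware
AnsiString) and that cwstring is what supplies the widestring manager on
Unix-like targets:

```pascal
program ExplicitCp;
{$mode objfpc}{$H+}
{$ifdef unix}uses cwstring;{$endif}  // assumption: installs a widestring manager on Unix

type
  // Hypothetical names; the code page is part of each declared type.
  TAsciiString = type AnsiString(CP_ASCII);
  TUtf8String  = type AnsiString(CP_UTF8);

var
  AnUtf8String: TUtf8String;
  AnAsciiString: TAsciiString;
begin
  AnUtf8String := 'abc';
  // The declared code pages differ, so this assignment implies a
  // conversion, and the declarations make that visible to the reader.
  AnAsciiString := AnUtf8String;
  WriteLn('Code page of AnAsciiString: ', StringCodePage(AnAsciiString));
end.
```

Both sides carry their code page in the declaration, which is why I
would not expect a warning to be needed here.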
However, IMHO that differs in the below case:
>> 2) Conversion where at least one string is not explicitly declared for
>> a certain codepage.
>> This should include indirection via $codepage
>
> No, this is not the case. $codepage defines the source file encoding.
> The compiler translates the string constants declared this way to a
> UTF-16 constant stored within the compiled binary. Specifying
> $codepage has no implications on runtime conversions by itself.
So "const Foo = 'abäö';" is always stored as UTF-16?
That is, IMHO, unexpected.
But more to the point
var s: AnsiString;
var s2: UnicodeString;
var s3: WideString;
s := Foo;
s := 'abäö';
s2 := Foo;
s2 := 'abäö';
s3 := Foo;
s3 := 'abäö';
Do any of the assignments "s :=", "s2 :=" or "s3 :=" cause a conversion?
(For this question it does not matter whether the answer depends on
$codepage; all that matters is whether there is some case in which the
assignment causes a conversion.)
If it never causes a conversion, then I misread/misunderstood something.
If it does, it is IMHO very unexpected. After all why include a constant
in a way that it must still be computed before it can be used?
I do not include pi as a formula to be computed at runtime, I define it
to the precision I will need (and/or can store) as pre-computed constant
of 3.14159....
So if that causes a conversion, then that is worth a warning/note.
And IMHO it is worth a warning, even if a widestring manager is present.
Because that conversion which it causes is most likely not wanted by the
user.
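As a self-contained way to try this, here is a sketch of the snippet
above as a complete program. The program name, the {$codepage utf8}
directive (i.e. the assumption that the file is saved as UTF-8) and the
use of cwstring on Unix are my additions. It only prints what ends up in
each variable; I am not claiming any particular output:

```pascal
program ConstConv;
{$mode objfpc}{$H+}
{$codepage utf8}                     // assumption: this source file is saved as UTF-8
{$ifdef unix}uses cwstring;{$endif}  // assumption: installs a widestring manager on Unix

const
  Foo = 'abäö';  // per the explanation quoted above, stored as UTF-16 in the binary

var
  s: AnsiString;
  s2: UnicodeString;
  s3: WideString;
begin
  s := Foo;   // UTF-16 constant to AnsiString: the candidate for a runtime conversion
  s2 := Foo;  // UTF-16 constant to UnicodeString
  s3 := Foo;  // UTF-16 constant to WideString

  // Lengths are counted in elements of the respective string type
  // (bytes for AnsiString, UTF-16 code units for the other two).
  WriteLn('Length(s)  = ', Length(s));
  WriteLn('Length(s2) = ', Length(s2));
  WriteLn('Length(s3) = ', Length(s3));
  WriteLn('StringCodePage(s) = ', StringCodePage(s));
end.
```

Building it once with and once without the cwstring line (on a Unix
target) should show whether the result of "s := Foo" depends on the
widestring manager.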
>> - This could be given, even if the presence/absence of a widestring
>> manager is not known. Because
>
> Because what?
Reason above.
I hit send accidentally. I then decided to wait and answer it with the
next response (i.e. now).
>> Obviously knowing the presence/absence of a widestring manager allows
>> to refine warnings.
>> But I guess that comes at a higher price, as each unit when compiled
>> could only set flags in the ppu (including forwarding flags from used
>> units).
>> And then compiling the final program would read which warning flags are
>> present, and if any unit flagged the inclusion of a widestring
>> manager.
>
> Yes, this would be indeed the only possibility.
On 08/03/2021 23:23, Michael Van Canneyt via fpc-pascal wrote:
> The compiler has no way to know if the widestring manager actually does a
> complete or even a good job. Maybe it just does logging
Even then, the mere fact that the user added a widestring manager other
than the default would indicate that the user is aware, and hence does
not need a hint/warning.
Sure, the user might not be aware, but the point is to catch common
problems, not every border/edge case.
Still, I agree that the "unit flag" solution is too costly to
implement/maintain.