[fpc-pascal] Unicode chars losing information

Martin Frb lazarus at mfriebe.de
Tue Mar 9 02:07:49 CET 2021


On 08/03/2021 23:26, Tomas Hajny via fpc-pascal wrote:
> On 2021-03-08 21:36, Martin Frb via fpc-pascal wrote:
>
>>
>> I can think of 2 groups already.
>> 1) Conversion due to explicitly declared different encodings.
>>    AnAnsiString := SomeWideString;
>>    AnAsciiString := AnUtf8String; // declared as "type AnsiString(CP_ASCII);"
>>                                   // and "type AnsiString(CP_UTF8);"
>
> Do you mean a compile-time warning? The trouble is that the compiler 
> wouldn't know whether a real widestring manager would get included in 
> the final binary when such conversions are encountered. And remember 
> that the final binary may be compiled at a different time from the 
> moment when the unit containing such conversions is compiled. In other 
> words, compile-time warnings would be rather difficult to implement.
Yes, I mean a compile-time warning.
But not in the above case. There the user could reasonably be expected 
to know that a widestring manager is needed.

However, IMHO that differs in the below case:

>> 2) Conversion where at least one string is not explicitly declared for
>> a certain codepage.
>>    This should include indirection via $codepage
>
> No, this is not the case. $codepage defines the source file encoding. 
> The compiler translates the string constants declared this way to a 
> UTF-16 constant stored within the compiled binary. Specifying 
> $codepage has no implications on runtime conversions by itself.
So "const Foo = 'abäö';" is always stored as UTF-16?
That is, IMHO, unexpected.

But more to the point:
   var s: AnsiString;
   var s2: UnicodeString;
   var s3: WideString;
   s := Foo;
   s := 'abäö';
   s2 := Foo;
   s2 := 'abäö';
   s3 := Foo;
   s3 := 'abäö';

Do any of the assignments ("s :=", "s2 :=", "s3 :=") cause a conversion?
(For this question it does not matter whether the answer depends on 
$codepage. All that matters is whether there is some case in which an 
assignment causes a conversion.)
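
For anyone who wants to check this empirically, here is a minimal sketch. 
It is not a statement about what the compiler actually does. It assumes 
FPC 3.x and a source file saved as UTF-8; the program name is made up, 
and StringCodePage is the System unit function that reports the code page 
stored in a string. If the AnsiString length or content differs from what 
the source literal suggests, a conversion took place somewhere.

```pascal
program ConstConvCheck;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
{$codepage utf8}         // assumption: this file is saved as UTF-8

const
  Foo = 'abäö';

var
  s: AnsiString;
  s2: UnicodeString;
  s3: WideString;
begin
  // The same constant assigned to each of the three string types.
  s := Foo;
  s2 := Foo;
  s3 := Foo;

  // Lengths are in code units of the target type (bytes for AnsiString,
  // 16-bit units for UnicodeString and WideString).
  WriteLn('Length(s)  = ', Length(s));
  WriteLn('Length(s2) = ', Length(s2));
  WriteLn('Length(s3) = ', Length(s3));

  // Code page actually stored in the AnsiString after the assignment.
  WriteLn('StringCodePage(s)     = ', StringCodePage(s));
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

On Unix it may be worth running this twice, once as is and once with the 
cwstring unit added in a uses clause, to compare the results with and 
without a real widestring manager installed.
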

If it never causes a conversion, then I misread/misunderstood something.

If it does, it is IMHO very unexpected. After all, why include a constant 
in a form that must still be computed before it can be used?
I do not include pi as a formula to be computed at runtime. I define it 
to the precision I will need (and/or can store) as a pre-computed 
constant of 3.14159....

So if that causes a conversion, then that is worth a warning/note.
And IMHO it is worth a warning even if a widestring manager is present, 
because the conversion it causes is most likely not wanted by the user.


>> - This could be given, even if the presence/absence of a widestring
>> manager is not known. Because
>
> Because what?
Reason above.
I hit send accidentally. I then decided to wait and answer it with the 
next response (i.e. now).


>> Obviously knowing the presence/absence of a widestring manager allows
>> to refine warnings.
>> But I guess that comes at a higher price, as each unit when compiled
>> could only set flags in the ppu (including forwarding flags from used
>> units).
>> And compiling the final program would then read which warning flags
>> are present, and whether any unit flagged the inclusion of a
>> widestring manager.
>
> Yes, this would be indeed the only possibility.

On 08/03/2021 23:23, Michael Van Canneyt via fpc-pascal wrote:
> The compiler has no way to know if the widestring manager actually does a
> complete or even a good job. Maybe it just does logging

Even then, the mere fact that the user added a widestring manager other 
than the default would indicate that the user is aware, and hence does 
not need a hint/warning.
Sure, the user might still not be aware, but the goal is to catch common 
problems, not every border/edge case.

Still, I agree that the "unit flag" solution is too costly to 
implement/maintain.









