[fpc-pascal] Unicode chars losing information

Michael Van Canneyt michael at freepascal.org
Tue Mar 9 10:06:20 CET 2021



On Tue, 9 Mar 2021, Graeme Geldenhuys via fpc-pascal wrote:

> On 09/03/2021 1:44 am, Tomas Hajny via fpc-pascal wrote:
>> UnicodeString may be used in a program simply because the included unit 
>> has it used in its interface. That may be the case even if there's no 
>> use of characters outside of US ASCII at all.
>
> So FPC rather goes with the fact that data may be *silently* lost during
> encoding conversions? That doesn't seem like a safe default behaviour to
> me.

No, we give the programmer a choice: 
* Not use unicode conversion at all.
* Use the C library to handle conversion (cwstring).
* Use FPC native code to handle conversion (fpwidestring).
* Some other means.

Since the compiler cannot reliably detect whether a choice was made, 
it cannot make the choice for you either, because the compiler cannot 
undo a choice once it has been made.

This mechanism implies the programmer *has* to make that choice.

This is no different from the threading driver mechanism, for which Lazarus adds
an {$IFDEF} block to the uses clause of the program.
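
As a minimal sketch of what making that choice looks like in source (the
program name and the variables are made up for illustration; only the
cwstring unit comes from the list above):

```pascal
program choicedemo;  // hypothetical program name

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}
  cwstring,   // the choice: let the C library handle Unicode conversions
  {$ENDIF}
  SysUtils;   // keeps the uses clause valid when the IFDEF block is skipped

var
  U: UnicodeString;
  A: AnsiString;
begin
  // 'caf' followed by U+00E9, a character outside US ASCII
  U := UnicodeString('caf') + WideChar($E9);
  // This conversion goes through whichever widestring manager is installed
  A := AnsiString(U);
  WriteLn(A);
end.
```

Replacing cwstring with fpwidestring in the uses clause would select the
FPC native code instead. What WriteLn shows depends on the driver in use and
on the system code page, so no output is given here.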

But, I have been thinking about this. What we can do to alleviate this is the following:

Use the -FaNNN option of the command line.

This option will insert NNN implicitly in the uses clause of the program.

So, we can add 
-Fafpwidestring
or
-Facwstring

in the default generated fpc.cfg config file for selected platforms (macOS, Linux
on i386 and 64-bit, *BSD). The result will be that a widestring driver unit will be 
inserted by default for those platforms.

By using the necessary IFDEF mechanism in the config file, we can avoid
inserting it for Windows (which does not need it) or the smaller embedded platforms
(which cannot handle it).
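
A rough sketch of what such a config fragment could look like. The choice of
cwstring over fpwidestring and the exact set of target defines are
assumptions made for illustration, not what the generated fpc.cfg will
actually contain:

```
# Hypothetical fpc.cfg fragment: insert a widestring driver unit by default.
# Windows and embedded targets match neither block, so nothing is added there.
#IFDEF LINUX
-Facwstring
#ENDIF
#IFDEF BSD
-Facwstring
#ENDIF
```

The same effect can be had for a single build by passing the option on the
command line, e.g. `fpc -Facwstring myprog.pas` (myprog.pas being a
placeholder file name).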

People who don't need or want this can remove the config setting from the file. 
All the others leave it as-is and will get their desired conversion mechanisms
'for free'.

This way a default choice is made for you on those platforms, but you still
have 100% control over it.

Michael.

