[fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

Michael Van Canneyt michael at freepascal.org
Wed May 11 14:07:48 CEST 2016



On Wed, 11 May 2016, Graeme Geldenhuys wrote:

> Hi,
>
> Here is an example [proof if you will] of the problem. I wrote a small
> test program that reads data from a Firebird database where the database
> and field charset is set to UTF8.
>
> I compile the program, then run it twice, with no recompile between the
> two runs. For the first run my system is set to a UTF-8 locale. For the
> second run I set my system to an ISO8859-1 (Latin-1) locale. The program
> outputs the DefaultSystemCodePage to the console.
>
> Because the locale changes the behaviour of String (aka AnsiString) in
> the RTL and FCL, the first run works, but the second run corrupts my data.
>
> Console output:
>
> [unicode_test]$ export LANG=en_US.UTF-8
> [unicode_test]$ ./unicodetest
> 65001
>
> [unicode_test]$ export LANG=en_US.ISO8859-1
> [unicode_test]$ ./unicodetest
> 28591
>
>
> In my test program I write the data read from the database to a file
> using TFileStream, thus console and file encoding settings will not
> affect the data being written to file. TFileStream is simply writing bytes.

But what does your program prove?

You're only proving that a conversion happens when you do
s := fieldByName('somefield').asString;
and that the conversion takes the locale into account, which in one of the
two runs differs from the actual encoding of the data in the database.
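
To make that conversion visible without a database, here is a minimal
sketch. It assumes FPC 3.0 or later and a Unix system with the cwstring
unit; the program name and the DbValue variable are made up for the
illustration and merely stand in for the UTF-8 bytes a field would deliver.
It is not the SqlDB code path itself, only the same kind of assignment.

```pascal
program cpdemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager used for code page conversions on Unix
  SysUtils;

var
  DbValue: UTF8String;  // hypothetical stand-in for UTF-8 data read from the database
  S: AnsiString;        // plain String/AnsiString: its code page follows the locale
  i: Integer;
begin
  // Build the UTF-8 encoding of the Euro sign (E2 82 AC) byte by byte,
  // so the result does not depend on the encoding of this source file.
  SetLength(DbValue, 3);
  DbValue[1] := #$E2;
  DbValue[2] := #$82;
  DbValue[3] := #$AC;

  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);

  // The declared code pages differ (UTF-8 versus the system code page),
  // so the RTL converts here whenever the system code page is not UTF-8.
  S := DbValue;

  WriteLn('Code page of S: ', StringCodePage(S));
  Write('Bytes of S:');
  for i := 1 to Length(S) do
    Write(' ', IntToHex(Ord(S[i]), 2));
  WriteLn;
end.
```

Run under a UTF-8 locale, the three bytes should come out unchanged. Run
under an ISO8859-1 locale, the assignment goes through a conversion, and
since Latin-1 has no Euro sign the original character cannot survive it.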

This conversion is as designed. It is known to be wrong in the case of
TField.AsString, but that will not be solved by simply using
{$modeswitch unicodestrings} in the database code.

AFAIK 3.0 is no different from 2.6.4 in this matter; Jonas can confirm or deny.
Unlike 2.6.4, however, 3.0.0 gives us the possibility to fix it, by allowing
the code page to be specified in TField. This is not yet implemented.

Michael.


