[fpc-devel] TField.AsString and Databases with UTF-8 charset

Graeme Geldenhuys graemeg at opensoft.homeip.net
Fri Jul 24 16:15:07 CEST 2009


Michael Van Canneyt wrote:
> 
> Which field should it use according to you then ?

"f.rdb$character_length" to report TField.Size and TParam.Size
See below...


>> So SqlDB with Firebird is in fact wrong when it returns Size = 8
>> for a Char(2) with UTF8 charset enabled.
> 
> Yes, but assume that a size of 2 is returned. This means a buffer of
> 2 bytes (in ansistring byte=character) will be reserved for the data.


OK Michael, you are confusing what TField.Size means. You also don't 
seem to take into account TField.DataSize. See the following URL.

http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/DB_TField_DataSize.html

TField.Size and TParam.Size report the number of characters, x, 
irrespective of the character set in use. This value should match the 
Char(x) type definition.

TField.DataSize reports the number of bytes required to store the 
value.

Example 1.
-----------
   Data stored is "en" in a field defined as Char(2) with UTF8 charset.
      UTF-8: 0x65
      UTF-8: 0x6E
   TField.Size must equal 2
   TField.DataSize must equal 2     (1 byte per character)


Example 2.
-----------
   Data stored is "豈更" in a field defined as Char(2) with UTF8 charset.
      UTF-8: 0xEF 0xA4 0x80
      UTF-8: 0xEF 0xA4 0x81
   TField.Size must equal 2
   TField.DataSize must equal 6     (3 bytes per character)


Example 3.
-----------
   Data stored is "e" in a field defined as Char(2) with UTF8 charset.
      UTF-8: 0x65
      UTF-8: 0x20     // 1 space character for padding
   TField.Size must equal 2
   TField.DataSize must equal 2     (1 byte per character)
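
The three examples above can be checked with a small stand-alone program. 
This is only a sketch: it does not touch SqlDB or TField at all, and the 
helper Utf8CharCount is a name made up for illustration. It assumes the 
value is a plain byte string holding valid UTF-8, and it counts 
characters by skipping UTF-8 continuation bytes.

```pascal
program Utf8Sizes;

{$mode objfpc}{$H+}

// Hypothetical helper: counts UTF-8 characters by counting every byte
// that is NOT a continuation byte (continuation bytes look like 10xxxxxx).
function Utf8CharCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

procedure Report(const S: AnsiString);
begin
  // Utf8CharCount() plays the role of Size (characters),
  // Length() plays the role of DataSize (bytes).
  WriteLn('chars=', Utf8CharCount(S), ' bytes=', Length(S));
end;

begin
  Report('en');                       // Example 1
  Report(#$EF#$A4#$80#$EF#$A4#$81);   // Example 2, bytes as listed above
  Report('e ');                       // Example 3, one space of padding
end.
```

The character count stays at 2 in all three cases, while the byte count 
is 2, 6 and 2 respectively.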


> What happens if some strange unicode string of 4 or even 8 bytes is 
> returned by Firebird ? A Buffer overflow...

In that case SqlDB is not using TField.DataSize like it is supposed to.


> So SQLDB "agrees with firebird" and reserves 8 bytes because that is
> the max what can be returned.

Which is why it should use TField.DataSize to reserve the correct number 
of bytes.
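
As a rough illustration of keeping the two values apart, here is a 
sketch, not the actual SqlDB code. BytesToReserve and MaxBytesPerChar 
are hypothetical names, and the factor of 4 bytes per character is an 
assumption taken from the 8 bytes reported for a Char(2) earlier in this 
thread. The character count stays at 2, and only the buffer is sized in 
bytes.

```pascal
program BufferSizing;

{$mode objfpc}{$H+}

const
  // Assumption: upper bound for one UTF-8 character, matching the
  // 8 bytes reported for Char(2) in this thread.
  MaxBytesPerChar = 4;

// Hypothetical helper: bytes to reserve for a Char(CharCount) column.
function BytesToReserve(CharCount: Integer): Integer;
begin
  Result := CharCount * MaxBytesPerChar;
end;

var
  Value: AnsiString;
  Buffer: array of Byte;
begin
  Value := #$EF#$A4#$80#$EF#$A4#$81;      // 2 characters, 6 bytes
  SetLength(Buffer, BytesToReserve(2));   // sized in bytes, not characters
  // Copy only when the value fits, so the buffer cannot overflow.
  if Length(Value) <= Length(Buffer) then
    Move(Value[1], Buffer[0], Length(Value));
  WriteLn('reserved=', Length(Buffer), ' used=', Length(Value));
end.
```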


> But the padding is added by SQLDB, not by firebird.

Oh!  It seems that FBLib makes the same mistake as SqlDB then. I 
tested on both SqlDB and FBLib and they behaved the same, so I assumed 
the fault was with Firebird. It seems to me, then, that this is more of 
a general Object Pascal implementation mistake.  Luckily I haven't 
posted a bug report to Firebird.

I wanted to try this using C source code, but I'm too rusty in C. I 
couldn't even get my test program to compile. :-)


> The problem is deeper than you see, and is not related to SQLDb, but
> to the implicit assumption in TBufDataset that for TStringField, 1
> char = 1 byte:

I think it's more a case of TField.DataSize not being taken into 
account: the code always assumes that TField.Size and TField.DataSize 
are the same for Char(x) field definitions.
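
To show what that assumption does to multibyte data, here is a small 
sketch. ClipToBytes is a made-up helper that stands in for any buffer 
sized in bytes. It is not taken from TBufDataset, and it is not meant to 
reproduce the exact output of any real driver. It only shows that 
reading Size = 2 as "2 bytes" cuts the first 3-byte character in half.

```pascal
program ClipDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: keeps at most MaxBytes bytes, the way a buffer
// sized in bytes would, without looking at character boundaries.
function ClipToBytes(const S: AnsiString; MaxBytes: Integer): AnsiString;
begin
  Result := Copy(S, 1, MaxBytes);
end;

var
  Value, Clipped: AnsiString;
  I: Integer;
begin
  Value := #$EF#$A4#$80#$EF#$A4#$81;   // 2 characters, 6 bytes
  Clipped := ClipToBytes(Value, 2);    // Size = 2 misread as 2 bytes
  // Print the surviving bytes in hex: only EF A4 remains, which is an
  // incomplete UTF-8 sequence.
  for I := 1 to Length(Clipped) do
    Write(HexStr(Ord(Clipped[I]), 2), ' ');
  WriteLn;
end.
```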


> As a consequence, my prediction is that, because it reports a size in
> characters, the postgres implementation will suffer of buffer
> overflows as soon as strange (=multibyte) unicode characters are

Just did a test. PostgreSQL reports back the correct TField.Size, but 
somewhere the content is being clipped. I ran this through a modified 
tiOPF with SqlDB_PG persistence layer.

I'll make a cleaner example that only uses SqlDB to confirm this issue.

==============================
There was 1 failure:

   1) textrunner.SQL Database tests.TTestCountry.TestCountry_ReadList: ETestFailure
      at $0807FA56
       "Check #3: Failed on ID
Expected:
"豈更"
But was:
"豈e2""
==============================


Regards,
   - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/



