[fpc-devel] TField.AsString and Databases with UTF-8 charset

Graeme Geldenhuys graemeg at opensoft.homeip.net
Fri Jul 24 12:49:51 CEST 2009


Michael Van Canneyt wrote:
> 
>> That way, SqlDB can return copy(fieldvaluestring, 0, character_len)
>> as the actual field text value, which trims off the padding of
>> spaces.
> 
> If you look carefully, you'll see that the padding of spaces happens
> in code in the case of a CHAR field. Maybe we should do something
> about that.


I'm fine with padding for CHAR() field types if the content is less than 
the Char() length.

e.g.: a Char(5) field definition should always return 5 characters, 
irrespective of whether you insert only 2 characters of data.

The UTF8 implementation is greatly flawed in Firebird. It does not 
adhere to the max character length as defined by Char(x).

The UTF-8 value of "en" IS "en" because UTF-8 is a variable-byte-length 
encoding. Plus the first 128 characters (the ASCII range) in UTF-8 
only take up 1 byte per character. The UTF-8 encoded string of "en" is 
NOT "en      " like Firebird is returning!
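
To make the byte-versus-character distinction concrete, here is a minimal 
sketch. Utf8CodePointCount is a hypothetical helper written for this 
illustration, not part of SqlDB or the FPC RTL. It assumes the string holds 
valid UTF-8 stored as raw bytes in an AnsiString, and it counts code points 
by skipping continuation bytes (bit pattern 10xxxxxx).

```pascal
program Utf8Lengths;

{$mode objfpc}{$H+}

// Hypothetical helper (not part of SqlDB): counts UTF-8 code points by
// counting every byte that is NOT a continuation byte (10xxxxxx).
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Plain, Accented: AnsiString;
begin
  // Pure ASCII: one byte per character.
  Plain := 'en';
  // U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8,
  // followed here by a plain ASCII "n".
  Accented := #$C3#$A9'n';

  // Length() reports bytes for an AnsiString, not characters.
  WriteLn('bytes=', Length(Plain), ' chars=', Utf8CodePointCount(Plain));
  WriteLn('bytes=', Length(Accented), ' chars=', Utf8CodePointCount(Accented));
end.
```

For "en" both numbers agree, which is why a fixed bytes-per-character 
assumption only shows up as trailing padding rather than as corrupted text.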


In Summary:
-------------
There are two problems here.

1) SqlDB and Firebird are reporting the wrong TParam.Size and 
TField.Size results. SqlDB is using byte length instead of character length.

2) The UTF-8 implementation of Firebird is seriously flawed. Firebird 
behaves as if UTF-8 is a fixed-width encoding, returns over-padded 
results from a Char(x) field, and so breaks the DDL rule of what the 
maximum character length is. Using the metadata, SqlDB *can* fix this by 
using something like:  copy(fieldvalue, 0, MaxCharacterLength)
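
A rough sketch of that trimming idea follows. One caveat: Copy on an 
AnsiString counts bytes (and Pascal strings are conventionally indexed from 
1), so it only matches the character count while the content is pure ASCII. 
The hypothetical helper Utf8CopyChars below (not an SqlDB routine) counts 
UTF-8 code points instead. The sample value is an assumption: "en" padded 
with spaces to 20 bytes, i.e. 4 bytes per declared character of a Char(5) 
column.

```pascal
program TrimPaddedChar;

{$mode objfpc}{$H+}

// Hypothetical helper (not part of SqlDB): returns the first MaxChars
// UTF-8 code points of S. Assumes S contains valid UTF-8 bytes.
function Utf8CopyChars(const S: AnsiString; MaxChars: Integer): AnsiString;
var
  i, Count: Integer;
begin
  Count := 0;
  i := 1;
  while i <= Length(S) do
  begin
    // Any byte that is not a continuation byte (10xxxxxx) starts a
    // new code point.
    if (Ord(S[i]) and $C0) <> $80 then
    begin
      // Stop before the code point that would exceed the limit.
      if Count = MaxChars then
        Break;
      Inc(Count);
    end;
    Inc(i);
  end;
  // Bytes 1 .. i-1 hold exactly the code points that were counted.
  Result := Copy(S, 1, i - 1);
end;

var
  Raw: AnsiString;
begin
  // Assumed sample: 'en' padded with spaces to 20 bytes, standing in for
  // the over-padded value of a UTF-8 Char(5) column.
  Raw := 'en' + StringOfChar(' ', 18);

  // Brackets make the remaining padding visible: 5 characters in total.
  WriteLn('[', Utf8CopyChars(Raw, 5), ']');
end.
```

In a real fix, MaxChars would come from the column metadata (the declared 
character length) rather than from a constant.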


Regards,
   - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/



