[fpc-devel] TField.AsString and Databases with UTF-8 charset
graemeg at opensoft.homeip.net
Fri Jul 24 14:06:21 CEST 2009
Michael Van Canneyt wrote:
>> No, not a second query, just keeping track of extra (other) meta data
>> information which was retrieved from the first API call to Firebird.
> My databases are HUGE, and I don't think that such a query is appropriate.
It has nothing to do with the size of your database. It is simply that
SqlDB is using the wrong field to report the size of the Char(x) field.
>> From the Kylix 3 and Delphi 7 documentation:
> Given that neither supports UTF-8, the documentation is not really
> relevant, I'd say.
Lucky for you, Embarcadero now has all its Delphi help available online.
Nothing has changed in Delphi 2009 help. Here are links to D2009's
online help. In both cases, the Size property refers to the number of
characters, not the byte length.
So SqlDB with Firebird is in fact wrong when it returns Size = 8 for a
Char(2) with UTF8 charset enabled.
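
To make the arithmetic concrete: a reported size of 8 for a Char(2) is
consistent with the UTF8 charset reserving up to 4 bytes per character.
The following is only a minimal sketch of the conversion, assuming the
driver knows the maximum bytes per character for the column's charset.
The function name and the constant are hypothetical and not part of
SqlDB.

```pascal
program CharSizeSketch;

{$mode objfpc}{$H+}

const
  // Assumption: the UTF8 charset reserves up to 4 bytes per character,
  // which matches the Size = 8 reported for a Char(2) column.
  MaxBytesPerCharUTF8 = 4;

// Hypothetical helper: convert the byte length reported by the database
// into the declared character length of a Char(x) column.
function CharLengthFromByteLength(AByteLength, ABytesPerChar: Integer): Integer;
begin
  if ABytesPerChar <= 0 then
    Result := AByteLength   // unknown charset: fall back to the raw value
  else
    Result := AByteLength div ABytesPerChar;
end;

begin
  // A Char(2) UTF8 column is reported as 8 bytes; Size should be 2.
  WriteLn(CharLengthFromByteLength(8, MaxBytesPerCharUTF8));  // prints 2
end.
```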
> I agree that we need a solution, but I'm not convinced your solution
> is correct or even desirable.
Well, Firebird makes no sense regarding its behaviour. If you have a
UTF-8 encoded string as follows:
s := 'en'; // assume s is a UTF8 String type
What is the length of that string? Firebird would argue that it's 8
bytes. But the Unicode organisation says it's 2 bytes - I tend to agree.
The ASCII character set is represented in UTF-8 and they work as they
did in ASCII. They also only take up 1 byte per character.
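
The difference between byte length and character count can be checked
directly. This is a minimal sketch, not SqlDB code: the helper name is
hypothetical, and it assumes the string holds valid UTF-8. It counts
code points by skipping continuation bytes (those of the form 10xxxxxx).

```pascal
program Utf8LengthSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: count UTF-8 code points by skipping continuation
// bytes (10xxxxxx). Assumes S holds valid UTF-8.
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  s: AnsiString;
begin
  s := 'en';
  WriteLn(Length(s));              // byte length: 2
  WriteLn(Utf8CodePointCount(s));  // code points: 2

  s := #$C3#$A9;                   // U+00E9 encoded as two UTF-8 bytes
  WriteLn(Length(s));              // byte length: 2
  WriteLn(Utf8CodePointCount(s));  // code points: 1
end.
```

For pure ASCII text such as 'en' the two numbers agree; they only
diverge once a character needs more than one byte.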
Firebird tells me that the content of the variable s is now
equivalent to "en      " (padded with spaces to 8 bytes) when read back
from the DB, but that is definitely not the case. "en" in ASCII or
UTF-8 is still only "en", without the rubbish padding!
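
One way to strip the excess padding on the client side is to cut the
value back to the declared character length. This is only a sketch
under stated assumptions: the helper name is hypothetical, the input is
assumed to be valid UTF-8, and the declared length (2 here) is assumed
to be known to the caller.

```pascal
program PaddingSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: keep only the first ACharLen UTF-8 code points of S,
// dropping any padding beyond the declared Char(x) length.
// Assumes S holds valid UTF-8.
function TruncateToCharLength(const S: AnsiString; ACharLen: Integer): AnsiString;
var
  i, Count: Integer;
begin
  Count := 0;
  for i := 1 to Length(S) do
    // A byte that is not a continuation byte (10xxxxxx) starts a new code point.
    if (Ord(S[i]) and $C0) <> $80 then
    begin
      Inc(Count);
      if Count > ACharLen then
      begin
        // Cut just before the first code point past the limit.
        Result := Copy(S, 1, i - 1);
        Exit;
      end;
    end;
  Result := S;
end;

begin
  // 'en' as read back from a Char(2) UTF8 column: padded to 8 bytes.
  WriteLn('[', TruncateToCharLength('en      ', 2), ']');  // prints [en]
end.
```

Unlike a plain right-trim, this keeps the padding that Char(x) is
supposed to have when the text is shorter than x characters.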
PostgreSQL also supports the UTF-8 character set in databases. Surprise,
surprise: TParam.Size and TField.Size report the x from Char(x). Also,
the values read from a Char(x) field don't contain any space padding
on the right unless the actual text is shorter than the Char(x)
definition, and the character length NEVER exceeds the Char(x) definition.
I'll report this issue to the Firebird developers as well. Whoever
implemented the UTF-8 support in Firebird was a total idiot, and knew
nothing about Unicode.
But in the meantime we can fix the SqlDB issue and work around the
Firebird Char(x) issue as I explained before.
> Don't forget also that for unicode, the number of characters differs
> from the number of bytes. The Firebird API predates this, and so does
Firebird was a total rewrite in C++ for v1.5 or v2 (I can't remember
exactly which). That was pretty recent, so there is no excuse like
legacy code for such crappy Unicode support.
- Graeme -
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal