[fpc-devel] String and UnicodeString and UTF8String

Joost van der Sluis joost at cnoc.nl
Wed Jan 12 14:33:25 CET 2011


On Wed, 2011-01-12 at 09:45 +0100, LacaK wrote:
> Sven Barth wrote:
> > On 12.01.2011 07:16, LacaK wrote:
> >> P.S. I still do not understand how things can work correctly if the
> >> LCL expects all AnsiStrings (String) to be UTF8Strings, but the
> >> RTL/FCL does not strictly follow this (at least on Windows).
> >
> > LCL uses SysToUTF8 and UTF8ToSys if it uses the RTL (and the FCL). 
> > This is often done with wrappers that wrap the RTL method and do the 
> > conversion (e.g. FileExistsUTF8, etc.).
> As I wrote in one of my previous messages, AFAIK this is not true in
> the case of "fcl-db" and Lazarus data-aware components like TDBGrid,
> TDBEdit ...
> They use the "TField.Text: String" property to get the string content
> of a field and display it.
> AFAIU the LCL expects that TField.Text will always return a UTF-8
> encoded string (because no conversion (SysToUTF8) is done in
> dbgrids.pas or dbedit.inc), but this is not always true.
> 
> So where is the error?
> 1. Is the LCL wrong to expect that TField.Text is always a UTF-8 string?
> -or-
> 2. Is the implementation of the TSQLConnectors wrong, since they write
> data into the record buffer (of TStringField) and do not always convert
> it to UTF-8?
> (if the data should always be in UTF-8, then it would be good to
> redefine the TField.Text property as "property Text: UTF8String" to make
> clear that we always work with UTF-8 strings)
> -or-
> 3. Did I miss something? ;-)

Didn't I explain this to you and others a few times?

The database components themselves are encoding-agnostic. This means:
encoding in = encoding out.

So it is up to the developer which codepage to use: TField.Text can
have the encoding _you_ want.

So, if you want to work with Lazarus, which uses UTF-8, you have to use
UTF-8 encoded strings in your database. 

If there is some strange reason why you don't want the strings in your
database to be UTF-8 encoded, you have to convert the strings from the
encoding your database uses to UTF-8 while reading data from the
database.

Luckily, you can specify the encoding of strings you want to use for
most databases. Not only the encoding in which the strings are stored,
but also the encoding which has to be used when you send and retrieve
data from the database. And you can set this for each connection made.

I.e., you can resolve the problem by changing the connection string or
by adding a connection parameter.
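
A minimal sketch of the connection-parameter approach, assuming a
Firebird database accessed through sqldb's TIBConnection. The host,
database path, credentials, table and column names are hypothetical
placeholders, and the character set name ('UTF8' here) depends on the
database server you use:

```pascal
program Utf8Connection;
{$mode objfpc}{$H+}

uses
  sqldb, ibconnection;

var
  Conn: TIBConnection;
  Trans: TSQLTransaction;
  Query: TSQLQuery;
begin
  Conn := TIBConnection.Create(nil);
  Trans := TSQLTransaction.Create(nil);
  Query := TSQLQuery.Create(nil);
  try
    // Hypothetical connection details; replace them with your own.
    Conn.HostName := 'localhost';
    Conn.DatabaseName := '/data/example.fdb';
    Conn.UserName := 'SYSDBA';
    Conn.Password := 'masterkey';

    // Ask the server to send and accept strings as UTF-8 on this
    // connection, regardless of how they are stored in the database.
    Conn.CharSet := 'UTF8';

    Trans.DataBase := Conn;
    Query.DataBase := Conn;
    Query.Transaction := Trans;

    // Hypothetical table and column.
    Query.SQL.Text := 'SELECT name FROM customers';
    Query.Open;
    while not Query.EOF do
    begin
      // The database components pass the bytes through unchanged,
      // so this string is UTF-8 because the connection delivers UTF-8.
      WriteLn(Query.FieldByName('name').AsString);
      Query.Next;
    end;
    Query.Close;
  finally
    Query.Free;
    Trans.Free;
    Conn.Free;
  end;
end.
```

Other connection classes expose the same CharSet property, but how each
driver interprets it (and which names it accepts) should be checked
against the documentation of the database in question.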

There's also another solution you can find on the forum and other
places. You can convert the strings to UTF-8 not only when they are read
from the database, but also when they are read from the internal memory.
There's a hook for that.
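
The message does not name the hook. One candidate is the
TField.OnGetText event, which is called when a data-aware control asks
a field for its text. The sketch below rests on that assumption and on
a second one: that the stored strings are in Windows-1250, converted
with CP1250ToUTF8 from Lazarus' LConvEncoding unit. The unit and class
names (FieldUtf8Hook, TUtf8FieldHook) are invented for illustration:

```pascal
unit FieldUtf8Hook;
{$mode objfpc}{$H+}

interface

uses
  db, LConvEncoding;

type
  // Hypothetical helper: converts field text to UTF-8 on the way out.
  TUtf8FieldHook = class
  public
    procedure GetText(Sender: TField; var aText: string;
      DisplayText: Boolean);
    procedure Attach(ADataSet: TDataSet);
  end;

implementation

procedure TUtf8FieldHook.GetText(Sender: TField; var aText: string;
  DisplayText: Boolean);
begin
  // Assumption: the raw field content is Windows-1250 encoded.
  // AsString returns the stored bytes without triggering OnGetText again.
  aText := CP1250ToUTF8(Sender.AsString);
end;

procedure TUtf8FieldHook.Attach(ADataSet: TDataSet);
var
  i: Integer;
begin
  // Install the handler on every string field of the dataset.
  for i := 0 to ADataSet.Fields.Count - 1 do
    if ADataSet.Fields[i].DataType in [ftString, ftFixedChar] then
      ADataSet.Fields[i].OnGetText := @GetText;
end;

end.
```

Attach would be called once after the dataset has been opened, and the
hook object has to stay alive for as long as the dataset is in use. If
the controls are also used for editing, a matching OnSetText handler
that converts from UTF-8 back to the database encoding would be needed.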

Joost.



