[fpc-pascal] RTL and Unicode Strings

Tony Whyman tony.whyman at mccallumwhyman.com
Wed May 11 11:41:36 CEST 2016


On 11/05/16 10:18, Graeme Geldenhuys wrote:
> In my application I enable unicodestring mode. So I'm reading data from
> a Firebird database. The data is stored as UTF-8 in a VarChar field. The
> DB connection is set up as UTF-8.  Now let's assume my FreeBSD box is set
> up with a default encoding of Latin-1.
>
> So I read the UTF-8 data from the database, somewhere inside the SqlDB
> code it gets assigned to a TField's String property, i.e. UTF-8 ->
> Latin-1 conversion.

Now this is what interests me as well - in the context of IBX if nothing 
else.

It was news to me yesterday that FPC now stores code page information 
with AnsiStrings. While IBX still works OK with FPC 3.0.0, it should 
work better with this new facility. The IBX code here dates from years 
ago and is:

> function TIBStringField.GetValue(var Value: string): Boolean;
> var
>   Buffer: PChar;
> begin
>   Buffer := nil;
>   IBAlloc(Buffer, 0, Size + 1);
>   try
>     Result := GetData(Buffer);
>     if Result then
>     begin
>       Value := string(Buffer);
>       if Transliterate and (Value <> '') then
>         DataSet.Translate(PChar(Value), PChar(Value), False);
>     end
>   finally
>     FreeMem(Buffer);
>   end;
> end;

Note the really nasty coercion that comes after the call to 
TField.GetData (which is common to all DB drivers): GetData returns 
untyped data into a buffer. DataSet.Translate is a no-op, and I was 
never sure what purpose it serves, if any.

To make this code play properly with the new AnsiString, it looks like I 
should revise this to (e.g. for UTF-8 fields):

   Value := string(Buffer);
   SetCodePage(Value,cp_UTF8,false);
   ...
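
As a minimal sketch of that incoming-side idea, assuming the buffer really 
holds UTF-8 bytes: the helper name Utf8BufferToString and the test bytes are 
hypothetical and not part of IBX. It uses AnsiString and PAnsiChar explicitly 
so that it does not depend on what "string" means in a given compiler mode.

```pascal
program TagIncoming;
{$mode objfpc}{$H+}

{ Hypothetical helper (not part of IBX): copy a nul-terminated buffer into a
  string and label its bytes as UTF-8 without changing them. }
function Utf8BufferToString(Buffer: PAnsiChar): AnsiString;
begin
  Result := AnsiString(Buffer);
  { Convert = False: only the code page tag changes, the bytes stay as they are. }
  SetCodePage(RawByteString(Result), CP_UTF8, False);
end;

var
  Raw: array[0..2] of AnsiChar;
  S: AnsiString;
begin
  { Two bytes encoding U+00E9 in UTF-8, followed by the terminator. }
  Raw[0] := #$C3;
  Raw[1] := #$A9;
  Raw[2] := #0;
  S := Utf8BufferToString(@Raw[0]);
  WriteLn('bytes: ', Length(S));
  WriteLn('code page: ', StringCodePage(S));
end.
```

Because Convert is False, the byte count should be unchanged by the call; 
only the code page reported by StringCodePage differs afterwards.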

The outgoing side has a similar problem, e.g.:

> procedure TIBStringField.SetAsString(const Value: string);
> var
>   Buffer: PChar;
> begin
>   Buffer := nil;
>   IBAlloc(Buffer, 0, Size + 1);
>   try
>     StrLCopy(Buffer, PChar(Value), Size);
>     if Transliterate then
>       DataSet.Translate(Buffer, Buffer, True);
>     SetData(Buffer);
>   finally
>     FreeMem(Buffer);
>   end;
> end; 

This probably needs a

SetCodePage(Value,cp_UTF8,true);

before the StrLCopy.
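
A rough sketch of the outgoing conversion, under some assumptions: Value is a 
const parameter in SetAsString and SetCodePage takes a var parameter, so the 
sketch converts a local copy instead. The helper name ToUtf8Copy is 
hypothetical, and the cwstring unit is assumed to be needed on Unix for real 
transcoding.

```pascal
program ConvertOutgoing;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  { assumption: installs a widestring manager for code page conversion }
{$endif}

{ Hypothetical helper (not part of IBX): return a UTF-8 encoded copy of Value. }
function ToUtf8Copy(const Value: AnsiString): RawByteString;
begin
  { The target is a RawByteString, so the assignment itself does not convert. }
  Result := Value;
  { Convert = True: transcode the copy from its current code page to UTF-8. }
  SetCodePage(Result, CP_UTF8, True);
end;

var
  Encoded: RawByteString;
begin
  Encoded := ToUtf8Copy('abc');
  WriteLn('code page: ', StringCodePage(Encoded));
end.
```

In SetAsString the converted copy, rather than Value itself, would then be 
what gets passed to StrLCopy.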

Anyone know if this is a correct interpretation of the AnsiString 
codepage facility?


