[fpc-pascal] RTL and Unicode Strings
Tony Whyman
tony.whyman at mccallumwhyman.com
Wed May 11 11:41:36 CEST 2016
On 11/05/16 10:18, Graeme Geldenhuys wrote:
> In my application I enable unicodestring mode. So I'm reading data from
> a Firebird database. The data is stored as UTF-8 in a VarChar field. The
> DB connection is set up as UTF-8. Now let's assume my FreeBSD box is set
> up with a default encoding of Latin-1.
>
> So I read the UTF-8 data from the database, and somewhere inside the SqlDB
> code it gets assigned to a TField's String property, i.e. a UTF-8 ->
> Latin-1 conversion.
Now this is what interests me as well, in the context of IBX if nothing
else.
It was news to me yesterday that FPC now stores code page information
with AnsiStrings. While IBX still works OK with FPC 3.0.0, it should
work better with this new facility. The IBX code here dates from years
ago and is:
> function TIBStringField.GetValue(var Value: string): Boolean;
> var
>   Buffer: PChar;
> begin
>   Buffer := nil;
>   IBAlloc(Buffer, 0, Size + 1);
>   try
>     Result := GetData(Buffer);
>     if Result then
>     begin
>       Value := string(Buffer);
>       if Transliterate and (Value <> '') then
>         DataSet.Translate(PChar(Value), PChar(Value), False);
>     end
>   finally
>     FreeMem(Buffer);
>   end;
> end;
Note the really nasty coercion that comes after the call to
TField.GetData (which is common to all DB drivers): GetData returns
untyped data into a buffer. DataSet.Translate is a no-op, and I was
never sure what purpose it serves, if any.
To make this code play properly with the new AnsiString, it looks like I
should revise this to (e.g. for UTF-8 fields):
    Value := string(Buffer);
    SetCodePage(Value, cp_UTF8, false);
...
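A minimal sketch of the incoming-side idea as a stand-alone program, assuming FPC 3.0 or later. The helper name Utf8BufferToString and the sample bytes are hypothetical and not part of IBX. The RawByteString cast is an assumption about how to hand a plain string to SetCodePage's var parameter; it follows a commonly seen idiom. With Convert set to False the bytes are left untouched and only the code page tag on the string changes.

```pascal
program IncomingSketch;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of IBX): copy a NUL-terminated buffer that
// is known to hold UTF-8 bytes into a string and tag that string as UTF-8.
function Utf8BufferToString(Buffer: PChar): string;
begin
  Result := string(Buffer);  // byte-for-byte copy out of the untyped buffer
  if Result <> '' then
    // Convert = False: keep the bytes, only relabel them as UTF-8.
    SetCodePage(RawByteString(Result), CP_UTF8, False);
end;

const
  // Sample data: "é" encoded as UTF-8 (bytes C3 A9) plus a terminator.
  Raw: array[0..2] of AnsiChar = (#$C3, #$A9, #0);
var
  S: string;
begin
  S := Utf8BufferToString(PChar(@Raw[0]));
  // Inspect the result without printing S itself, which could trigger
  // a conversion to the console code page.
  WriteLn('byte length: ', Length(S));
  WriteLn('code page:   ', StringCodePage(S));
end.
```

StringCodePage is used here only to inspect the tag that SetCodePage applied.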
The outgoing side has a similar problem, e.g.:
> procedure TIBStringField.SetAsString(const Value: string);
> var
>   Buffer: PChar;
> begin
>   Buffer := nil;
>   IBAlloc(Buffer, 0, Size + 1);
>   try
>     StrLCopy(Buffer, PChar(Value), Size);
>     if Transliterate then
>       DataSet.Translate(Buffer, Buffer, True);
>     SetData(Buffer);
>   finally
>     FreeMem(Buffer);
>   end;
> end;
This probably needs a
    SetCodePage(Value, cp_UTF8, true);
before the StrLCopy.
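A minimal sketch of the outgoing side, again assuming FPC 3.0 or later; the helper name CopyAsUtf8 and the sample data are hypothetical and not part of IBX. Because Value is a const parameter it cannot be passed to SetCodePage's var parameter directly, so the sketch converts a local copy instead. It also assumes a widestring manager is installed (on Unix-like systems that usually means including the cwstring unit); without one, non-ASCII characters may not be converted correctly.

```pascal
program OutgoingSketch;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cwstring,{$ENDIF} SysUtils;

// Hypothetical helper (not part of IBX): write Value into Buffer as UTF-8.
// Buffer must have room for Size bytes plus the terminating #0.
procedure CopyAsUtf8(const Value: string; Buffer: PChar; Size: Integer);
var
  Tmp: string;
begin
  Tmp := Value;  // local copy, since a const parameter cannot be passed as var
  // Convert = True: transcode from Tmp's current code page to UTF-8.
  SetCodePage(RawByteString(Tmp), CP_UTF8, True);
  // Caution: cutting off at Size bytes can split a multi-byte character.
  StrLCopy(Buffer, PChar(Tmp), Size);
end;

var
  Src: string;
  Buf: array[0..15] of AnsiChar;
  Len, I: SizeInt;
begin
  // Sample data: "café" as Latin-1 bytes, tagged as ISO-8859-1 (28591).
  SetLength(Src, 4);
  Src[1] := 'c';
  Src[2] := 'a';
  Src[3] := 'f';
  Src[4] := AnsiChar($E9);
  SetCodePage(RawByteString(Src), 28591, False);

  FillChar(Buf, SizeOf(Buf), 0);
  CopyAsUtf8(Src, PChar(@Buf[0]), SizeOf(Buf) - 1);

  // Dump the bytes that would be handed to SetData, in hex.
  Len := StrLen(PChar(@Buf[0]));
  for I := 0 to Len - 1 do
    Write(IntToHex(Ord(Buf[I]), 2), ' ');
  WriteLn;
end.
```

If the conversion works as intended, the single Latin-1 byte E9 should come out as the UTF-8 pair C3 A9.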
Does anyone know whether this is a correct interpretation of the
AnsiString code page facility?