[fpc-devel] TStringField, String and UnicodeString and UTF8String

Fri Jan 14 10:53:50 CET 2011

>> So this is the answer I was looking for:
>> "In Lazarus TStringField MUST hold UTF-8 encoded strings."
>>     
>
> Not entirely true. You could also choose to bind the fields to some
> Lazarus-components manually, not using the db-components.
IMHO most GUI database applications use controls like TDBGrid or TDBEdit,
so they should display correct values by default, without extra coding
(or at least provide "some standardized support" ...).

>  (TEdit.Text :=
> convertFunc(StringField.Text)) Or you can add a hook so that the .Text
> property always does a conversion to UTF-8. The first option can be used
> if you use a mediator or view. The second option I wouldn't use.
>
>   
> Rofl. You mean that Microsoft SQL Server can't handle unicode
> completely? 
>   
Not completely: it supports only UCS-2 (no UTF-8).

>> SQL Server provides non-UNICODE datatypes - char, varchar, text 
>>     
>
> ie: TStringField
>   
Yes, but the ODBC driver returns the data in the ANSI code page (there is
no way to force it to return UTF-8).
I can fix this with a patch to TODBCConnection.LoadField like the one
below, which converts to UTF-8 in the connector method when the driver is
unable to return UTF-8:
    begin
      Res:=SQLGetData(ODBCCursor.FSTMTHandle, FieldDef.Index+1,
                      SQL_C_CHAR, buffer, FieldDef.Size, @StrLenOrInd);
+      if CharSet='ANSI' then //hack for Microsoft SQL Server
+        StrPLCopy(buffer, UTF8Encode(PChar(buffer)), FieldDef.Size);
    end;   
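
One caveat with the patch above: UTF-8 text can be longer in bytes than its
ANSI source, so copying at most FieldDef.Size bytes may cut a multi-byte
character in half. The following is only a minimal sketch of how a
truncation could respect character boundaries. Utf8TruncateBytes is a
hypothetical helper, not part of sqldb, and the 'abc' literal merely stands
in for the bytes returned by SQLGetData.

```pascal
program Utf8Truncate;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of FPC or sqldb): keep at most MaxBytes
// bytes of S without splitting a multi-byte UTF-8 sequence.
function Utf8TruncateBytes(const S: UTF8String; MaxBytes: Integer): UTF8String;
var
  Len: Integer;
begin
  if MaxBytes <= 0 then
  begin
    Result := '';
    Exit;
  end;
  if Length(S) <= MaxBytes then
  begin
    Result := S;
    Exit;
  end;
  Len := MaxBytes;
  // If the first dropped byte is a continuation byte (10xxxxxx), the cut
  // falls inside a character, so back up to the start of that character.
  while (Len > 0) and ((Ord(S[Len + 1]) and $C0) = $80) do
    Dec(Len);
  Result := Copy(S, 1, Len);
end;

var
  Ansi: AnsiString;
  Utf8: UTF8String;
begin
  Ansi := 'abc';              // stand-in for the ANSI bytes from the driver
  Utf8 := AnsiToUtf8(Ansi);   // ANSI code page -> UTF-8
  WriteLn(Utf8TruncateBytes(Utf8, 2));  // prints "ab"
end.
```

Whether truncating is acceptable at all, or whether the buffer should be
enlarged instead, depends on how the field buffer is allocated; that is not
covered by this sketch.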

>   
>>  and UNICODE (UCS-2) datatypes - nchar, nvarchar, ntext
>>     
>
> ie: TWideStringField.
>   
Yes, in this case the ODBC driver returns the data in UCS-2. The data is
written into the "WideString buffer", which seems correct, but DBGrid
displays "?" instead of characters with diacritical marks (IMHO because
the widestringmanager on Windows converts WideString to an ANSI string,
not to a UTF-8 string).
This can be fixed by using the OnGetText event of the field:
aText:=UTF8Encode(Sender.AsString);
This is not user friendly, because it requires "hacking in user code" for
every TWideStringField in every TSQLQuery.
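
To illustrate how much user code this means, here is a rough sketch of
attaching such a handler to every TWideStringField of a dataset. The class
TWideFieldHooks and its methods are hypothetical names made up for this
example. Unlike the one-liner above, the sketch reads Sender.AsWideString,
on the assumption that this avoids the lossy round trip through the ANSI
code page; I have not verified it against every widget set.

```pascal
program HookWideFields;
{$mode objfpc}{$H+}

uses
  db;

type
  // Hypothetical helper class: event handlers must be methods of an object.
  TWideFieldHooks = class
  public
    procedure GetTextAsUtf8(Sender: TField; var aText: string;
      DisplayText: Boolean);
    procedure Attach(DataSet: TDataSet);
  end;

procedure TWideFieldHooks.GetTextAsUtf8(Sender: TField; var aText: string;
  DisplayText: Boolean);
begin
  // Encode the wide value directly instead of going through AsString.
  aText := UTF8Encode(Sender.AsWideString);
end;

procedure TWideFieldHooks.Attach(DataSet: TDataSet);
var
  i: Integer;
begin
  // Install the handler on every wide string field of the dataset.
  for i := 0 to DataSet.Fields.Count - 1 do
    if DataSet.Fields[i] is TWideStringField then
      DataSet.Fields[i].OnGetText := @GetTextAsUtf8;
end;

begin
  // Usage, once the query is open so that its fields exist:
  //   Hooks := TWideFieldHooks.Create;
  //   Hooks.Attach(SQLQuery1);
end.
```

Even with a helper like this, Attach has to be called for every query after
it is opened, so the burden stays in user code.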
It can also be fixed in fields.inc:
function TWideStringField.GetAsString: string;
begin
+{$IFDEF WINDOWS}
+  Result := UTF8Encode(GetAsWideString);
+{$ELSE}
  Result := GetAsWideString;
+{$ENDIF}
end;

So what is the expected encoding of data written into TWideStringField
... or is there a way to get correct results in DBGrid without the
above-mentioned workarounds?

>   
>>  SQL Server ODBC driver supports "AutoTranslate", see:
>> http://msdn.microsoft.com/en-us/library/ms130822.aspx
>>  "SQL Server char, varchar, or text data sent to a client SQL_C_CHAR
>> variable is converted from character to Unicode using the server ACP,
>> then converted from Unicode to character using the client ACP."
>>     
>
> This is what you use when you set the encoding when you connect to the
> client. The solution to all your problems. As explained three times, in
> this message alone.
>
> In fact it's simple: incoming data=outgoing data.
>
> If you need UTF-8 encoding for the outgoing data (direct access to
> Lazarus controls) you have to select UTF-8 at the input.
Yes, but as I wrote, that possibility does not exist with Microsoft SQL
Server (and, I think, also not with Access).
(It seems that Microsoft does not like UTF-8 and prefers UTF-16 (UCS-2).)

> And, luckily, you can instruct the Database-server which encoding to use
> when it's communicating with the outer world. So your problem is solved.
>   
When that is possible, then yes.

> Now, if you also choose UTF-8 as the Database-server field encoding (the
> encoding the data is stored in) there's no conversion necessary at all.
>   
Yes, if the DB supports UTF-8.

-Laco.
