<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=x-mac-ce" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"
type="cite">
<blockquote type="cite">
<pre wrap="">So this is the answer I was looking for:
"In Lazarus TStringField MUST hold UTF-8 encoded strings."
</pre>
</blockquote>
<pre wrap=""><!---->
Not entirely true. You could also choose to bind the fields to some
Lazarus-components manually, not using the db-components.</pre>
</blockquote>
IMHO most GUI database applications use controls such as TDBGrid or
TDBEdit,<br>
so these controls should display correct values by default, without extra coding
(or at least there should be "some standardized support").<br>
<br>
<br>
<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"
type="cite">
<pre wrap=""> (TEdit.Text :=
convertFunc(StringField.Text)) Or you can add a hook so that the .Text
property always does a conversion to UTF-8. The first option can be used if
you use a mediator or view. The second option I wouldn't use.
</pre>
<pre wrap="">Rofl. You mean that Microsoft SQL Server can't handle Unicode
completely?
</pre>
</blockquote>
Not completely: it supports only UCS-2, not UTF-8.<br>
<br>
<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"
type="cite">
<blockquote type="cite">
<pre wrap="">SQL Server provides non-Unicode data types: char, varchar, text
</pre>
</blockquote>
<pre wrap=""><!---->
i.e. TStringField
</pre>
</blockquote>
Yes, but the ODBC driver returns this data in the ANSI code page (there is no way to
force it to return UTF-8).<br>
I can fix this with a patch to TODBCConnection.LoadField like this<br>
(i.e. I convert to UTF-8 in the connector method when the driver is unable to
return UTF-8):<br>
&nbsp;&nbsp;&nbsp; begin<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Res:=SQLGetData(ODBCCursor.FSTMTHandle, FieldDef.Index+1,
SQL_C_CHAR, buffer, FieldDef.Size, @StrLenOrInd);<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if CharSet='ANSI' then //hack for Microsoft SQL Server<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; StrPLCopy(buffer, UTF8Encode(PChar(buffer)), FieldDef.Size);<br>
&nbsp;&nbsp;&nbsp; end;<br>
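<br>
A side note on the patch above: UTF-8 text can be longer in bytes than the same text in the ANSI code page (e.g. c with caron is one byte in cp1250 but two bytes in UTF-8), so a copy limited to FieldDef.Size bytes may cut a character in half. Below is a minimal standalone sketch of a boundary-safe truncation, assuming the limit is given in bytes; the helper name UTF8TruncateToBytes is hypothetical, it is not part of FPC or sqldb:<br>
<pre wrap="">
```pascal
program Utf8TruncateDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper (not part of FPC/sqldb): returns the longest prefix of
  the UTF-8 string S that fits into MaxBytes bytes without splitting a
  multi-byte character. }
function UTF8TruncateToBytes(const S: AnsiString; MaxBytes: Integer): AnsiString;
var
  Len: Integer;
begin
  if MaxBytes <= 0 then
    Exit('');
  if Length(S) <= MaxBytes then
    Exit(S);
  Len := MaxBytes;
  { S[Len + 1] is the first byte that does not fit. While it is a
    continuation byte (bit pattern 10xxxxxx), the cut would fall inside a
    character, so move the cut one byte to the left. }
  while (Len > 0) and ((Ord(S[Len + 1]) and $C0) = $80) do
    Dec(Len);
  Result := Copy(S, 1, Len);
end;

var
  S: AnsiString;
begin
  { "caj" with c-caron (U+010D = bytes $C4 $8D), built from byte values so
    the demo does not depend on the source file encoding }
  S := Chr($C4) + Chr($8D) + 'aj';
  WriteLn(Length(UTF8TruncateToBytes(S, 1)));  // 0: one byte cannot hold c-caron
  WriteLn(Length(UTF8TruncateToBytes(S, 2)));  // 2: just c-caron
  WriteLn(Length(UTF8TruncateToBytes(S, 3)));  // 3: c-caron + 'a'
end.
```
</pre>
The sketch only steps back over continuation bytes, so the cut always lands on a character boundary; whether the field size should instead be enlarged for UTF-8 is a separate question.<br>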
<br>
<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"
type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap=""> and Unicode (UCS-2) data types: nchar, nvarchar, ntext
</pre>
</blockquote>
<pre wrap=""><!---->
i.e. TWideStringField.
</pre>
</blockquote>
Yes. In this case the ODBC driver returns the data in UCS-2 and it is
written into a WideString buffer, which seems correct, but the DBGrid
displays "?" instead of characters with diacritical marks (IMHO
because on Windows the widestring manager converts a WideString to an ANSI string,
not to a UTF-8 string).<br>
This can be worked around with the field's OnGetText event handler:
aText:=UTF8Encode(Sender.AsString);<br>
That is not user friendly, because it requires "hacking in user code" for
every TWideStringField in every TSQLQuery.<br>
It can also be fixed in fields.inc:<br>
function TWideStringField.GetAsString: string;<br>
begin<br>
+{$IFDEF WINDOWS}<br>
+&nbsp; Result := UTF8Encode(GetAsWideString);<br>
+{$ELSE}<br>
&nbsp; Result := GetAsWideString;<br>
+{$ENDIF}<br>
end;<br>
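<br>
For illustration, what UTF8Encode has to do with the UCS-2 data held in a TWideStringField can be sketched in a few lines. This is a minimal standalone example, not FPC's implementation: the function name UCS2ToUTF8 is hypothetical, and only BMP code points are handled (no surrogate pairs, as in UCS-2):<br>
<pre wrap="">
```pascal
program Ucs2ToUtf8Demo;
{$mode objfpc}{$H+}

uses
  SysUtils;  // IntToHex

{ Hypothetical helper for illustration (in real code use UTF8Encode):
  encodes UCS-2 text, i.e. code points U+0000..U+FFFF without surrogate
  pairs, as UTF-8. }
function UCS2ToUTF8(const W: WideString): AnsiString;
var
  I: Integer;
  C: Word;
begin
  Result := '';
  for I := 1 to Length(W) do
  begin
    C := Ord(W[I]);
    if C < $80 then
      Result := Result + Chr(C)                           // 1 byte:  0xxxxxxx
    else if C < $800 then
      Result := Result + Chr($C0 or (C shr 6))            // 2 bytes: 110xxxxx
                       + Chr($80 or (C and $3F))          //          10xxxxxx
    else
      Result := Result + Chr($E0 or (C shr 12))           // 3 bytes: 1110xxxx
                       + Chr($80 or ((C shr 6) and $3F))  //          10xxxxxx
                       + Chr($80 or (C and $3F));         //          10xxxxxx
  end;
end;

var
  U: AnsiString;
  I: Integer;
begin
  { "caj" with c-caron (U+010D), built from code points so the demo does
    not depend on the source file encoding }
  U := UCS2ToUTF8(WideChar($010D) + WideString('aj'));
  for I := 1 to Length(U) do
    Write(IntToHex(Ord(U[I]), 2), ' ');  // prints: C4 8D 61 6A
  WriteLn;
end.
```
</pre>
Every character with a diacritical mark becomes a multi-byte UTF-8 sequence, which is the form the Lazarus controls expect.<br>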
<br>
So what is the expected encoding of data written into a TWideStringField?
Or is there a way to get correct results in a DBGrid without the
workarounds mentioned above?<br>
<br>
<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"
type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap=""> SQL Server ODBC driver supports "AutoTranslate", see:
<a class="moz-txt-link-freetext" href="http://msdn.microsoft.com/en-us/library/ms130822.aspx">http://msdn.microsoft.com/en-us/library/ms130822.aspx</a>
"SQL Server char, varchar, or text data sent to a client SQL_C_CHAR
variable is converted from character to Unicode using the server ACP,
then converted from Unicode to character using the client ACP."
</pre>
</blockquote>
<pre wrap=""><!---->
This is what you use when you set the encoding when you connect to the
client. The solution to all your problems. As explained three times, in
this message alone.
In fact it's simple: incoming data=outgoing data.
If you need UTF-8 encoding for the outgoing data (direct access to
Lazarus controls) you have to select UTF-8 at the input.</pre>
</blockquote>
Yes, but as I wrote, no such option exists for Microsoft SQL
Server (nor, I think, for Access).<br>
(It seems that Microsoft does not like UTF-8 and prefers UTF-16
(UCS-2).)<br>
<br>
<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"
type="cite">
<pre wrap="">And, luckily, you can instruct the Database-server which encoding to use
when it's communicating with the outer world. So your problem is solved.
</pre>
</blockquote>
When it is possible, then yes.<br>
<br>
<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"
type="cite">
<pre wrap="">
Now, if you also choose UTF-8 as the Database-server field encoding (the
encoding the data is stored in) there's no conversion necessary at all.
</pre>
</blockquote>
Yes, if the DB supports UTF-8.<br>
<br>
-Laco.<br>
<br>
</body>
</html>