<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=x-mac-ce" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<br>

<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">So this is answer, which i have looked for:

"In Lazarus TStringField MUST hold UTF-8 encoded strings."

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Not entirely true. You could also choose to bind the fields to some

Lazarus-components manually, not using the db-components.</pre>

</blockquote>

IMHO most GUI database applications use controls like TDBGrid or

TDBEdit,<br>

so they should display correct values by default, without extra coding

(or at least with "some standardized support").<br>

<br>

<br>

<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"

 type="cite">

  <pre wrap=""> (Tedit.Text :=

convertFunc(StringField.Text)) Or you can add a hook so that the .text

property always does a conversion to UTF-8. First option can be used if

you use a mediator or view. The second option I wouldn't use.

  </pre>

  <pre wrap="">Rofl. You mean that Microsoft SQL Server can't handle unicode

completely? 

  </pre>

</blockquote>

Not completely: it supports only UCS-2, not UTF-8.<br>

<br>

<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">SQL Server provides non-UNICODE datatypes - char, varchar, text 

    </pre>

  </blockquote>

  <pre wrap=""><!---->

ie: TStringField

  </pre>

</blockquote>

Yes, but the ODBC driver returns this data in the ANSI codepage (there is

no way to force it to return UTF-8).<br>

I can fix this with a patch to TODBCConnection.LoadField like the one below<br>

(it converts to UTF-8 in the connector method when the driver is unable

to return UTF-8):<br>

&nbsp;&nbsp;&nbsp; begin<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Res:=SQLGetData(ODBCCursor.FSTMTHandle, FieldDef.Index+1,

SQL_C_CHAR, buffer, FieldDef.Size, @StrLenOrInd);<br>

+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if CharSet='ANSI' then //hack for Microsoft SQL Server<br>

+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; StrPLCopy(buffer, UTF8Encode(PChar(buffer)), FieldDef.Size);<br>

&nbsp;&nbsp;&nbsp; end;<br>

<br>

<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"

 type="cite">

  <pre wrap="">

  </pre>

  <blockquote type="cite">

    <pre wrap=""> and UNICODE (UCS-2) datatypes - nchar, nvarchar, ntext

    </pre>

  </blockquote>

  <pre wrap=""><!---->

ie: TWideStringField.

  </pre>

</blockquote>

Yes. In this case the ODBC driver returns data in UCS-2, and this data is

written into a "WideString buffer", which seems correct. But DBGrid

displays "?" instead of characters with diacritical marks (IMHO

because the widestring manager on Windows converts WideString to an ANSI

string, not a UTF-8 string).<br>

This can be fixed in the field's OnGetText handler:

aText:=UTF8Encode(Sender.AsString);<br>

That is not user friendly, because it requires "hacking in user code" for

every TWideStringField in every TSQLQuery.<br>
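
<br>

To show what that per-field workaround amounts to, here is a minimal sketch, not a tested patch. It assumes a form class TForm1 with two methods declared on it (WideFieldGetText and HookWideFields, both hypothetical names), units db and sqldb in the uses clause, and {$mode objfpc} (hence the @). It reads the value through AsWideString, assuming that property is available in your FPC version, so the text does not pass through the ANSI codepage before being encoded:<br>

<pre>
// Sketch only: TForm1 and both method names are hypothetical.
// Both methods must be declared in TForm1; uses db, sqldb.

// Returns the field value as UTF-8 for display in data-aware controls.
procedure TForm1.WideFieldGetText(Sender: TField; var aText: string;
  DisplayText: Boolean);
begin
  aText := UTF8Encode(Sender.AsWideString);
end;

// Attaches the handler to every TWideStringField of one query.
// Call it after the query is opened, so that the fields exist.
procedure TForm1.HookWideFields(Query: TSQLQuery);
var
  i: Integer;
begin
  for i := 0 to Query.Fields.Count - 1 do
    if Query.Fields[i] is TWideStringField then
      Query.Fields[i].OnGetText := @WideFieldGetText;
end;
</pre>

Even with such a helper, it still has to be called for every query in every application.<br>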

It can also be fixed in fields.inc:<br>

function TWideStringField.GetAsString: string;<br>

begin<br>

+{$IFDEF WINDOWS}<br>

+&nbsp; Result := UTF8Encode(GetAsWideString);<br>

+{$ELSE}<br>

&nbsp; Result := GetAsWideString;<br>

+{$ENDIF}<br>

end;<br>

<br>

So what is the expected encoding of data written into TWideStringField?

Or is there a way to get correct results in DBGrid without the

above-mentioned workarounds?<br>

<br>

<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"

 type="cite">

  <pre wrap="">

  </pre>

  <blockquote type="cite">

    <pre wrap=""> SQL Server ODBC driver supports "AutoTranslate", see:

<a class="moz-txt-link-freetext" href="http://msdn.microsoft.com/en-us/library/ms130822.aspx">http://msdn.microsoft.com/en-us/library/ms130822.aspx</a>

 "SQL Server char, varchar, or text data sent to a client SQL_C_CHAR

variable is converted from character to Unicode using the server ACP,

then converted from Unicode to character using the client ACP."

    </pre>

  </blockquote>

  <pre wrap=""><!---->

This is what you use when you set the encoding when you connect to the

client. The solution to all your problems. As explained three times, in

this message alone.

In fact it's simple: incoming data=outgoing data.

If you need UTF-8 encoding for the outgoing data (direct access to

Lazarus controls) you have to select UTF-8 at the input.</pre>

</blockquote>

Yes, but as I wrote, no such possibility exists with Microsoft SQL

Server (nor, I think, with Access).<br>

(It seems that Microsoft does not like UTF-8 and prefers UTF-16

(UCS-2).)<br>

<br>

<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"

 type="cite">

  <pre wrap="">And, luckily, you can instruct the Database-server which encoding to use

when it's communicating with the outer world. So your problem is solved.

  </pre>

</blockquote>

When that is possible, then yes.<br>

<br>

<blockquote cite="mid:1294927110.9323.27.camel@wsjoost.cnoc.lan"

 type="cite">

  <pre wrap="">

Now, if you also choose UTF-8 as the Database-server field encoding (the

encoding the data is stored in) there's no conversion necessary at all.

  </pre>

</blockquote>

Yes, if the DB supports UTF-8.<br>

<br>

-Laco.<br>

<br>

</body>

</html>