[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Wed Nov 26 19:54:54 CET 2014
Michael Schnell wrote:
> On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
>> AnsiString supports only one-byte-per-character code pages.
>
> Even more confused. Am I wrong in thinking that with code-page-aware
> strings, for Delphi XE compatibility, CP_ACP on Windows needs to be
> UTF-16 (if not right now, then later)?
Delphi XE does not properly support UTF-8. The meaning of CP_ACP seems to
depend on whether a western or a far-eastern version is used: the western
version assumes, and allows for, any SBCS (single-byte character set). I
don't know whether the same holds for the far-eastern versions.
The SBCS restriction simplifies standard string handling and conversions,
because every character (= one byte) can be exchanged in place.
UTF-8 doesn't fit into this picture, because it is an MBCS (multi-byte
character set).
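The difference can be sketched in a few lines. This is only an illustration in Python, not FPC or Delphi code; Windows-1252 is assumed here as a stand-in for "any SBCS", and the sample word is arbitrary.

```python
# Illustration only: cp1252 stands in for an arbitrary single-byte code page.
text = "Grüße"

sbcs = bytearray(text.encode("cp1252"))  # one byte per character
mbcs = text.encode("utf-8")              # "ü" and "ß" take two bytes each

sbcs_len = len(sbcs)  # equals the number of characters
mbcs_len = len(mbcs)  # larger than the number of characters

# In an SBCS the character index is also the byte index, so a character
# can be overwritten in place without moving the rest of the string.
i = text.index("ü")
sbcs[i:i + 1] = "u".encode("cp1252")
replaced_sbcs = sbcs.decode("cp1252")

# In UTF-8 the same replacement changes the byte length of the string,
# so the following bytes have to be shifted.
replaced_mbcs = text.replace("ü", "u").encode("utf-8")
```

With a single-byte code page the buffer keeps its length after the replacement; with UTF-8 it shrinks by one byte, which is why in-place exchange is not generally possible there.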
UTF-16 is not a valid value for CP_ACP in Delphi, because it is an
encoding with 2-byte code units. Even if the Delphi architects may have
thought about a common string type with a variable element size (1, 2, 4),
this certainly soon turned out to be a stupid idea, so AnsiString and
WideString/UnicodeString are still strictly distinct types. WideString
and UnicodeString imply UTF-16, with platform-specific byte order
(endianness). The byte order matters almost only to compiler and
library coders, in host/network byte-order conversions. For the sake of
completeness, PDP-11 processors use yet another byte order, and maybe
other word-based processors (DG...) do as well.
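The byte orders mentioned above can be made concrete with a small sketch, again in Python purely for illustration. The helper `pdp11_pack32` is a hypothetical name; it assumes the commonly described PDP-11 layout for 32-bit values (high 16-bit word first, each word little-endian).

```python
import struct

ch = "A"  # U+0041, a single UTF-16 code unit

little = ch.encode("utf-16-le")  # low byte first
big = ch.encode("utf-16-be")     # high byte first

# Host/network conversion: network byte order is big-endian.
value = 0x0A0B0C0D
network = struct.pack("!I", value)   # bytes 0A 0B 0C 0D
little32 = struct.pack("<I", value)  # bytes 0D 0C 0B 0A

def pdp11_pack32(v):
    """Hypothetical helper: pack a 32-bit value in the assumed PDP-11 layout,
    high 16-bit word first, each 16-bit word stored little-endian."""
    hi, lo = (v >> 16) & 0xFFFF, v & 0xFFFF
    return struct.pack("<HH", hi, lo)

pdp = pdp11_pack32(value)  # bytes 0B 0A 0D 0C
```

The same UTF-16 code unit or integer yields a different byte sequence under each convention, which is what a library has to account for when data crosses a host or network boundary.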
DoDi