[fpc-devel] utf8 in 2.6.0
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Jan 8 04:45:25 CET 2013
Martin Schreiber wrote:
> but I fear we can not use that information for development with Free Pascal
> because:
> "
> The string is represented internally as a Unicode string encoded as UTF-16.
> Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters
> not in the BMP require 4 bytes.
> "
> and
> "
> A control string is a sequence of one or more control characters, each of
> which consists of the # symbol followed by an unsigned integer constant from
> 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding,
> and denotes the character corresponding to a specified code value. Each
> integer is represented internally by 2 bytes in the string. This is useful
> for representing control characters and multibyte characters.
> "
> which seems to be different from Free Pascal.
Correction:
You're right, Delphi treats control characters as UTF-16 codes, whereas
FPC treats them as byte values (if less than 256).
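To make the difference concrete, here is a small sketch in Python (not
Pascal, and not compiler output) of the two readings of a control
character such as #$E4. The code page names cp1252 and cp1251 are only
illustrative assumptions; the post does not name any code page. The last
lines check the 2-byte/4-byte sizes from the quoted Delphi documentation.

```python
# Delphi-style reading: #$E4 is the UTF-16 code unit U+00E4.
as_utf16 = chr(0xE4)

# Byte-style reading (values < 256): #$E4 is the raw byte $E4, and its
# meaning depends on the code page assumed for the source.
raw = bytes([0xE4])
as_cp1252 = raw.decode("cp1252")  # Western European code page
as_cp1251 = raw.decode("cp1251")  # Cyrillic code page

print(repr(as_utf16), repr(as_cp1252), repr(as_cp1251))

# UTF-16 sizes: a BMP character takes 2 bytes, a non-BMP character 4.
bmp_size = len("A".encode("utf-16-le"))
non_bmp_size = len("\U0001D11E".encode("utf-16-le"))  # MUSICAL SYMBOL G CLEF
print(bmp_size, non_bmp_size)
```

Under cp1252 the byte reading happens to agree with the UTF-16 reading;
under cp1251 the same byte is a different character.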
I had already noticed the possible problem that the FPC interpretation of
control characters is context sensitive. This leads to write-only code,
because changing the $codepage would require changing all control
characters in that unit accordingly. On top of that, removing or adding
a control character > 255 also leads to a different interpretation of the
remaining control characters *and* to a different internal
representation.
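The context sensitivity can be sketched with a toy model in Python. This
is only my reading of the rule as described above, not the compiler's
actual implementation: the function name interpret, the code page names
and the "any code > 255 makes the whole constant wide" rule are all
assumptions made for illustration.

```python
def interpret(codes, codepage):
    """Toy model: return (text, representation) for a list of #nnn codes."""
    if any(c > 255 for c in codes):
        # Assumed rule: one code above 255 makes the constant wide, and
        # every code in it is then read as a UTF-16 code unit.
        return "".join(chr(c) for c in codes), "utf-16"
    # Otherwise every code is a raw byte in the unit's code page.
    return bytes(codes).decode(codepage), codepage


# Same control character, different $codepage: different text.
narrow_1252 = interpret([0xE4], "cp1252")
narrow_1251 = interpret([0xE4], "cp1251")

# Adding a control character > 255 changes how #$E4 itself is read.
wide = interpret([0xE4, 0x20AC], "cp1251")

print(narrow_1252, narrow_1251, wide)
```

In this model #$E4 means one thing under cp1252, another under cp1251,
and changes again (together with the representation) as soon as #$20AC
is added to the same constant.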
DoDi