[fpc-devel] utf8 in 2.6.0

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Jan 8 04:45:25 CET 2013


Martin Schreiber wrote:

> but I fear we can not use that information for development with Free Pascal 
> because:
> "
> The string is represented internally as a Unicode string encoded as UTF-16. 
> Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters 
> not in the BMP require 4 bytes.
> "
> and
> "
> A control string is a sequence of one or more control characters, each of 
> which consists of the # symbol followed by an unsigned integer constant from 
> 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, 
> and denotes the character corresponding to a specified code value. Each 
> integer is represented internally by 2 bytes in the string. This is useful 
> for representing control characters and multibyte characters.
> "
> which seems to be different from Free Pascal.

Correction:

You're right, Delphi treats control characters as UTF-16 codes, whereas 
FPC treats them as byte values (if less than 256).

I had already noticed a possible problem: the FPC interpretation of 
control characters is context sensitive. This leads to write-only code, 
because changing the $codepage would require changing all control 
characters in that unit accordingly. The same applies when control 
characters > 255 are removed or added, which also leads to a different 
interpretation of the remaining control characters *and* to a different 
internal representation.
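
A rough sketch of that context sensitivity, in Python only so that the 
codepage tables can be checked directly. It models the two interpretations 
described above and is not the FPC or Delphi implementation; the function 
names are hypothetical, and cp1252/cp1251 are just two example codepages.

```python
# Illustrative model only: not how FPC or Delphi are implemented.

def as_byte_constant(value: int, codepage: str) -> str:
    """Treat #value as one byte and decode it with the unit's codepage."""
    if not 0 <= value <= 255:
        raise ValueError("only values below 256 fit in one byte")
    return bytes([value]).decode(codepage)


def as_utf16_constant(value: int) -> str:
    """Treat #value as a UTF-16 code unit, independent of any codepage."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a UTF-16 code unit must be in 0..$FFFF")
    return chr(value)


if __name__ == "__main__":
    # The same source text (#$E4) names different characters
    # depending on the codepage assumed for the unit.
    for codepage in ("cp1252", "cp1251"):
        ch = as_byte_constant(0xE4, codepage)
        print(codepage, hex(ord(ch)))   # cp1252 0xe4, cp1251 0x434

    # Read as a UTF-16 code, #$E4 is always U+00E4.
    print("utf-16", hex(ord(as_utf16_constant(0xE4))))  # utf-16 0xe4
```

Under the byte reading, switching the assumed codepage silently changes 
what #$E4 means, which is the maintenance problem described above.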

DoDi



