[fpc-devel] String handling in trunk (was utf8 in 2.6.0)

Jonas Maebe jonas.maebe at elis.ugent.be
Sat Jan 5 12:28:03 CET 2013


On 05 Jan 2013, at 12:12, Martin Schreiber wrote:

> Thank you very much for the detailed explanation. What I could not find in
> all the answers (probably it is my ignorance of the English language) is:
> does #n mean a UTF-16 code unit as in Delphi XE3, or does it denote something
> else?

It was not in the explanation, because it is something that did not change between 2.6.x and 2.7.x. Whatever you use in 2.6.x will still work in exactly the same way in 2.7.x. The Delphi XE3 behaviour may be added to the {$mode delphiunicode} syntax mode, but has not yet been implemented and will never be applied to existing syntax modes.

> Assuming {$codepage utf-8}, how should we enter Russian character constants
> in #n form?

Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8 encoding of that character.
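
As a minimal sketch of the UTF-8 case, assume the character is the Cyrillic capital letter ZHE (U+0416), whose UTF-8 encoding is the two bytes $D0 $96. The program name and the constant name are made up for illustration, and the directive is written here as {$codepage utf8}:

```pascal
program Utf8CharConst;

{$mode objfpc}{$H+}
{$codepage utf8}

const
  { U+0416 (Cyrillic capital ZHE) written as its two UTF-8 bytes. }
  Zhe: AnsiString = #$D0#$96;

var
  i: Integer;
begin
  { Dump the bytes stored in the constant, in hexadecimal. }
  for i := 1 to Length(Zhe) do
    Write(HexStr(Ord(Zhe[i]), 2), ' ');
  WriteLn;
end.
```

The constant is an ordinary ansistring holding two bytes, so Length reports 2 even though it represents a single character.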

> How should we enter Russian character constants in #n form if 
> {$codepage 8859-5} is defined?

Using whatever #xx represents that character in code page 8859-5.
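
A comparable sketch for the single-byte case, again assuming U+0416 as the example character: in ISO 8859-5 that letter is the byte $B6. The directive spelling is copied from the question, and the program and constant names are hypothetical:

```pascal
program Iso88595CharConst;

{$mode objfpc}{$H+}
{$codepage 8859-5}

const
  { U+0416 (Cyrillic capital ZHE) is the single byte $B6 in ISO 8859-5. }
  Zhe: AnsiString = #$B6;

begin
  { Show the length in bytes and the value of the only byte. }
  WriteLn(Length(Zhe), ' byte: $', HexStr(Ord(Zhe[1]), 2));
end.
```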

Alternatively, in both cases you can define a unicodestring/widestring constant instead of an ansistring/shortstring constant by embedding widechar constants in the character sequence. Such widechar constants are of the form #<number>, with <number> a valid Pascal representation of an integer constant greater than 255 and at most 65535. You can then use those widechars to represent the desired characters as UTF-16 code units.

In that case, however, the entire string will be parsed as a sequence of UTF-16 code units, because a string is either a sequence of ansichars or a sequence of widechars; it can never be a mixture of the two. Hence #1 or #128 appearing in such a widestring will also be parsed as widechar(#1) and widechar(#128), as opposed to being interpreted according to the current codepage setting.
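
A small sketch of the widechar variant, once more assuming U+0416 as the example character; the program and constant names are invented for illustration. The constant deliberately mixes a plain character, a widechar constant and #128, so that the effect on #128 can be observed:

```pascal
program WideCharConst;

{$mode objfpc}{$H+}

const
  { #$0416 is above 255, so it is a widechar constant and the whole
    sequence is treated as a widestring/unicodestring constant. }
  Mixed: UnicodeString = 'A'#$0416#128;

var
  i: Integer;
begin
  { Print each UTF-16 code unit of the constant in hexadecimal. }
  for i := 1 to Length(Mixed) do
    Write(HexStr(Ord(Mixed[i]), 4), ' ');
  WriteLn;
end.
```

If the constant is parsed as described above, the last code unit should come out as 0080 whatever the codepage setting is, because #128 is taken as widechar(#128) and not as a byte in the current code page.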

> And again, sorry for the impertinence, how do resource strings fit in the 
> string handling scenario of Free Pascal trunk?

Unicode support for resourcestrings is still not available in FPC trunk. They can currently still only be used safely for ASCII content.


Jonas

