[fpc-devel] String handling in trunk (was utf8 in 2.6.0)

Martin Schreiber mse00000 at gmail.com
Sat Jan 5 12:12:46 CET 2013


On Saturday 05 January 2013 11:30:42 Jonas Maebe wrote:
[...]
>
> For example, I said that basically nothing changed in 2.7.x compared to
> 2.6.x, except that certain string constants are no longer automatically
> converted to utf-16 at compile time, and then you ask "Or should we not
> touch the theme strings and FPC anymore?". Since basically nothing changed
> except for a few less blind auto-conversions at compile time, why should
> you no longer be able to touch anything anymore?
>
> Let me repeat: your string constants will be parsed by the compiler into
> character sequences with exactly the same content in both 2.6.x and 2.7.x
> (and with content I mean that if they would be converted to the same code
> page in 2.6.x and in 2.7.x, you would end up with exactly the same binary
> data). Whether or not they contain character literals whose value is >#127
> in the source code's code page, or explicit "#xx", "#xxx" etc expressions
> has no influence, nothing changed in the compiler on that account.
>
> The *only* difference is that the compiler can now internally represent
> ansistrings with arbitrary code pages, and as a result the aforementioned
> character sequences may now be stored internally in the compiler in a
> different format, and also stored in the program in a different format if
> that can avoid conversions at run time. In particular, character sequences
> are no longer all converted immediately/by default/under all circumstances
> to UTF-16 in case characters >#127 need to be interpreted according to a
> particular code page (i.e., if a {$codepage xxx} directive is present).
>
> The compiler will now only convert such character sequences to UTF-16,
> still at compile time (just like it was able to do in 2.6.x), if they are
> actually assigned to a UTF-16-encoded string, passed to a UTF-16
> parameter etc. And the compiler will also convert them to another ansistring
> code page in case the character sequence appeared in e.g. a file with
> {$codepage utf-8} and is then assigned to a variable whose type is declared
> as "type ansistring(850)".
>
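As far as I understand the last quoted paragraph, it describes something like
the following minimal sketch. It assumes a 2.7.x (trunk) compiler and a source
file saved as UTF-8; the program name, the type name TCP850String and the
literals are made up for illustration and are not from the compiler sources.

```pascal
program cpsketch;
{$mode objfpc}{$H+}
{$codepage utf8}   // the source file itself must be saved as UTF-8

type
  // Hypothetical name: an ansistring type with a fixed code page (850).
  TCP850String = type AnsiString(850);

var
  U: UnicodeString;
  A: TCP850String;
begin
  // Assigned to a UTF-16 string: per the explanation above, the constant
  // should be converted to UTF-16, still at compile time.
  U := 'Привет';

  // Assigned to an ansistring(850): per the explanation above, the constant
  // should be converted from UTF-8 to code page 850 instead.
  A := 'é';

  WriteLn(Length(U));          // length in UTF-16 code units
  WriteLn(StringCodePage(A));  // code page recorded in the string header
end.
```

The sketch deliberately uses no #n constants, since how those are interpreted
is exactly what is still unclear to me.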
Thank you very much for the detailed explanation. What I could not find in
all the answers (probably due to my ignorance of the English language) is
whether #n means a UTF-16 code unit, as in Delphi XE3, or denotes something
else. You write:

> Whether or not they contain character literals whose value is >#127
> in the source code's code page, or explicit "#xx", "#xxx" etc expressions
> has no influence, nothing changed in the compiler on that account.

Assuming {$codepage utf-8}, how should we enter Russian character constants
in #n form? And how should we enter them in #n form if {$codepage 8859-5} is
defined instead?
And again, sorry for the impertinence, how do resource strings fit into the
string handling scenario of Free Pascal trunk?

Martin


