[fpc-devel] utf8 in 2.6.0

Jonas Maebe jonas.maebe at elis.ugent.be
Sat Jan 5 11:30:42 CET 2013


On 05 Jan 2013, at 10:29, Martin Schreiber wrote:

> Are these stupid questions?

No, but I seem to be unable to explain how it works: you keep asking about things I already tried to explain before, so I clearly failed to explain them properly. I can keep repeating myself, but I'm not sure whether that will help anyone.

For example, I said that basically nothing changed in 2.7.x compared to 2.6.x, except that certain string constants are no longer automatically converted to UTF-16 at compile time, and then you ask "Or should we not touch the theme strings and FPC anymore?". Since basically nothing changed except for a few fewer blind auto-conversions at compile time, why would you no longer be able to touch anything?

Let me repeat: your string constants will be parsed by the compiler into character sequences with exactly the same content in both 2.6.x and 2.7.x (by "content" I mean that if they were converted to the same code page in 2.6.x and in 2.7.x, you would end up with exactly the same binary data). Whether they contain character literals whose value is >#127 in the source code's code page, or explicit "#xx", "#xxx" etc. expressions, has no influence. Nothing changed in the compiler in that respect.

The *only* difference is that the compiler can now internally represent ansistrings with arbitrary code pages, and as a result the aforementioned character sequences may now be stored internally in the compiler in a different format, and also stored in the program in a different format if that can avoid conversions at run time. In particular, character sequences are no longer all converted immediately/by default/under all circumstances to UTF-16 in case characters >#127 need to be interpreted according to a particular code page (i.e., if a {$codepage xxx} directive is present).

The compiler will now only convert such character sequences to UTF-16, still at compile time (just like it was able to do in 2.6.x), if they are actually assigned to a UTF-16-encoded string, passed to a UTF-16 parameter etc. And the compiler will also convert them to another ansistring code page in case the character sequence appeared in e.g. a file with {$codepage utf-8} and is then assigned to a variable whose type is declared as "type ansistring(850)".
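
To make the two cases above concrete, here is a minimal sketch. It assumes a 2.7.x compiler (the ansistring(850) syntax is not available in 2.6.x) and a source file that is actually saved as UTF-8. The program name and the type name TCP850String are made up for this illustration; StringCodePage is the RTL function that reports the code page a string carries.

```pascal
program cpdemo;
{$mode objfpc}{$H+}
{$codepage utf8}   // characters >#127 in literals are interpreted as UTF-8

type
  // Hypothetical type name, only for this example.
  TCP850String = type AnsiString(850);

var
  u: UnicodeString;
  a: TCP850String;
begin
  // Same character sequence in the source, two different targets:
  u := 'é';  // assigned to a UTF-16 string: converted to UTF-16
  a := 'é';  // assigned to an ansistring(850): converted to code page 850

  WriteLn('UTF-16 code units in u: ', Length(u));
  WriteLn('code page of a:         ', StringCodePage(a));
  WriteLn('first byte of a:        ', Ord(a[1]));
end.
```

Both assignments start from the same parsed character sequence; only the target type decides which conversion happens, and for constants like these the compiler should be able to do it at compile time rather than at run time.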


Jonas

