[fpc-devel] String handling in trunk (was utf8 in 2.6.0)

Jonas Maebe jonas.maebe at elis.ugent.be
Sat Jan 5 13:34:04 CET 2013


On 05 Jan 2013, at 13:33, Martin Schreiber wrote:

> On Saturday 05 January 2013 12:57:44 Jonas Maebe wrote:
>> On 05 Jan 2013, at 12:53, Martin Schreiber wrote:
>>> So compiled with -Fcutf8
>>> "
>>> unicodestringvar:= 'Best'#228'tigung';
>>> "
>>> produces a different result on fixes_2_6 and trunk? I assume in trunk
>>> there will be a compile error?
>> 
>> No. In both cases it results in a widestring with this content:
>> 
>> .short	66,101,115,116,228,116,105,103,117,110,103,0
>> 
>> I guess invalid utf-8 values are just copied through by the compiler. As
>> mentioned: absolutely nothing whatsoever changed in how character sequences
>> are interpreted by the compiler in 2.7.x. The explanation you quoted above
>> (and which I deleted) applies to both 2.6.x and 2.7.x. I really don't know
>> how I can say this in another way, and repeating it clearly doesn't help.
>> 
>> I think it's best if you compile trunk for yourself and test as many
>> scenarios as you can, because I feel I cannot add anything further to the
>> discussion, and I'm not interested in playing compile bot.
>> 
> Then it was a misunderstanding again

No, it was simply an omission in my explanation. As mentioned above: "I guess invalid utf-8 values are just copied through by the compiler". It's a special case, but the special case is the same in 2.6.x and 2.7.x (2.6.x converts the UTF-8 string to UTF-16 immediately in the scanner, while 2.7.x does it while processing the assignment; the actual conversion code that's used is, however, exactly the same). The fact that everything remains 100% the same in all cases between 2.6.x and 2.7.x has been mentioned at least 10 times in this thread, and that's what I keep trying to make clear. But I give up.
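
As a rough illustration of the pass-through behaviour described above, here is a small Python model. It is not the compiler's conversion code, and the function name utf8_to_utf16_passthrough is hypothetical. It assumes that "copied through" means each byte that does not start a well-formed UTF-8 sequence becomes a single UTF-16 code unit with the same value, which is consistent with the .short listing quoted earlier.

```python
def utf8_to_utf16_passthrough(data: bytes) -> list:
    """Model only: decode UTF-8 bytes into UTF-16 code units.

    Assumption: a byte that does not begin a well-formed UTF-8 sequence
    is copied through unchanged as one code unit.
    """
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        # Expected sequence length from the lead byte (0 = never a valid lead).
        if b < 0x80:
            length = 1
        elif 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0
        chunk = data[i:i + length]
        try:
            if length == 0 or len(chunk) < length:
                raise ValueError("truncated or invalid lead byte")
            # Strict decoding also rejects overlong forms and surrogates.
            cp = ord(chunk.decode("utf-8"))
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            units.append(b)  # invalid: copy the raw byte through
            i += 1
            continue
        if cp >= 0x10000:
            # Outside the BMP: encode as a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
        i += length
    return units


# 'Best'#228'tigung': byte 228 (0xE4) is followed by 't', not a continuation byte.
invalid = utf8_to_utf16_passthrough(b"Best\xe4tigung")
# The same word with a properly UTF-8 encoded a-umlaut (0xC3 0xA4).
valid = utf8_to_utf16_passthrough("Best\u00e4tigung".encode("utf-8"))
print(invalid)
print(valid)
```

Under that assumption both inputs yield the same code units, 66,101,115,116,228,116,105,103,117,110,103, which matches the quoted listing apart from its terminating 0.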


Jonas

