[fpc-devel] String handling in trunk (was utf8 in 2.6.0)

Mon Jan 7 18:05:04 CET 2013

Tomas Hajny wrote:
> On Mon, January 7, 2013 13:28, Ewald wrote:
>> Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
>> said:
>>> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>>>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>>>> encoding of that character.
>>> Sorry, I can't follow. Does #xx not just define a numerical
>>> representation of an 8 bit entity ?
>>>
>>> The interpretation in any code might be done later by any code that
>>> digests the string.
>>>
>>> Am I wrong ?
>> I *think* Jonas is trying to say that if you want the character `Ǿ` in a
>> string you would either type
>> - 'Ǿ' or
>> - #$C7#$BE if you want to keep the source free of encoding specific
>> characters
>  .
>  .
> 
> ...or
> - #$01FE and then the whole string becomes a Unicode string which is
> either kept that way (if it is assigned to a UnicodeString constant), or
> it is converted to some 8-bit encoding at compile time (if it is assigned
> to an 8-bit constant/variable like ansistring)
> 
> (also just my understanding of what Jonas wrote)

That's how I read it as well. In which case, is #$A3 a 16-bit Unicode
character (U+00A3, the UK £ Sterling sign) or malformed UTF-8 (which
should be #$C2#$A3)?
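
For reference, the byte values under discussion can be checked outside
the compiler. The following is only a minimal sketch in Python, chosen
because it makes the encodings easy to inspect; it says nothing about
how FPC itself interprets such a literal. The helper names utf8_hex and
is_valid_utf8 are made up for this illustration.

```python
# Code points mentioned in the thread.
O_STROKE_ACUTE = "\u01FE"  # Ǿ
POUND = "\u00A3"           # £


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(utf8_hex(O_STROKE_ACUTE))    # bytes for U+01FE
    print(utf8_hex(POUND))             # bytes for U+00A3
    print(is_valid_utf8(b"\xA3"))      # a lone A3 byte
    print(is_valid_utf8(b"\xC2\xA3"))  # the two-byte sequence
```

A lone A3 byte is a UTF-8 continuation byte with no lead byte, so it is
not well-formed UTF-8 on its own; read as a code point (or as
ISO-8859-1) the same value is the £ sign.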

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]