[fpc-devel] String handling in trunk (was utf8 in 2.6.0)
Mark Morgan Lloyd
markMLl.fpc-devel at telemetry.co.uk
Mon Jan 7 18:05:04 CET 2013
Tomas Hajny wrote:
> On Mon, January 7, 2013 13:28, Ewald wrote:
>> Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
>> said:
>>> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>>>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>>>> encoding of that character.
>>> Sorry, I can't follow. Does #xx not just define a numerical
>>> representation of an 8-bit entity?
>>>
>>> The interpretation might be done later by whatever code
>>> digests the string.
>>>
>>> Am I wrong?
>> I *think* Jonas is trying to say that if you want the character `Ǿ` in a
>> string you would either type
>> - 'Ǿ' or
>> - #$C7#$BE if you want to keep the source free of encoding-specific
>> characters
> .
> .
>
> ...or
> - #$01FE and then the whole string becomes a Unicode string which is
> either kept that way (if it is assigned to a UnicodeString constant), or
> it is converted to some 8-bit encoding at compile time (if it is assigned
> to an 8-bit constant/variable like ansistring)
>
> (also just my understanding of what Jonas wrote)
That's how I read it as well. In which case, is #$A3 16-bit Unicode
(representing the UK £ sterling sign) or malformed UTF-8 (which should be #$C2#$A3)?
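For reference, here is a small sketch of the byte-level facts behind the question. It uses Python rather than Pascal purely for illustration, and `utf8_literal` and `is_valid_utf8` are hypothetical helpers, not FPC functions. It shows that U+01FE (Ǿ) encodes to the bytes #$C7#$BE and U+00A3 (£) to #$C2#$A3, and that a lone 0xA3 byte is not valid UTF-8. It says nothing about how the compiler will actually treat a single #$A3 in source, which is the open question.

```python
# Sketch only: Python stands in for Pascal. The byte values printed are the
# ones a Pascal source would spell as #$xx#$xx literals.

def utf8_literal(ch: str) -> str:
    """Return the UTF-8 bytes of ch written as a Pascal-style #$xx sequence."""
    return "".join(f"#${b:02X}" for b in ch.encode("utf-8"))

def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for ch in ("\u01FE", "\u00A3"):  # Ǿ and £
    print(f"U+{ord(ch):04X} -> {utf8_literal(ch)}")

# 0xA3 on its own is a UTF-8 continuation byte, so it cannot start a character.
print(is_valid_utf8(b"\xA3"))
# 0xC2 0xA3 is the proper two-byte UTF-8 encoding of £.
print(is_valid_utf8(b"\xC2\xA3"))
```

Run as-is, this prints `U+01FE -> #$C7#$BE`, `U+00A3 -> #$C2#$A3`, `False` and `True`. In other words, as raw bytes #$A3 is only meaningful as a Latin-1/code-page character or as a Unicode code point, never as UTF-8.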
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]