[fpc-devel] String handling in trunk (was utf8 in 2.6.0)

Mon Jan 7 13:28:36 CET 2013

Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
said:
> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>> encoding of that character.
> Sorry, I can't follow. Does #xx not just define a numerical
> representation of an 8-bit entity?
>
> The interpretation might be done later by whatever code digests the
> string.
>
> Am I wrong?
I *think* Jonas is trying to say that if you want the character `Ǿ` in a
string you would either type
- 'Ǿ' or
- #$C7#$BE if you want to keep the source free of encoding-specific
characters

What you do with it afterwards is up to you as the programmer. If you
write it to a UTF-8 terminal, you get `Ǿ`; if you write it to some other
terminal, you might see the character that matches $C7 followed by the
character that matches $BE in the lookup table of that terminal's
encoding. Look at it this way: the byte sequence ($C7, $BE) has no
meaning to the compiler whatsoever; it is just a byte sequence. That is
all that matters to the compiler. What the sequence represents is for
you to decide.
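
To make the "same bytes, different interpretation" point concrete, here
is a small sketch. It is written in Python rather than Pascal purely for
illustration, and it says nothing about how FPC itself treats string
constants. The helper name `show` and the choice of Latin-1 as the "other
terminal" encoding are my own assumptions.

```python
# The two bytes written as #$C7#$BE in the Pascal source.
raw = bytes([0xC7, 0xBE])


def show(data: bytes, encoding: str) -> str:
    """Interpret the bytes the way a terminal using `encoding` would."""
    return data.decode(encoding)


# A UTF-8 consumer sees one character, U+01FE.
utf8_view = show(raw, "utf-8")

# A Latin-1 consumer sees two characters, one per byte.
latin1_view = show(raw, "latin-1")

print(len(raw), raw.hex())
print(repr(utf8_view), len(utf8_view))
print(repr(latin1_view), len(latin1_view))
```

The bytes never change; only the decoding step chosen by whoever
consumes them differs.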

Correct me if I'm wrong.

-- 
Ewald