[fpc-pascal] Funny things about utf-8 strings on mac

Felipe Monteiro de Carvalho felipemonteiro.carvalho at gmail.com
Tue Jun 12 09:28:06 CEST 2007


Hi,

I edited my source code with TextWrangler (a Macintosh text editor),
setting the encoding to utf-8, and when I opened it with Lazarus it
showed the beginning of the file like this:

Ôªøunit mainform;

Notice the first 3 funny characters (actually in Lazarus I see
different characters, but they changed on copy+paste). FPC didn't seem
to care about this, and the file compiled without problems, so I
ignored them.
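
Since the editor and copy+paste each render those characters
differently, the raw byte values are more reliable than what is shown
on screen. Here is a minimal sketch that dumps the first three bytes of
a file in hex. The program name and the default file name
'mainform.pas' are only my assumptions for illustration; pass the real
path as the first argument. If the bytes turn out to be EF BB BF, that
would be the UTF-8 byte order mark.

```pascal
program DumpHead;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

var
  FileName: string;
  Stream: TFileStream;
  Buf: array[0..2] of Byte;
  Count, i: Integer;
begin
  // Assumed default file name; override it with the first argument.
  if ParamCount >= 1 then
    FileName := ParamStr(1)
  else
    FileName := 'mainform.pas';

  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read may return fewer than 3 bytes if the file is very short.
    Count := Stream.Read(Buf, SizeOf(Buf));
  finally
    Stream.Free;
  end;

  // Print each byte actually read as two hex digits.
  for i := 0 to Count - 1 do
    Write(IntToHex(Buf[i], 2), ' ');
  WriteLn;
end.
```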

But then, utf-8 strings stopped working.

This, for example:

procedure TForm1.FormCreate(Sender: TObject);
var
  MyStr: string;
  i: Integer;
begin
  MyStr := 'Texto ł ñ ø ß á';

  WriteLn('[TForm1.FormCreate] Printing string values');

  WriteLn('Length: ', Length(MyStr));

  for i := 1 to Length(MyStr) do
    Write(IntToHex(Ord(MyStr[i]), 2) + ' ');

  WriteLn('');

  Self.Caption := MyStr;
  Label1.Caption := 'átomo tômo não';
  Button1.Caption := 'łñø˘ðßßăŏ';
end;

would produce nonsense output, like this:

[TForm1.FormCreate] Printing string values
Length: 15
54 65 78 74 6F 20 20 20 20 20 20 20 20 20 20

However, if I remove the 3 funny characters, everything works normally
again, and I see my UTF-8 characters on screen, and the text output
is:

[TForm1.FormCreate] Printing string values
Length: 20
54 65 78 74 6F 20 C5 82 20 C3 B1 20 C3 B8 20 C3 9F 20 C3 A1

Does anyone know what those funny characters are (I suppose some kind
of encoding setting), and why they make utf-8 strings suddenly stop
working?

thanks,
-- 
Felipe Monteiro de Carvalho

