[fpc-pascal] Funny things about utf-8 strings on mac

Jonas Maebe jonas.maebe at elis.ugent.be
Tue Jun 12 09:46:14 CEST 2007


On 12 jun 2007, at 09:28, Felipe Monteiro de Carvalho wrote:

> I edited my source code with TextWrangler (a macintosh text editor),
> setting the encoding to utf-8, and when I opened with Lazarus it would
> show the beginning of the file like this:
>
> Ôªøunit mainform;
>
> Notice the first 3 funny characters (actually on lazarus I see
> different characters, but they changed on copy+paste).

They are the UTF-8 byte order mark (BOM), the standard marker that
identifies a file as UTF-8. This is not Mac-specific in any way; it is
part of the Unicode standard.
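
To make the three "funny characters" concrete, here is a small sketch in Python (used only for illustration, it has nothing to do with FPC itself). The marker is U+FEFF encoded as UTF-8, i.e. the bytes EF BB BF. The helper name show_as is made up for this example, and the guess that the copy+paste went through Mac OS Roman is an assumption, although it does reproduce the characters quoted above.

```python
import codecs

# UTF-8 encoding of U+FEFF (the byte order mark): the bytes EF BB BF.
bom = codecs.BOM_UTF8


def show_as(encoding: str) -> str:
    """Decode the three marker bytes as if they were text in `encoding`.

    Hypothetical helper, only meant to show what an editor that ignores
    the marker would display.
    """
    return bom.decode(encoding)


bom_hex = bom.hex()                  # 'efbbbf'
as_mac_roman = show_as("mac_roman")  # 'Ôªø' (assumed path of the copy+paste)
as_latin1 = show_as("latin-1")       # 'ï»¿' (ISO 8859-1 view of the same bytes)
```

An editor that does not recognise the marker shows these bytes as three ordinary characters, and which three depends on the encoding it assumes.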

> (I suppose some kind
> of encoding setting), and why they make utf-8 strings suddenly stop
> working?

You said things did initially work with the UTF-8 marker in place.
The default code page used by FPC is ISO 8859-1. However, the scanner
detects the UTF-8 marker if it is present and then switches the code
page to UTF-8. You can also set the code page manually to UTF-8 using
{$codepage utf-8}.
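
The behaviour described above can be sketched roughly as follows. This is Python for illustration only, not the actual FPC scanner code. The names detect_codepage and read_source are hypothetical, and the manual {$codepage} directive is not modelled here.

```python
import codecs
from typing import Tuple

# Default named in the message above; the constant name is made up here.
DEFAULT_CODEPAGE = "iso-8859-1"


def detect_codepage(raw: bytes) -> Tuple[str, bytes]:
    """Return (code page, source bytes without the marker).

    Hypothetical helper: if the source starts with the UTF-8 marker,
    switch to UTF-8 and drop the marker, otherwise keep the default.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8", raw[len(codecs.BOM_UTF8):]
    return DEFAULT_CODEPAGE, raw


def read_source(raw: bytes) -> str:
    """Decode source bytes using whatever code page was detected."""
    codepage, body = detect_codepage(raw)
    return body.decode(codepage)


with_marker = codecs.BOM_UTF8 + "s := 'é';".encode("utf-8")
without_marker = "s := 'é';".encode("utf-8")

ok = read_source(with_marker)          # "s := 'é';"
mangled = read_source(without_marker)  # "s := 'Ã©';"
```

Under this model, the same UTF-8 bytes without the marker are read as ISO 8859-1, so each non-ASCII character turns into two wrong ones. That would be one way for UTF-8 string constants to appear to stop working.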

Perhaps the UTF-8 marker got mangled somehow, by Lazarus or otherwise.
I don't know why it worked again after you removed the marker.


Jonas

