[fpc-pascal] CodePage fallback for FreeBSD systems

Jonas Maebe jonas.maebe at elis.ugent.be
Mon Sep 14 13:58:13 CEST 2015


On 10/09/15 14:19, Graeme Geldenhuys wrote:
> See the last file changed.... rtl/unix/unixcp.pp
> It seems it excludes FreeBSD in that $IF statement. Darwin is after all
> a FreeBSD fork.

Darwin's libc is based on FreeBSD's and it shares a few user land 
utilities, but Darwin is far from a FreeBSD fork.

> This results to cwstrings under FreeBSD to default to
> ASCII. :-/

That is how all Unix platforms are defined to behave: if the LANG or 
LC_CTYPE environment variable is not set, the "C" locale is what you 
have to fall back to.
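
As a rough illustration of that rule, here is a minimal sketch in Free Pascal. The helper name EffectiveCTypeLocale is hypothetical and not part of the RTL. The check for LC_ALL is an addition based on the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG). It is not taken from unixcp.pp.

```pascal
program LocaleCheck;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper (not an RTL routine): returns the locale name the
// environment selects for the LC_CTYPE category, assuming the usual POSIX
// precedence LC_ALL > LC_CTYPE > LANG, and 'C' when none of them is set.
function EffectiveCTypeLocale: string;
begin
  Result := GetEnvironmentVariable('LC_ALL');
  if Result = '' then
    Result := GetEnvironmentVariable('LC_CTYPE');
  if Result = '' then
    Result := GetEnvironmentVariable('LANG');
  if Result = '' then
    Result := 'C';  // nothing set: the "C" locale is the fallback
end;

begin
  WriteLn('Effective LC_CTYPE locale: ', EffectiveCTypeLocale);
end.
```

Running it with those variables unset (for example via "env -i") should report the "C" locale, which is the fallback situation discussed here.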

OS X's GUI environment does not follow these conventions, so there we 
have to use another heuristic. The problem with falling back to ASCII 
there is that the OS X kernel interfaces for file system APIs all use 
UTF-8. This is unlike all other Unix platforms, which don't define any 
encoding whatsoever for file names, and where all file names are simply 
arbitrary arrays of bytes that should be interpreted according to the 
current locale.

If we fall back to ASCII on OS X, then all results from OS file APIs 
will be converted from UTF-8 to ASCII and you get data loss (due to a 
difference between system.DefaultFileSystemCodePage, which is UTF-8, and 
system.DefaultSystemCodePage, which is ASCII). If we fall back to ASCII 
on another Unix platform, both system.DefaultFileSystemCodePage and 
system.DefaultSystemCodePage will be ASCII and no data loss occurs (not 
inside the low-level file handling routines of the RTL anyway).
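
To see which situation a given system is in, a small test program along these lines may help. It is only a sketch: it assumes FPC 3.0 or later, where both variables exist in the system unit, and it uses cwstring on Unix so that the code page is derived from the locale.

```pascal
program CodePageCheck;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager; sets the code pages from the locale
  SysUtils;

begin
  // Both values are numeric code page identifiers (e.g. 65001 for UTF-8).
  WriteLn('DefaultSystemCodePage     = ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage = ', DefaultFileSystemCodePage);

  // A mismatch is the case described above: file names returned by the OS
  // get converted between the two code pages and may lose characters.
  if DefaultFileSystemCodePage <> DefaultSystemCodePage then
    WriteLn('Code pages differ: file names may be converted lossily.')
  else
    WriteLn('Code pages match: no conversion in the file handling routines.');
end.
```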

That said: I have a very hard time believing that any contemporary Linux 
or *BSD system would not come preconfigured with a UTF-8 locale for 
every user, so you should never end up in the fallback situation unless 
you deliberately unset those environment variables yourself.


Jonas



