[fpc-pascal] CodePage fallback for FreeBSD systems
Jonas Maebe
jonas.maebe at elis.ugent.be
Mon Sep 14 13:58:13 CEST 2015
On 10/09/15 14:19, Graeme Geldenhuys wrote:
> See the last file changed.... rtl/unix/unixcp.pp
> It seems it excludes FreeBSD in that $IF statement. Darwin is after all
> a FreeBSD fork.
Darwin's libc is based on FreeBSD's and it shares a few user land
utilities, but Darwin is far from a FreeBSD fork.
> This results in cwstrings under FreeBSD defaulting to
> ASCII. :-/
That is how all Unix platforms are defined to behave: if the LANG or
LC_CTYPE environment variable is not set, the "C" locale is what you
have to fall back to.
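
To make the lookup order concrete, here is a minimal sketch in Python (not the RTL's actual code). The function names are made up for illustration. It also checks LC_ALL, which POSIX defines as overriding both LC_CTYPE and LANG, and it treats an empty value as unset.

```python
def effective_ctype_locale(env):
    """Return the locale name that governs character handling.

    `env` is a plain dict standing in for the process environment.
    """
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset and empty both count as "not set"
            return value
    # Nothing usable was found: fall back to the "C" locale.
    return "C"


def codeset_of(locale_name):
    """Extract the codeset part of a locale name (hypothetical helper).

    Returns None when the name carries no explicit codeset, since the
    answer then depends on the system's locale database.
    """
    if locale_name in ("C", "POSIX"):
        return "ASCII"
    base = locale_name.split("@", 1)[0]  # drop a modifier such as "@euro"
    if "." in base:
        return base.split(".", 1)[1]
    return None
```

Called with an empty dict, effective_ctype_locale returns "C", which codeset_of maps to ASCII. That is the fallback situation discussed in this thread.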
OS X's GUI environment does not follow these conventions, so there we
have to use another heuristic. The problem with falling back to ASCII
there is that the OS X kernel interfaces for file system APIs all use
UTF-8. This is unlike all other Unix platforms, which don't define any
encoding whatsoever for file names, and where all file names are simply
arbitrary arrays of bytes that should be interpreted according to the
current locale.
If we fall back to ASCII on OS X, then all results from OS file APIs
will be converted from UTF-8 to ASCII and you get data loss (due to a
difference between system.DefaultFileSystemCodePage, which is UTF-8, and
system.DefaultSystemCodePage, which is ASCII). If we fall back to ASCII
on another unix platform, both system.DefaultFileSystemCodePage and
system.DefaultSystemCodePage will be ASCII and no data loss occurs (not
inside the low level file handling routines of the RTL anyway).
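
The difference between the two cases can be modelled in a few lines of Python. This is only a sketch of the idea, not how the RTL is implemented: convert_file_name is a hypothetical function, and the "?" substitution is Python's behaviour with errors="replace", so the replacement FPC performs may differ.

```python
def convert_file_name(raw, fs_codepage, system_codepage):
    """Model the conversion of a file name returned by an OS file API.

    `raw` is the byte string as delivered by the OS. The two code page
    arguments stand in for DefaultFileSystemCodePage and
    DefaultSystemCodePage.
    """
    if fs_codepage == system_codepage:
        # Same code page on both sides: no conversion happens and the
        # bytes pass through untouched, whatever they are.
        return raw
    # Different code pages: characters that the target cannot represent
    # are replaced, which is where the data loss comes from.
    text = raw.decode(fs_codepage)
    return text.encode(system_codepage, errors="replace")


name = "café.txt".encode("utf-8")

# OS X with an ASCII fallback: UTF-8 file system, ASCII system code page.
lossy = convert_file_name(name, "utf-8", "ascii")

# Other Unix with an ASCII fallback: both code pages are ASCII.
intact = convert_file_name(name, "ascii", "ascii")
```

In the first call the "é" cannot be represented and is lost. In the second call the bytes come back unchanged, even though they are not valid ASCII.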
That said: I have a very hard time believing that any contemporary Linux
or *BSD system would not come preconfigured with a UTF-8 locale for
every user, so you should never end up in the fallback situation unless
you deliberately unset those environment variables yourself.
Jonas