[fpc-pascal] FileIO in FPC 3.0
Jonas Maebe
jonas.maebe at elis.ugent.be
Fri Sep 25 10:35:47 CEST 2015
Andreas Dorn wrote on Fri, 25 Sep 2015:
> In the discussion about resourcestrings I read that the RTL now uses
> codepage-aware strings for FileIO.
> So I wonder what kind of codepages do you use for FileIO?
On Windows: UTF-16.
> The Windows-documentation calls Filenames "opaque sequence of WCHARs".
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
>
> So e.g. converting a Filename from the Windows-API to UTF-8 can be lossy.
> Does the new FPC-FileApi work correctly if a Filename contains
> invalid UTF-16 sequences?
If you use the RTL file APIs with unicodestrings on Windows, then no
conversions should occur, because
a) we use the UTF-16 Windows APIs, and
b) all file name helpers are available with both unicodestring and
rawbytestring parameters, so the unicodestring ones should be used.
If you use the RTL file APIs with an ansistring variant, then the
code page that is used is described at
http://wiki.freepascal.org/FPC_Unicode_support#Code_page_settings
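To illustrate why a conversion away from UTF-16 can be lossy, here is a minimal sketch in Python (not FPC code, and not a description of what the RTL does). It assumes the converter substitutes a replacement character for anything it cannot encode. Python strings can hold lone surrogate code points, which stand in here for an invalid UTF-16 file name. The helper name to_utf8_replacing is made up for this example.

```python
def to_utf8_replacing(name):
    # Hypothetical helper: encode to UTF-8, substituting "?" for
    # anything that cannot be encoded (such as a lone surrogate).
    return name.encode("utf-8", errors="replace")

# Two different "file names", each ending in a different lone surrogate.
name_a = "a\ud800"
name_b = "a\ud801"

# After conversion they can no longer be told apart.
collide = to_utf8_replacing(name_a) == to_utf8_replacing(name_b)
```

Under that assumption, two distinct names map to the same byte string, so the original name cannot be recovered from the converted one.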
Maybe we should add support for detecting invalid UTF-16 sequences in
file names returned by Windows APIs and, if there are any, ask for
and return the "short/safe name" instead (file~1.txt and the like).
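As a rough sketch of what such a detection could look like (in Python, purely as an illustration; nothing like this is claimed to exist in the RTL), the following scans a list of 16-bit code units for unpaired surrogates. The function name has_unpaired_surrogate and the sample data are invented for this example.

```python
def has_unpaired_surrogate(units):
    """Return True if the UTF-16 code units contain a lone surrogate."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return True
        if 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that was not preceded by a high surrogate.
            return True
        i += 1
    return False

# "a", U+1F600 as a surrogate pair, ".", "t"
valid_name = [0x0061, 0xD83D, 0xDE00, 0x002E, 0x0074]
# Same, but the low surrogate is missing.
broken_name = [0x0061, 0xD83D, 0x002E, 0x0074]
```

A name for which this returns True is the case where falling back to the short name would apply.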
For data that you pass in yourself, there is no problem: either you
pass in UTF-16 and it will be passed on unmodified, or you use another
code page, and then it's your responsibility if it contains invalid
data. That can pretty much only happen with UTF-8, and possibly with
some single-byte code pages that have undefined bytes, if there are
any.
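A small Python sketch of what "invalid data" means for a given code page (an illustration only; the helper is_valid_in is made up, and the results reflect Python's codec tables, which may differ from how Windows itself treats the same bytes):

```python
def is_valid_in(data, encoding):
    # Hypothetical helper: True if every byte sequence in `data`
    # decodes under `encoding`, False otherwise.
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

utf8_ok = is_valid_in(b"caf\xc3\xa9", "utf-8")   # well-formed UTF-8
utf8_bad = is_valid_in(b"caf\xe9", "utf-8")      # truncated multi-byte sequence
cp1252_ok = is_valid_in(b"caf\xe9", "cp1252")    # 0xE9 is defined in cp1252
cp1252_bad = is_valid_in(b"\x81", "cp1252")      # 0x81 has no mapping in Python's cp1252
latin1_any = is_valid_in(bytes(range(256)), "latin-1")  # every byte is defined
```

In Python's tables, at least, cp1252 does leave a few bytes undefined, whereas latin-1 accepts every byte value.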
> Assigning a codepage to something that basically is just some raw
> sequence of bytes from an
> external source sounds dangerous to me.
It is.
Jonas