[fpc-pascal] FileIO in FPC 3.0

Jonas Maebe jonas.maebe at elis.ugent.be
Fri Sep 25 10:35:47 CEST 2015


Andreas Dorn wrote on Fri, 25 Sep 2015:

> In the discussion about resourcestrings I read that the RTL now uses  
> codepage-aware strings for FileIO.
> So I wonder what kind of codepages do you use for FileIO?

On Windows: UTF-16.

> The Windows-documentation calls Filenames "opaque sequence of WCHARs".
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
>
> So e.g. converting a Filename from the Windows-API to UTF-8 can be lossy.
> Does the new FPC-FileApi work correctly if a Filename contains  
> invalid UTF-16 sequences? 

If you use the RTL file APIs with unicodestrings on Windows, then no
conversions should occur, because
a) we use the UTF-16 Windows APIs, and
b) all file name helpers are available with both unicodestring and
rawbytestring parameters, so the unicodestring ones should be used.

If you use the RTL file APIs with an ansistring variant, then the
code page that is used is described at
http://wiki.freepascal.org/FPC_Unicode_support#Code_page_settings

Maybe we should add support for detecting invalid UTF-16 sequences in
file names returned by Windows APIs and, if there are any, ask for
and return the "short/safe name" instead (file~1.txt and the like).

For data that you pass in yourself, there is no problem: either you
pass in UTF-16 and it is passed on unmodified, or you use another
code page, and then it is your responsibility if it contains invalid
data. That can pretty much only happen with UTF-8, and possibly with
some single-byte code pages that have undefined bytes, if there are
any.
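To make the detection idea above concrete, here is a minimal sketch in
Python (chosen only for illustration; it is not RTL code and not how
FPC does or would implement this). It treats a file name as a list of
16-bit code units, reports unpaired surrogates, and shows that a
conversion to UTF-8 and back does not preserve such a name. The function
names are hypothetical.

```python
def find_unpaired_surrogates(units):
    """Return the indexes of UTF-16 code units that are unpaired surrogates."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows directly.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that is not preceded by a high surrogate.
            bad.append(i)
        i += 1
    return bad


def utf16_units_to_utf8_lossy(units):
    """Convert code units to UTF-8, replacing invalid sequences with U+FFFD."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le", errors="replace").encode("utf-8")


def utf8_to_utf16_units(data):
    """Convert valid UTF-8 bytes back to a list of UTF-16 code units."""
    raw = data.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
```

A name for which find_unpaired_surrogates() returns a non-empty list is
one that cannot survive the round trip through UTF-8, so it is a
candidate for the "short/safe name" fallback; a name with only valid
pairs round-trips unchanged.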

> Assigning a codepage to something that basically is just some raw
> sequence of bytes from an external source sounds dangerous to me.

It is.
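
A small illustration of why, again as a Python sketch rather than
anything FPC-specific: the same raw bytes are a different name, or not
a valid string at all, depending on which code page you decide to
attach to them. The helper name and the sample bytes are made up for
the example.

```python
# Bytes from some external source; nothing in them says which code page applies.
raw = b"caf\xe9.txt"


def decode_or_none(data, encoding):
    """Decode data with the given encoding, or return None if it is invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_latin1 = decode_or_none(raw, "latin-1")  # 0xE9 is a valid letter here
as_cp437 = decode_or_none(raw, "cp437")     # 0xE9 is a different character here
as_utf8 = decode_or_none(raw, "utf-8")      # 0xE9 starts an incomplete sequence
```

All three calls receive identical bytes; only the assumed code page
differs, and that alone decides whether the result is the intended
name, a different name, or an error.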


Jonas



