[fpc-devel] cpstrrtl/unicode branch merged to trunk

Jonas Maebe jonas.maebe at elis.ugent.be
Fri Sep 6 13:09:28 CEST 2013


I've just merged the cpstrrtl/unicode branch into trunk. Below you can find the commit message, which describes most changes, the added features and also a very important warning.


  o merged cpstrrtl branch (includes unicode branch). In general, this adds
    support for arbitrarily encoded ansistrings to many routines related to
    file system access (and some others).
  WARNING: while the parameters of many routines have been changed from
    "ansistring" to "rawbytestring" to avoid data loss due to conversions,
    this is not a panacea. If you pass a string concatenation to such a
    parameter and not all strings in this concatenation have the same
    code page, all strings and the result will be converted to
    DefaultSystemCodePage (= ansi code page by default). In particular,
    concatenating e.g. a Utf8String with a constant string and passing
    the result to a RawByteString parameter will convert the result into
    the DefaultSystemCodePage (unless the source code is compiled with
    {$modeswitch systemcodepage} or {$mode delphiunicode} *and* the ansi
    code page on the system you are compiling *on* happens to be UTF-8).
    You can define and use alternative routines that explicitly accept
    Utf8String parameters to avoid this pitfall. Internally, all of these
    routines ensure that they never trigger this condition and that no
    unnecessary/unwanted code page conversions occur.
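
The warning above can be illustrated with a minimal sketch. ShowRaw and ShowUtf8 are hypothetical helpers written for this example, not RTL routines. The code pages printed depend on the platform, the RTL configuration and DefaultSystemCodePage, so no particular output is implied.

```pascal
program cpconcat;
{$mode objfpc}{$H+}

{ Hypothetical helper: accepts any code page without conversion,
  but a mixed-code-page concatenation is converted before the call. }
procedure ShowRaw(const s: RawByteString);
begin
  WriteLn('RawByteString parameter, code page: ', StringCodePage(s));
end;

{ Hypothetical helper: the Utf8String parameter type makes the compiler
  convert the concatenation result to UTF-8 instead. }
procedure ShowUtf8(const s: Utf8String);
begin
  WriteLn('Utf8String parameter, code page:    ', StringCodePage(s));
end;

var
  u: Utf8String;
  a: AnsiString;
begin
  u := 'dir';
  a := '/file.txt';

  { u and a may carry different code pages, so the result of u + a
    may be converted to DefaultSystemCodePage before ShowRaw sees it. }
  ShowRaw(u + a);

  { Same expression, but passed to a Utf8String parameter. }
  ShowUtf8(u + a);

  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

If the strings contained characters that cannot be represented in DefaultSystemCodePage, the first call is where data could be lost.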

  + DefaultFileSystemCodePage variable that holds the code page used for
    communicating with the OS single byte file system APIs, and for the
    strings returned by those same APIs. Initialized with
   o the result of GetACP in the system unit of Windows platforms, except for
     WinCE which uses UTF-8 since its file system OS API calls already use
     the UTF-16 versions
   o CP_UTF8 on Unix platforms with FPCRTL_FILESYSTEM_UTF8 defined, and with
     DefaultSystemCodePage on other Unix platforms
   o DefaultSystemCodePage on Java/Android JVM targets
  + DefaultRTLFileSystemCodePage variable that holds the code page used to
    encode strings returned by RTL routines that return filenames obtained
    from OS API calls. By default the same as DefaultFileSystemCodePage on
    all platforms. Separate from DefaultFileSystemCodePage for clarity on
    platforms that may use either utf-16 or single byte OS API calls to
    send/receive file names (such as most Windows platforms)
  + new scpFileSystemSingleByte enum that can be passed to
    GetStandardCodePage() to get the default code page for OS single byte file
    system APIs, with implementations for Unix and Windows
  + SetMultiByteFileSystemCodePage() procedure to override the value of
  + ToSingleByteFileSystemEncodedFileName() function to convert a string to
    DefaultFileSystemCodePage (does *not* take care of OS-specific quirks like
    Darwin always returning file names in decomposed UTF-8)
  + support for CP_OEMCP
  * textrec/filerec now store the filename by default using widechar. It is
    possible to switch back to ansichars using the FPC_ANSI_TEXTFILEREC define.
    In that case, from now on the filename will always be stored in
  * fixed potential buffer overflows and non-null-terminated file names in
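
As a rough sketch of how the new variables and routines listed above might be inspected from a program: the file name is made up, and the sketch assumes that SetMultiByteFileSystemCodePage() takes a code page value and affects DefaultFileSystemCodePage. The values printed differ per platform.

```pascal
program fscodepages;
{$mode objfpc}{$H+}

var
  fname: UnicodeString;
  encoded: RawByteString;
begin
  WriteLn('DefaultSystemCodePage:        ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage:    ', DefaultFileSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage: ', DefaultRTLFileSystemCodePage);

  { Convert a (made-up, ASCII-only) file name to the encoding used for
    the single byte file system APIs. }
  fname := 'example.txt';
  encoded := ToSingleByteFileSystemEncodedFileName(fname);
  WriteLn('code page of converted name:  ', StringCodePage(encoded));

  { Assumption: this overrides the code page used for single byte
    file system API calls. }
  SetMultiByteFileSystemCodePage(CP_UTF8);
  WriteLn('DefaultFileSystemCodePage after override: ',
          DefaultFileSystemCodePage);
end.
```

Note that the conversion routine does not handle OS-specific quirks such as decomposed UTF-8 on Darwin, as stated in the list above.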

  * when concatenating ansistrings, do not map CP_NONE (rawbytestring) to
    CP_ACP (defaultsystemcodepage), because if all input strings have the
    same code page then the result should also have that code page if it's
    assigned to a rawbytestring rather than getting defaultsystemcodepage
  * do not consider empty strings to determine the code page of the result
    in fpc_AnsiStr_Concat_multi(), because that will cause a different
    result than when using a sequence of fpc_AnsiStr_Concat() calls (it
    ignores empty strings to determine the result code page) and it's also

  * do not consider the run time code page of the destination string in
    fpc_AnsiStr_Concat(_multi)() because Delphi does not do so either. This
    was introduced in r19118, probably to hide another bug + test
  * never change the code page of a non-empty string when calling setlength on
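
The concatenation rules described above could be checked with a small test program along these lines. It is only a sketch: per the commit message, both results are expected to keep the UTF-8 code page (CP_UTF8) when assigned to a RawByteString, but the program simply prints what the RTL reports.

```pascal
program concatcp;
{$mode objfpc}{$H+}

var
  u1, u2: Utf8String;
  a: AnsiString;
  r: RawByteString;
begin
  u1 := 'abc';
  u2 := 'def';
  a := '';          { empty string, should not influence the result }

  { All inputs share one code page and the destination is a
    RawByteString, so no mapping to DefaultSystemCodePage is expected. }
  r := u1 + u2;
  WriteLn('utf8 + utf8         -> ', StringCodePage(r));

  { Multi-operand concatenation with an empty operand in the middle. }
  r := u1 + a + u2;
  WriteLn('utf8 + empty + utf8 -> ', StringCodePage(r));

  WriteLn('CP_UTF8 = ', CP_UTF8);
end.
```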

  * handle the fact that GetEnvironmentStringsA returns the environment in the
    OEM instead of in the Ansi code page (mantis #22524, #15233)
  * don't truncate environment variable strings in GetEnvironmentString(),
    its result is now ansistring/unicodestring depending on whether the
    RTL was compiled with FPC_RTL_UNICODE
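
For reference, a minimal sketch that walks the environment through the sysutils routines affected by this change. It uses GetEnvironmentVariableCount together with GetEnvironmentString, whose index is 1-based.

```pascal
program envlist;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  i: Integer;
begin
  { Each entry has the form NAME=value; long entries should no longer
    be truncated after this change. }
  for i := 1 to GetEnvironmentVariableCount do
    WriteLn(GetEnvironmentString(i));
end.
```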

  * unix:
   o made the ansistring parameters of the fp*() file system routine overloads
     constant, changed them to rawbytestring and added
     DefaultFileSystemCodePage conversions
   o unicodestring support for POpen(), and DefaultFileSystemCodePage support
     for POpen(RawByteString)
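
A sketch of what calling one of the fp*() overloads with a non-default code page string could look like (Unix targets only). FpStat is used as an example and the path '/tmp' is an arbitrary choice; the assumption is that the RawByteString overload converts the argument to DefaultFileSystemCodePage internally, as described above.

```pascal
program fpstatdemo;
{$mode objfpc}{$H+}

uses
  BaseUnix;

var
  path: Utf8String;
  info: TStat;
begin
  path := '/tmp';

  { The Utf8String is passed to a RawByteString parameter as-is, so no
    conversion happens at the call site. }
  if FpStat(path, info) = 0 then
    WriteLn('size: ', info.st_size)
  else
    WriteLn('FpStat failed, errno = ', fpgeterrno);
end.
```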

  + DefaultFileSystemCodePage support for dynlibs unit

  + rawbytestring/unicodestring overloads for:
   o system: fexpand, lowercase, uppercase, getdir, mkdir, chdir, rmdir,
     assign, erase, rename
   o objpas: AssignFile, 
   o sysutils: FileCreate, FileOpen, FileExists, DirectoryExists, FileSetDate,
     FileGetAttr, FileSetAttr, DeleteFile, RenameFile, FileSearch, ExeSearch,
     FindFirst, FindNext, FindClose, FileIsReadOnly, GetCurrentDir,
     SetCurrentDir, ChangeFileExt, ExtractFilePath, ExtractFileDrive,
     ExtractFileName, ExtractFileExt, ExtractFileDir, ExtractShortPathName,
     ExpandFileName, ExpandFileNameCase, ExpandUNCFileName,
     ExtractRelativepath, IncludeTrailingPathDelimiter,
     IncludeTrailingBackslash, ExcludeTrailingBackslash,
     ExcludeTrailingPathDelimiter, IncludeLeadingPathDelimiter,
     ExcludeLeadingPathDelimiter, IsPathDelimiter, DoDirSeparators,
     SetDirSeparators, GetDirs, ConcatPaths, GetEnvironmentVariable

    -- the default string type used by FindFirst/Next depends on whether the
      RTL was compiled with FPC_RTL_UNICODE. To force the RawByteString
      version pass a TRawByteSearchRec, for the UnicodeString version pass
      a TUnicodeSearchRec.
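
The following sketch shows how the UnicodeString versions might be selected explicitly, independent of how the RTL was compiled. The search mask '*' is only an example.

```pascal
program findunicode;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  mask: UnicodeString;
  sr: TUnicodeSearchRec;
begin
  mask := '*';

  { Passing a TUnicodeSearchRec selects the UnicodeString overload;
    a TRawByteSearchRec would select the RawByteString one. }
  if FindFirst(mask, faAnyFile, sr) = 0 then
    try
      repeat
        WriteLn(sr.Name);
      until FindNext(sr) <> 0;
    finally
      FindClose(sr);
    end;

  { Other routines in the list, such as FileExists, accept a
    UnicodeString argument in the same way. }
  WriteLn('mask exists as a file: ', FileExists(mask));
end.
```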

  + paramstr(longint):unicodestring available for {$modeswitch unicodestrings}

  + pwidechar versions in sysutils of strecopy, strend, strcat, strcomp,
    strlcomp, stricomp, strlcat, strrscan, strlower, strupper, strlicomp,
    strpos, WideStrAlloc, StrBufSize, StrDispose + tests
