[fpc-devel] cpstrrtl/unicode branch merged to trunk
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Fri Sep 6 13:54:06 CEST 2013
Jonas Maebe wrote:
> Hi,
>
> I've just merged the cpstrrtl/unicode branch into trunk. Below you can find the commit message, which describes most changes, the added features and also a very important warning.
>
>
> Jonas
>
> o merged cpstrrtl branch (includes unicode branch). In general, this adds
> support for arbitrarily encoded ansistrings to many routines related to
> file system access (and some others).
>
> WARNING: while the parameters of many routines have been changed from
> "ansistring" to "rawbytestring" to avoid data loss due to conversions,
> this is not a panacea. If you pass a string concatenation to such a
> parameter and not all strings in this concatenation have the same
> code page, all strings and the result will be converted to
> DefaultSystemCodePage (= ansi code page by default).
IMO that conversion happens in every such concatenation, regardless of
whether the result is passed to a subroutine.
> In particular,
> concatenating e.g. an Utf8String with a constant string and passing
> the result to a RawByteString parameter will convert the result into
> the DefaultSystemCodePage (unless the source code is compiled with
> {$modeswitch systemcodepage} or {$mode delphiunicode} *and* the ansi
> code page on the system you are compiling *on* happens to be UTF-8)
>
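As a rough illustration of the pitfall described in the warning, here is a small model written in Python rather than Pascal. It is not FPC code: the helper concat_for_raw_param, the representation of a string as a (bytes, code page) pair, and the choice of cp1252 as a stand-in for DefaultSystemCodePage are all assumptions made for this sketch. It only models the rule as stated in the commit message: when the operands of a concatenation do not share a code page, everything is converted to the default code page, and characters that do not exist there are lost.

```python
# Toy model of the concatenation rule from the commit message (not FPC code).
# A "string" is modelled as a (payload_bytes, code_page_name) pair.

DEFAULT_SYSTEM_CODE_PAGE = "cp1252"  # assumption: stand-in for the ansi code page


def concat_for_raw_param(parts, default_cp=DEFAULT_SYSTEM_CODE_PAGE):
    """Concatenate (bytes, code_page) pairs the way the warning describes."""
    code_pages = {cp for _, cp in parts}
    if len(code_pages) == 1:
        # All operands agree: bytes are joined untouched, code page is kept.
        return b"".join(data for data, _ in parts), code_pages.pop()
    # Mixed code pages: decode every operand, then re-encode the result in
    # the default code page. Unrepresentable characters become "?".
    text = "".join(data.decode(cp) for data, cp in parts)
    return text.encode(default_cp, errors="replace"), default_cp


utf8_name = ("файл".encode("utf-8"), "utf-8")      # plays the role of a Utf8String
ansi_const = (".txt".encode("cp1252"), "cp1252")   # plays the role of a constant string
utf8_const = (".txt".encode("utf-8"), "utf-8")     # constant already in UTF-8

mixed = concat_for_raw_param([utf8_name, ansi_const])  # code pages differ
same = concat_for_raw_param([utf8_name, utf8_const])   # code pages match
```

In this model, `mixed` ends up in the default code page with the Cyrillic letters replaced, while `same` keeps its UTF-8 bytes unchanged, which is the reason the commit message suggests routines that take Utf8String explicitly.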
> You can define and use alternative routines that explicitly accept
> Utf8String parameters to avoid this pitfall. Internally, all of these
> routines ensure that they never trigger this condition and ensure that
> no unnecessary/unwanted code page conversions occur.
Delphi has overloaded functions for RawByteString and AnsiString(0). FPC
could add another Utf8String overload.
I'm not sure how efficient a RawByteString version can ever be. By
default it has to convert the string to Unicode (Delphi: UTF-16), and
the result back to CP_ACP. In these cases it looks more efficient to
call the Unicode version directly, and leave any further
conversions to the compiler. Some routines may implement common
processing for true SBCS, but I'm not sure how many of these there are.
> + SetMultiByteFileSystemCodePage() procedure to override the value of
> DefaultFileSystemCodePage
> + ToSingleByteFileSystemEncodedFileName() function to convert a string to
> DefaultFileSystemCodePage (does *not* take care of OS-specific quirks like
> Darwin always returning file names in decomposed UTF-8)
> + support for CP_OEMCP
> * textrec/filerec now store the filename by default using widechar. It is
> possible to switch back to ansichars using the FPC_ANSI_TEXTFILEREC define.
> In that case, from now on the filename will always be stored in
> DefaultFileSystemEncoding
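The Darwin quirk mentioned for ToSingleByteFileSystemEncodedFileName() in the list above concerns Unicode normalization rather than the code page. A short Python sketch (illustrative only, using the standard unicodedata module; the file name is made up, and the decomposed form used by Darwin is assumed here to be close to NFD) shows why a pure code page conversion cannot handle it: both forms are valid UTF-8, yet their bytes differ.

```python
import unicodedata

name = "caf\u00e9.txt"  # hypothetical file name with a precomposed "é"
composed = unicodedata.normalize("NFC", name)    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", name)  # "e" followed by U+0301

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Same text to a human reader, different byte sequences.
same_bytes = composed_bytes == decomposed_bytes

# Normalising both sides to one form makes the comparison meaningful again.
same_after_normalising = unicodedata.normalize("NFC", decomposed) == composed
```

Code that compares file names returned by the OS against names built by the program would therefore need a normalization step of its own on such systems.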
Is there a FileSystemString type, for easy use in RTL and
application code?
DoDi