[fpc-devel] cpstrrtl/unicode branch merged to trunk

Michael Schnell mschnell at lumino.de
Mon Sep 9 14:25:17 CEST 2013


On 09/07/2013 03:00 PM, Sven Barth wrote:
>
> We do NOT want to force UnicodeString upon every target. The world not 
> only consists of Windows!

+1 !

Of course a compiler switch to not use the "NewStrings" would be 
appropriate.

OTOH, IMHO it should also be possible to use the "NewStrings" on Linux 
with a default encoding of UTF8.

Thus, a decently Delphi-compatible definition of the encoding when 
defining Strings (not using the aliases provided) could be:

"($0000)" Default encoding (e.g. UTF16 when compiled for Windows and 
UTF8 when compiled for Linux). The OS-centric RTL functions, and in 
Lazarus the LCL, would internally avoid many conversions when user code 
accesses them with strings in the default encoding, declared either via 
"($0000)" or via an explicit code that matches it. Identical to 
"($mmmm)", with $mmmm being the default encoding at compile time
"($nnnn)" Delphi compatible, auto converting
"($FFFF)" Delphi compatible raw byte string, not auto converting
"($FFFF, 1)" Not Delphi compatible: identical to "($FFFF)" (after a ",", 
the element size is defined; without a "," the element size is set 
according to the character code)
"($FFFF, 2)" Not Delphi compatible: raw Word string, not auto converting
"($FFFF, 4)" Not Delphi compatible: raw DWord string, not auto converting
"($FFFF, 8)" Not Delphi compatible: raw QWord string, not auto converting
"($FFFE)" Not Delphi compatible: dynamically encoded String, auto 
converting when necessary.

The codes "($0000)" and "($FFFE)" are never stored within the string 
header nor are they known to the Library functions. They only trigger 
the appropriate compiler magic. The String headers always contain the 
actual encoding type which is fixed for "($0000)"-predefined Strings and 
dynamic for "($FFFE)"-predefined strings.
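
For reference, a minimal sketch of the Delphi-compatible forms from the 
list above that can already be written today, assuming a compiler that 
includes the merged cpstrrtl branch. The type name TCP1252String and the 
choice of code page 1252 are only illustrative. The "($0000)", "($FFFE)" 
and "($FFFF, n)" forms are part of the proposal and are not shown, since 
no syntax for them exists.

```pascal
program cpsketch;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // installs a widestring manager so code page conversions work on Unix
{$endif}

type
  // "($nnnn)": fixed code page, auto converting (1252 is an arbitrary example)
  TCP1252String = type AnsiString(1252);

var
  U: UTF8String;      // predefined alias for AnsiString(CP_UTF8)
  W: TCP1252String;
  R: RawByteString;   // predefined alias for AnsiString($FFFF), i.e. CP_NONE
begin
  U := 'abc';
  W := U;   // the compiler inserts a UTF-8 -> 1252 conversion
  R := U;   // no conversion; R keeps whatever code page U carries

  // StringCodePage reads the code page stored in the string header.
  WriteLn('U: ', StringCodePage(U));
  WriteLn('W: ', StringCodePage(W));
  WriteLn('R: ', StringCodePage(R));
  WriteLn('default: ', DefaultSystemCodePage);
end.
```

The values printed depend on the platform and the source file settings, 
so none are quoted here. The point is only that the header of R reports 
an actual code page rather than $FFFF, which is the behaviour proposed 
above for "($0000)" and "($FFFE)" as well.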

-Michael


