[fpc-pascal] Generic String Functions

Michael Schnell mschnell at lumino.de
Fri Feb 28 15:01:48 CET 2014


On 02/28/2014 01:04 PM, Marco van de Voort wrote:
> Moreover, will operations that use character access make sense at all 
> if you don't know what the actual encoding is? 
The header record of each "new Delphi string" contains the encoding 
type (code page) and the byte count per code unit. So "you" (the 
compiler and the RTL) do know it.
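
As a minimal sketch of what this means in practice, the program below 
reads those two header fields back at run time. It assumes a compiler 
with code page aware strings (Delphi 2009 or later, or FPC 2.7.1 / 3.0 
or later) where StringCodePage() and StringElementSize() are available; 
the type name TLatin1String is made up for illustration only.

```pascal
program ShowStringHeader;
{$mode objfpc}{$H+}

type
  // Hypothetical type name; 28591 is the ISO 8859-1 code page number.
  TLatin1String = type AnsiString(28591);

var
  U8: UTF8String;
  L1: TLatin1String;
  W: UnicodeString;
begin
  U8 := 'abc';
  L1 := 'abc';
  W := 'abc';
  // Each call reads the code page / element size stored with the string.
  WriteLn('UTF8String:    cp=', StringCodePage(U8),
          ' elemsize=', StringElementSize(U8));
  WriteLn('TLatin1String: cp=', StringCodePage(L1),
          ' elemsize=', StringElementSize(L1));
  WriteLn('UnicodeString: cp=', StringCodePage(W),
          ' elemsize=', StringElementSize(W));
end.
```

The values are stored with the string data itself, so the RTL can look 
them up even when the static type of a variable does not say anything 
about the encoding.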

The "only" shortcoming in Delphi is that the handling is completely 
"static":
  - if the type the string is declared with has an encoding other than 
"RAW", that encoding needs to be known at compile time (i.e. the 
encoding type cannot be changed at run time)
  - if the type the string is declared with is "RAW", auto-conversion 
from this string to a non-RAW string is not done.

Hence (not only, but also, for decent use on multiple OSes) an 
additional "fully dynamically encoded" type (I suggest calling this 
string type "Generic") is necessary.

> (not only s[] but also
> pos,delete,insert etc).   The same code can seem to behave differently
> because different code-paths make the same parameter have different
> encodings.
I suppose you are right. But it is not only the "funny" position 
numbers that pos(), delete(), insert() and friends use that create a 
problem; the String type they are defined to use does, too:

  - If any statically encoded type is used for them, it is close to 
impossible to write decently fast string manipulation programs 
(unless they happen to use the matching encoding type), as 
auto-conversion to and fro is invisibly introduced.

  - If the suggested dynamically encoded type is used, we will have 
problems when combining strings of different types in a code snippet 
that calls these functions.

I don't know if / how / to_what_extent compiler magic can help here 
(doing auto-conversion "when necessary" similar to when simply assigning 
strings of different encoding types).

In the end, I feel it would be very undesirable, but it might be the 
only "easy" solution, to go with full Delphi compatibility and handle 
all string encodings except UTF16 rather poorly. This would force 
Lazarus to provide a (Delphi compatible) LCL API done completely in 
UTF16. This completely contradicts all they did in the last few years :-) .

-Michael
