[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
mschnell at lumino.de
Wed Dec 3 10:42:37 CET 2014
On 12/03/2014 05:02 AM, Hans-Peter Diettrich wrote:
> Michael Schnell schrieb:
>> - It does not result in additional conversions.
> It does, e.g. in searching or sorting of StringList, when it can contain
> strings of different encodings. The choice of a unique encoding for
> application strings (maybe CP_ACP, UTF-8 or UTF-16) eliminates such
If multiple encoding brands are involved, a system without DynamicString
also will need to do conversions. So DynamicString does not impose
>> So the "Checking Overhead" is nothing but a rumor. (Remember, I don't
>> suggest dropping the standard "statically typed" paradigm
>> altogether, as tight loops of course work best that way.)
> The rumor is the unimportant "Conversion Overhead", i.e. how often a
> check leads to a conversion. When no check is required, conversions
> consequently cannot occur at all.
Please re-read the text I wrote.
- If in the user-code DynamicString is not used, the compiler creates
the same code as before. So no overhead.
- If DynamicString is used (in user code or in a library interface),
but only a single encoding brand is used everywhere statically
encoded strings are in place ("a single program-wide string
representation", as you suggested in your previous mail), the only
runtime overhead is that, at the locations where DynamicString is used
(i.e. not in any tight loops), the compiler inserts an additional check
of the "EncodingType" variable. Here (unless the user actively
decides to create string variables with encoding brands other than the
program-wide default) the code *always* finds at runtime that no
conversion is necessary and acts as if the String were not dynamic,
but already "correct". The overhead of checking is obviously at most
some 5 ASM instructions and hence negligible compared with the function
call needed to enter the library function in question.
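
A minimal sketch of the check described above, written in Python rather
than Pascal so that it runs stand-alone. All names (DynString,
PROGRAM_ENCODING, library_entry) are hypothetical and only model the
proposal; they are not FPC or Delphi identifiers. The point is that the
fast path is a single comparison and that a conversion only happens when
a string with a different encoding brand actually arrives.

```python
from dataclasses import dataclass

# Assumption: one program-wide encoding brand, here UTF-8.
PROGRAM_ENCODING = "utf-8"


@dataclass(frozen=True)
class DynString:
    """Hypothetical model of a DynamicString: payload plus encoding brand."""
    data: bytes
    encoding: str  # stands in for the "EncodingType" field


def library_entry(s: DynString):
    """Model of a library function taking a DynamicString parameter.

    Returns (payload in PROGRAM_ENCODING, whether a conversion happened).
    """
    if s.encoding == PROGRAM_ENCODING:
        # The cheap check succeeded: use the bytes as they are.
        return s.data, False
    # Only reached when the caller deliberately used another brand.
    converted = s.data.decode(s.encoding).encode(PROGRAM_ENCODING)
    return converted, True
```

With a single program-wide brand, every call takes the first branch, so
the cost per call is the comparison alone.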
> RawByteString cannot serve two different purposes :-( ....
As I pointed out as well: A variable's encoding brand can't be static and
dynamic at the same time. This is the cause of the major misconception
imposed by Delphi regarding RawByteString. And this is why I would leave
RawByteString aside (as it is / as it is assumed to be / whatever) and
for any improvement use a completely new Type name and a "CP_ANY"
constant / value.
> In *Delphi* it is used as a polymorphic string, capable of *holding*
> actual strings of any encoding. But when assigned to a variable of a
> different encoding, a conversion may occur that converts the string into
> the declared (static) encoding of the target variable.
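
The behaviour described in the quote above can be modelled roughly as
follows. This is a Python sketch under my own assumptions, not Delphi
code: Tagged, assign_raw and assign_static are hypothetical names.
Assignment to the polymorphic type keeps payload and brand untouched,
while assignment to a statically encoded variable converts on mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """Hypothetical string value: raw bytes plus their encoding brand."""
    data: bytes
    encoding: str


def assign_raw(src: Tagged) -> Tagged:
    # Polymorphic target: holds any encoding, so nothing is converted.
    return src


def assign_static(src: Tagged, declared: str) -> Tagged:
    # Statically encoded target: convert only if the brands differ.
    if src.encoding == declared:
        return src
    text = src.data.decode(src.encoding)
    return Tagged(text.encode(declared), declared)
```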
Seemingly rather close to what I suggest as "DynamicString". But (see
) with a dynamic String the encoding brand number of such String would
not be allowed to ever be written into the EncodingType field in the
If this were true, why do the Delphi docs discourage making decent
use of the dynamic feature of RawByteString?
Anyway, a "dynamic" String type only makes sense if it is used in as
many library interfaces as possible (and in TStrings). This is not done
in Delphi. There it is not nice, in many cases restricting how the user
can make use of these libraries, but it is not as critical as with fpc,
where you need to consider portability issues.
> In *FPC* it currently is used somewhat close to your idea, i.e. no
> conversion occurs in both an assignment to *and from* an RawByteString
> to some other AnsiString.
As said, to avoid ambiguity, I vote for adding yet another string type
name (e.g. "ByteString" denoted by CP_BYTE) that is *known* to disallow
any conversion (and leave RawByteString as close as possible to the
moving target Delphi presents).
> I understand the FPC attempt, to allow *at the same time* for the new
> (encoded) and old (unencoded) AnsiString behaviour, where no automatic
> conversions are allowed. But this would require at the same time, that
> e.g. all string literals *also* are stored in that (immutable) encoding,
> and that this encoding can *not* be changed at runtime, while
> DefaultSystemCodePage *can* be changed.
I feel that this (simplified) attempt can't result in a decent paradigm.
It is close to impossible to completely describe the behavior in an
understandable way and it's prone to a lot of ambiguity.
That is why I tried to invent a concept that I suppose might work and
will not break (much) existing code. It is intended to be "straight"
from the ground up (it is not even necessary to assume that the content
of a "String" is printable/readable, but it should easily work for that
application). It would allow for making flexible use of Strings with
understandable and easy-to-use syntax candy, and would not impose
restrictions on portability any more. IMHO it would not impose
(noticeable) performance degradation, either.