[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
Michael Schnell
mschnell at lumino.de
Tue Dec 2 13:05:28 CET 2014
On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote:
>
> You suggested to use "string" as UTF-16 on Windows, and UTF-8 on
> Linux. That's what I understand as a unique program-wide string
> representation (not sourcecode-wide, instead program as *compiled*).
> Then I cannot see any need or use for another DynamicString type.
I had already understood your point, and I understand that this "unique
program-wide string representation" is better than having the
libraries' APIs (including TStrings) force a fixed string encoding
brand, independent of the OS we compile for (and of selectable $mode
specifications). But I don't *suggest* this way, as it is not very
versatile and hampers portability. As said, I *suggest* using
DynamicString in such cases. Nonetheless, the types simply called
"String" might be done in the way you suggest.
> Nothing can be broken, as long as the Delphi behaviour is undefined.
That of course is correct, but it just follows the poor excuse
Embarcadero offers for the flawed implementation of RawByteString
(which, as we both agree, will never be fixed). (In fact there are many
instances where old flaws have been deliberately reproduced so as not
to break compatibility.)
> Applied to FPC/Lazarus code (compiler, libraries, IDE...) this means
> that it's obviously easier to *prevent* possibly different
> static/dynamic encodings, instead of *checking and reacting* on such
> flaws throughout the entire codebase.
OK. Kill the type RawByteString and the constant CP_NONE and the
usability of its value $FFFF. I vote for doing so and instead
providing new types such as ByteString, WordString, DWordString, and
QWordString, denoted by the constants CP_Byte = $FF01, CP_Word = $FF02,
CP_DWord = $FF04, CP_QWord = $FF08.
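
The proposed brand numbers appear to carry the element width in their
low byte ($FF01 for 1 byte up to $FF08 for 8 bytes). A minimal sketch of
that reading, in Python purely for illustration: the helper name
element_size and the "high byte $FF marks a raw type" test are
assumptions, not part of any FPC or Delphi API.

```python
# Hypothetical constants mirroring the values proposed above.
CP_BYTE = 0xFF01
CP_WORD = 0xFF02
CP_DWORD = 0xFF04
CP_QWORD = 0xFF08


def element_size(brand: int) -> int:
    """Return the element width in bytes for a raw (unencoded) brand number.

    Assumption: the high byte 0xFF marks a raw type and the low byte
    holds the element width (1, 2, 4 or 8).
    """
    if brand & 0xFF00 != 0xFF00 or brand & 0xFF not in (1, 2, 4, 8):
        raise ValueError(f"not a raw brand number: {brand:#06x}")
    return brand & 0xFF
```

Under this reading, the old CP_NONE value $FFFF is rejected, since its
low byte is not a valid element width.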
> Apart from that, every encoding-tolerant code will execute much slower
> than code without a need for checks and conversions everywhere.
As I pointed out, I don't agree at all.
- The check is only two ASM instructions.
- It does not result in additional conversions. In fact, in appropriate
cases it can avoid a huge number of conversions (especially when
calling libraries, e.g. by means of TStrings).
- In pure user code, the check is only done if DynamicString really is
used in the user code, hence only when the user knows what to do. In
fact, typical degradation = 0%.
- When calling libraries (e.g. via TStrings), the check is very small
compared with the function call that results from the same
statement. Estimated typical degradation = 0.000001 %.
So the "Checking Overhead" is nothing but a rumor. (Remember, I don't
suggest dropping the standard "statically typed" paradigm altogether,
as tight loops of course work best that way.)
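
The check described in the list above can be sketched as follows. This
is an illustration in Python under stated assumptions, not how FPC
implements anything: TaggedString and as_brand are hypothetical names,
and only the code page numbers 65001 (UTF-8) and 1200 (UTF-16LE) are
real identifiers.

```python
from dataclasses import dataclass

CP_UTF8 = 65001   # code page number for UTF-8
CP_UTF16 = 1200   # code page number for UTF-16 little-endian

# Maps brand numbers to Python codec names (illustration only).
CODECS = {CP_UTF8: "utf-8", CP_UTF16: "utf-16-le"}


@dataclass
class TaggedString:
    """Hypothetical string that carries its encoding brand with its data."""
    brand: int
    data: bytes


def as_brand(s: TaggedString, wanted: int) -> TaggedString:
    # The "check": a single integer comparison. When the brands match,
    # the string passes through untouched and nothing is converted.
    if s.brand == wanted:
        return s
    # Only a mismatch pays for a conversion.
    text = s.data.decode(CODECS[s.brand])
    return TaggedString(wanted, text.encode(CODECS[wanted]))
```

A caller holding UTF-8 data and passing it to a UTF-8 consumer pays only
for the comparison; the conversion branch runs only when the brands
actually differ.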
>> That is why fpc would need to define an additional type name (e.g
>> "DynamicString") and encoding brand number (e.g. "CP_ANY" = $FF00)
>> for a decently usable type for intermediately holding a String content.
>
> This again would make *FPC* programs incompatible with Delphi.
As I explained, this would not break any backwards
compatibility, even if TStrings uses this type.
- The new type is just additional, so its mere existence can't break
anything: you don't need to use it in user code if you don't want to.
- The use of DynamicString in the interface of library functions does
not break anything, as it is (to be) constructed in a way that provides
full compatibility.
Please do show any code (not containing RawByteString) that is not
compatible when using the DynamicString paradigm as described in
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support#Analysis
Maybe the page needs to be improved.
> While fixing the RawByteString flaw would at least allow to *compile*
> FPC code with Delphi, the use of a different encoding value would
> definitely prevent compilation of such code with Delphi. What's the
> more serious incompatibility?
IMHO this would be much more dangerous than introducing a decently
working new DynamicString type.
>> RawXxxString can be used for really "uncoded" data as done with
>> old-style strings in a lot of applications.
>
> Such a feature would be appreciated by many users, indeed :-)
While I would happily follow your suggestion to make "indecent" use of
this type impossible in the fpc compiler, I don't think it's very
dangerous to re-introduce the abysmal Delphi-compatible behavior of
RawByteString (both the documented and the undocumented
"features").
But why do you say "would be appreciated"? Is it not possible to use
"RawByteString" in the way the name suggests, by never bringing it
together with any String variable of a different encoding brand and
hence avoiding any conversion, be it intentional/documented/useful or not?
Anyway: I added a sentence in the introduction of the wiki page,
explaining the paradigm a little more explicitly.
-Michael