[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Michael Schnell mschnell at lumino.de
Tue Dec 2 13:05:28 CET 2014


On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote:
>
> You suggested to use "string" as UTF-16 on Windows, and UTF-8 on 
> Linux. That's what I understand as a unique program-wide string 
> representation (not sourcecode-wide, instead program as *compiled*). 
> Then I cannot see any need or use for another DynamicString type.
I did understand your meaning, and I agree that this "unique
program-wide string representation" is better than having the
libraries' APIs (including TStrings) force a fixed string encoding
brand, independent of the OS we compile for (and of the selectable
$mode specifications). But I don't *suggest* this approach, as it is not
very versatile and hampers portability. As I said, I *suggest* using
DynamicString in such cases. Nonetheless, the type simply called
"String" might be implemented the way you suggest.

> Nothing can be broken, as long as the Delphi behaviour is undefined. 
That is of course correct, but it just echoes the poor excuse
Embarcadero offers for the flawed implementation of RawByteString
(which, as we both agree, will never be fixed). (In fact there are many
instances where old flaws have been deliberately reproduced so as not
to break compatibility.)

> Applied to FPC/Lazarus code (compiler, libraries, IDE...) this means 
> that it's obviously easier to *prevent* possibly different 
> static/dynamic encodings, instead of *checking and reacting* on such 
> flaws throughout the entire codebase. 
OK. Then kill the type RawByteString, the constant CP_NONE and the
usability of its value $FFFF. I do vote for doing so, and for instead
providing new types such as ByteString, WordString, DWordString and
QWordString, denoted by the constants CP_Byte = $FF01, CP_Word = $FF02,
CP_DWord = $FF04 and CP_QWord = $FF08.
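In the proposed numbering, the high byte $FF appears to mark a "raw" brand while the low byte equals the element size in bytes (1, 2, 4, 8). A minimal Python sketch under that reading follows; the constant names mirror the proposal, none of them (except CP_NONE) exist in FPC, and element_size is a hypothetical helper, not an RTL function:

```python
# Hypothetical code-page constants from the proposal above (not part of FPC),
# plus the existing CP_NONE value that RawByteString uses today.
CP_NONE = 0xFFFF   # existing value, proposed for removal
CP_ANY = 0xFF00    # proposed brand for DynamicString
CP_BYTE = 0xFF01
CP_WORD = 0xFF02
CP_DWORD = 0xFF04
CP_QWORD = 0xFF08


def element_size(codepage: int):
    """Return the element size in bytes for one of the proposed raw types, else None."""
    high, low = codepage & 0xFF00, codepage & 0x00FF
    # Only $FF01/$FF02/$FF04/$FF08 qualify; $FF00 (CP_ANY) and $FFFF (CP_NONE) do not.
    if high == 0xFF00 and low in (1, 2, 4, 8):
        return low
    return None


for cp in (CP_BYTE, CP_WORD, CP_DWORD, CP_QWORD, CP_ANY, CP_NONE, 65001):
    print(hex(cp), element_size(cp))
```

Encoding the size in the constant itself would let generic code derive the element width without a lookup table; that is one possible motivation for the chosen values, not something the proposal states.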

> Apart from that, every encoding-tolerant code will execute much slower 
> than code without a need for checks and conversions everywhere.
As I pointed out, I don't agree at all.
  - The check is only two ASM instructions.
  - It does not result in additional conversions. In fact, in appropriate
cases it can avoid a huge number of conversions (especially when
calling libraries, e.g. by means of TStrings).
  - In pure user code, the check is only done if DynamicString is
actually used in the user code, hence only when the user knows what
they are doing. So in the common case the degradation is 0%.
  - When calling libraries (e.g. via TStrings), the check is negligible
compared to the function call made by the same statement. My estimate
of the typical degradation: 0.000001%.

So the "checking overhead" is nothing but a rumor. (Remember, I don't
suggest dropping the standard "statically typed" paradigm altogether,
as tight loops of course work best that way.)
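A minimal sketch of the check argued about above, assuming each string carries its dynamic code page next to its data and that a destination branded with the proposed CP_ANY accepts any encoding unchanged. It is written in Python rather than Pascal for brevity; TaggedString and assign are hypothetical names, and 28591 (ISO-8859-1) is just an example target code page. On the common path the only work is one integer comparison; decoding and re-encoding happen only when the encodings really differ:

```python
from dataclasses import dataclass

CP_UTF8 = 65001
CP_LATIN1 = 28591  # ISO-8859-1, used here only as an example target
CP_ANY = 0xFF00    # hypothetical DynamicString brand from the proposal

# Map the code pages used in this sketch to Python codec names.
CODECS = {CP_UTF8: "utf-8", CP_LATIN1: "latin-1"}


@dataclass
class TaggedString:
    data: bytes
    codepage: int  # dynamic encoding stored with the string


def assign(src: TaggedString, static_cp: int) -> TaggedString:
    """Assign src to a destination whose declared (static) code page is static_cp."""
    # The "check": compare the source's dynamic code page with the target's.
    if static_cp == CP_ANY or src.codepage == static_cp:
        return src  # no conversion; a real RTL would just bump a refcount
    # Encodings differ: convert once via a decoded intermediate.
    text = src.data.decode(CODECS[src.codepage])
    return TaggedString(text.encode(CODECS[static_cp]), static_cp)


s = TaggedString("é".encode("utf-8"), CP_UTF8)
held = assign(s, CP_ANY)        # DynamicString-style holder: kept as UTF-8
back = assign(held, CP_UTF8)    # same encoding: still no conversion
latin = assign(held, CP_LATIN1) # different encoding: converted exactly once
```

Returning the same object on the no-conversion path mirrors the idea that passing a string through a DynamicString-typed interface (such as the proposed TStrings) would not touch its bytes at all.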

>> That is why fpc would need to define an additional type name (e.g.
>> "DynamicString") and encoding brand number (e.g. "CP_ANY" = $FF00)
>> for a decently usable type for intermediately holding a String content.
>
> This again would make *FPC* programs incompatible with Delphi. 
As I explained, this would not break any backward compatibility, even
if TStrings uses this type.
  - The new type is purely additional, so its mere existence can't break
anything: you don't need to use it in user code if you don't want to.
  - The use of DynamicString in the interface of library functions does
not break anything, as it is (to be) constructed in a way that provides
full compatibility.

Please do show me any code (not containing RawByteString) that would
be incompatible under the DynamicString paradigm as described in
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support#Analysis
If there is such code, maybe the page needs to be improved.

> While fixing the RawByteString flaw would at least allow to *compile* 
> FPC code with Delphi, the use of an different encoding value would 
> definitely prevent compilation of such code with Delphi. What's the 
> more serious incompatibility?
IMHO this would be much more dangerous than introducing a properly
working new DynamicString type.
>> RawXxxString can be used for really "uncoded" data as done with 
>> old-style strings in a lot of applications.
>
> Such a feature would be appreciated by many users, indeed :-)

While I would happily follow your suggestion of making "indecent" use
of this type impossible in the fpc compiler, I don't think it's very
dangerous to re-introduce the abysmal Delphi-compatible behavior of
RawByteString (the documented as well as the undocumented "features").

But why do you say "would be appreciated"? Is it not possible to use
RawByteString the way the name suggests, by never bringing it together
with any String variable of a different encoding brand and hence
avoiding any conversion, be it intentional/documented/useful or not?


Anyway, I added a sentence to the introduction of the wiki page
explaining the paradigm a little more explicitly.



-Michael
