[fpc-devel] ccharset.pas, charset.pas and strings/unicode ?

Skybuck Flying skybuck2000 at hotmail.com
Thu Apr 7 05:51:01 CEST 2011


Hmm ok, so here is a little theoretical/hypothetical question for you to 
think and guess about ;):

Suppose some kind of weird disaster happens, like the tsunami in Japan... all 
our computers are destroyed...

What remains is the Free Pascal source code.

What remains is an Object Pascal compiler which works with unicode strings 
only.

Now suppose string is defined as a unicode string.

This would lead to some problems... but ok... if the compiler supports 
shortstring then that's easily solved...

But my question is a little bit the following:

What would happen if the compiler was unicode only ?

Could the compiler still be built ? I would guess so... unless it depends on 
some ansi strings in assembly or so...

Furthermore what happens to statements/code like this:

SomeString := 'SomeText';


I think in a unicode compiler 'SomeText' might actually be defaulted to 
unicode ?

So then perhaps in the compiler source it's necessary to typecast this explicitly to:

SomeString := AnsiString('SomeText');

or perhaps even

SomeString := ShortString('SomeText');

I am not sure if these typecasts are needed or if there is a better way...

One way would be to assign to a variable that is declared as ShortString:

SomeShortString := 'SomeText';

But then the assumption would be that the compiler turns the literal into 
whatever ShortString is...


But then the question is what would the following do:


if SomeString = 'SomeText' then

???

Would 'SomeText' be given the same type as SomeString ?

Would automatic conversion take place ?

or

Would a string type violation occur if SomeString was of another type 
than the default type of 'SomeText'... ?

So that's pretty nasty...
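
For what it's worth, here is a small sketch of how I would expect this to
behave in today's Free Pascal (so not in the hypothetical unicode-only
compiler). The program name, the variable names and the {$mode objfpc}{$H+}
setting are my own assumptions for illustration. The idea is only that the
literal is converted to the type of the variable it meets, so no typecast
should be needed:

```pascal
program StringLiteralDemo;
{$mode objfpc}{$H+}

var
  S: ShortString;
  A: AnsiString;
  U: UnicodeString;

begin
  { The same literal is assigned to three different string types;
    the compiler picks the representation from the left-hand side. }
  S := 'SomeText';
  A := 'SomeText';
  U := 'SomeText';

  { Comparing each variable against the untyped literal. }
  if S = 'SomeText' then
    WriteLn('ShortString matches the literal');
  if A = 'SomeText' then
    WriteLn('AnsiString matches the literal');
  if U = 'SomeText' then
    WriteLn('UnicodeString matches the literal');

  { Mixed comparison: the AnsiString side is converted implicitly.
    The compiler may warn about the implicit string conversion. }
  if U = A then
    WriteLn('UnicodeString and AnsiString compare equal');
end.
```

With plain ASCII text like this all four comparisons should succeed; with
characters outside ASCII the implicit conversion depends on the code page in
use, so I would not rely on it there.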

At the moment I have little idea what Delphi XE does... (little experience 
with unicode)

But I would guess everything defaults to unicode ?!? I could be wrong 
though...
(At least that's what it seems to be doing ;))

That does not necessarily mean I agree with how things are done in Delphi XE 
but such is life ;)



Anyway what remains to be discussed is the advantages of a unicode compiler...


One thing comes to mind: "Chinese people and Greek people might be able to 
develop a compiler in their own language..."


Also what remains is the disadvantages of a unicode compiler...


You already mentioned possible performance issues... though is there really 
that much difference between shortstring and widestring and ansistring...
it's more or less the same, except one has a reference count and another uses 
double the amount of bytes per character...
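
To put rough numbers on that, a minimal sketch (program and variable names
are made up by me, and it only looks at the character data, not at the
reference count or length header that the managed string types keep on the
heap):

```pascal
program StringSizeDemo;
{$mode objfpc}{$H+}

var
  S: ShortString;
  A: AnsiString;
  U: UnicodeString;

begin
  S := 'SomeText';
  A := 'SomeText';
  U := 'SomeText';

  { A ShortString variable is a fixed buffer: 1 length byte + 255 chars. }
  WriteLn('ShortString variable size: ', SizeOf(S), ' bytes');

  { AnsiString and UnicodeString variables are pointers to heap data;
    the character payload is Length * size of one character. }
  WriteLn('AnsiString payload:        ', Length(A) * SizeOf(AnsiChar), ' bytes');
  WriteLn('UnicodeString payload:     ', Length(U) * SizeOf(WideChar), ' bytes');
end.
```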

But a bigger disadvantage which I can imagine is operating systems... 
perhaps older ones which do not support unicode ?!?

What would happen to them ?!? Big string corruption methinks ;) But I could 
be wrong ;)

Maybe even free pascal dos applications could still somehow use unicode if 
the compiler took care of all of it ?

At least internally in the application it would then work... same could be 
done for win95...

Only communication with APIs in win95 or interrupts in dos would probably 
get screwed up... as for dos, those few bits might be re-written, but ok.
Must draw the line somewhere... perhaps even unicode-fonts could be included 
;) pff ;) (but that's probably pushing it ! ;) =D) Nice to think of 
possibilities though... I like backwards compatibility quite a lot ;)


Bye,
  Skybuck =D


----- Original Message ----- 
From: "Sven Barth" <pascaldragon at googlemail.com>
To: <fpc-devel at lists.freepascal.org>
Sent: Wednesday, 6 April, 2011 14:40 PM
Subject: Re: [fpc-devel] ccharset.pas, charset.pas and strings/unicode ?


> Am 06.04.2011 08:30, schrieb Skybuck Flying:
>> Hello,
>>
>> I am having momentary confusion about the situation with ccharset.pas
>> and charset.pas and strings, ansistrings and unicode in general... ?!?
>>
>> So some questions about this:
>>
>> In particular I do not understand the following uses clause:
>>
>> {$ifdef VER2_2}ccharset{$else VER2_2}charset{$endif VER2_2},
>>
>> Somewhere it says something about bootstrapping and stuff like that...
>> it seems to have something to do with unicode mappings...
>>
>> It also said that this wasn't necessary anymore beyond version 2.2.2 or
>> something ?
>>
>
> Something like this is normally done when code is added to the RTL (in 
> this case the unit "charset") which is used by the compiler as well. As 
> the compiler must be built with an older compiler (and its older RTL) 
> first, that compiler does not yet know about the "charset" unit. Therefore 
> the unit is copied to the compiler's directory with a "c" prefix (in this 
> case "ccharset") until a release is made which contains that new unit. The 
> unit you are looking for is in rtl/inc now, so that ifdef-construct (and 
> the ccharset unit) could be removed now.
>
> Something similar was done a few days ago with the new "windirs" unit 
> which was added as "cwindirs" to the compiler as well.
>
>> This seems to me like a little unicode-hack to get unicode into the
>> compiler or something ?
>>
>> What the hell is this ? =D
>>
>> Anyway some questions about the free pascal 2.4.2 sources in relation to
>> Delphi XE situation:
>>
>> In the latest Delphi versions "string" is now considered a Unicode 
>> string.
>>
>> What's the situation with the "options.pas" in the compiler folder ?
>>
>> Lots of string stuff and character stuff going on there... ansistring
>> versus unicodestring, ansichar versus unicodechar ?
>>
>
> Options.pas has nothing to do with different string types. It's for 
> parsing the command line arguments and the configuration file and for 
> setting up the start defines based on those arguments and files. Mostly you 
> don't need to touch options.pas at all.
>
>> Seems a bit conflicting for what I am trying to do... which is use some
>> of this code in Delphi...
>>
>> So I am getting all kinds of typecast/implicit string cast warnings and
>> errors and stuff and potential data loss
>> from "string" to "ansistring"... a bit too whacky for my taste but ok...
>>
>> So to get some sense into all of this let me ask you a simple question:
>>
>> 1. What type of strings does free pascal use ? Especially in options.pas 
>> ?
>>
>> Are these "string" types considered to be AnsiStrings or UnicodeStrings 
>> ???
>>
>> And what about "char" types ? Are those AnsiChar or UnicodeChar ???
>>
>> (probably also known as widechar, widestring...)
>>
>
> The compiler itself mostly uses ShortString and pointers to ShortString as 
> they don't have the reference counting and thus are faster to handle. In 
> some rare cases AnsiString (aka String) is used and WideString is - as 
> far as I'm aware - never used.
>
> The string types supported by FPC though are ShortString, AnsiString, 
> WideString (non reference counted 2 Byte String for Windows compatibility) 
> and UnicodeString (reference counted 2 Byte String). On all platforms 
> except Windows (Win32, Win64, WinCE) a WideString is an alias for 
> UnicodeString.
> In mode Delphi "String" is an alias for "AnsiString"; in all other modes 
> (unless $H+ is given) "String" is an alias for "ShortString".
>
>> (I have in principle done no real programming yet with the newer Delphi
>> versions with the unicode stuff in it...
>> so this is new stuff for me... and now a bit confusion unfortunately...
>> and perhaps even unavoidable confusion...
>> because this "reinterpretation" that "new-borland" did is now
>> conflicting and causing interpretation issues :(
>> so it depends on the compiler... and I don't know what free pascal
>> does... so that's why I ask here...)
>>
>> Also there is something I don't understand about the conditional way 
>> above:
>>
>> It reads in a way:
>>
>> IF VERSION IS 2.2 THEN USE CCHARSET ELSE CHARSET
>>
>> The thing is: I am using 2.4.2 and CHARSET is missing from 2.4.2
>
> This condition is the correct one. CCharSet should maybe be removed, as all 
> compilers from 2.4.0 on use CharSet from the RTL directory.
>
>>
>> So perhaps this conditional was meant to read something like:
>>
>> if Version > 2.2 then use CCHARSET else CHARSET; ???
>>
>> So for 2.4.2 I must probably use CCHARSET.pas the thing with the
>> confusing strings remains though ;)
>>
>> Sorry for messy posting... but this is messy ! ;) =D
>
> No, it's not ;)
>
> Regards,
> Sven
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> http://lists.freepascal.org/mailman/listinfo/fpc-devel
> 
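
As a footnote to Sven's remark in the quoted message about what "String"
means in the different modes: a small sketch of how the $H switch seems to
change the alias, as far as I understand it. The program and type names are
made up by me, and I am assuming that $H can be switched locally between two
type sections:

```pascal
program StringAliasDemo;
{$mode objfpc}

{$H-}
type
  { With $H- "String" should be the fixed-size ShortString. }
  TStringWithoutH = String;

{$H+}
type
  { With $H+ "String" should be the pointer-sized AnsiString. }
  TStringWithH = String;

begin
  WriteLn('SizeOf(String) under $H-: ', SizeOf(TStringWithoutH));
  WriteLn('SizeOf(String) under $H+: ', SizeOf(TStringWithH));
  WriteLn('Pointer-sized under $H+ : ', SizeOf(TStringWithH) = SizeOf(Pointer));
end.
```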



