[fpc-devel] Controlling the output of the widestring manager

Marc Weustink marc at dommelstein.net
Sun Dec 2 01:04:21 CET 2007


Florian Klaempfl wrote:
> Felipe Monteiro de Carvalho schrieb:
>> Hello,
>>
>> Today I took a look at what would be necessary to implement
>> controlling the output of the widestring manager, and implementing
>> this looks rather trivial.
>>
>> The only thing we need is a switch, to control if widestrings should
>> be converted to the current locale or to utf-8. In the case of the
>> current locale one can use the existing widestring function
>> Wide2AnsiMoveProc. In the case of utf-8 we can use the cross-platform
>> UnicodeToUTF8 function.
>>
>> I would start by adding a new field to TWideStringManager:
>>
>> OutputUTF8ToStrings: Boolean;
> 
> As discussed on irc today, this is a no go for the rtl. Ansistrings are
> encoded in the system encoding, this is a convention everything depends
> on like all Ansi* routines in sysutils.  I give the bug report "I added
> widestringmanager.outputformat := soUTF8; to my program, now
> AnsiStrUpper doesn't work anymore", 1-2 weeks. It is fine if projects
> break with the convention, but then they have to deal with it, e.g. by
> overriding ansi<->wide conversion in the widestring manager.
> 
> The full discussion can be found here:
> <http://www.hu.freepascal.org/fpcircbot/cgifpcbot?channel=lazarus-ide&fromdate=12-1-07&todate=12-1-07&linecount=9999&fromtime=10%3A00&totime=12%3A45&sender=&msg=>

I agree with Florian that adding a global boolean to control the
conversion is a hack. From day 1, when we (= Lazarus) started to use
utf8, I wanted a separate string type for it (the default answer being
that I can always add one myself).
Mixing the meaning of the contents of a string is hard to maintain. It
is error prone and not really a "typesafe" Pascal way.
From the logs I understand that there is a "real" managed utf8string
type (other than utf8string = type string).
If this is the case, I am in favour of using this type as the container
for utf8 encoded text and leaving ansistrings for the system encoding,
as they were meant.
For portability reasons, it would be great if it were possible to
control what type the string type defaults to. Similar to {$H+}, which
switches between shortstring and ansistring, I am thinking of something
like
{$STRINGTYPE SHORT|ANSI|UTF8}
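
For comparison, here is a minimal sketch of how the existing {$H} switch
already changes what "string" means at the point of declaration. The
program name and the two type names are made up for illustration, and
{$STRINGTYPE} is only the proposal above, not an existing directive.

```pascal
program hswitchdemo;

{$mode objfpc}

{$H-}
type
  // Declared while H is off: "string" means shortstring here.
  TStringWithHOff = string;

{$H+}
type
  // Declared while H is on: "string" means ansistring here.
  TStringWithHOn = string;

begin
  // A shortstring is a fixed buffer: one length byte plus 255 characters.
  WriteLn('H off: ', SizeOf(TStringWithHOff), ' bytes');
  // An ansistring variable is only a pointer to managed heap data,
  // so its size should match SizeOf(Pointer) on the target platform.
  WriteLn('H on:  ', SizeOf(TStringWithHOn), ' bytes');
  WriteLn('ptr:   ', SizeOf(Pointer), ' bytes');
end.
```

The same source text ("string") ends up as two different types depending
on the switch state. A string type switch as suggested would presumably
add utf8string as a third possible target in the same way.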

Marc



