[fpc-devel] Unicode resource strings

Mon Aug 20 19:28:32 CEST 2012

Graeme Geldenhuys wrote:
> On 20/08/12 08:52, Sven Barth wrote:
>>
>> Just to avoid confusion: The reference counted 2-byte string type on all
>> platforms is UnicodeString, not WideString (the latter is not reference
>> counted on Windows platforms).
> 
> Please correct me if I am wrong, but I think WideString was reference 
> counted on all platforms "in the beginning" - like Martin mentioned. 
> Later it was changed, and the new UnicodeString became the "reference 
> counted on all platforms" type.

WideStrings on Windows platforms are allocated in *system* space, so 
that they can be used across processes. Reference counting can occur 
only according to the Windows (COM) rules. Delphi UnicodeStrings are 
stored in the (local) program space instead, so that local reference 
counting can be used. Dunno about passing such strings to other 
processes, though.

>> The codepage aware string type was added to 2.7.1, because there already
>> existed a branch for this and "just" needed to be merged. There does not
>> yet exist any code for Unicode resource strings.
> 
> FPC's Unicode support is still in its infancy. It is not just resource 
> strings that are missing, as my recent message from the fpc-users 
> mailing list shows.
> 
> Vital decisions of how Unicode should be implemented are still not 
> decided by the FPC team. There is a major problem in the FPC project 
> though. The FPC team seems to be dead-locked on how to implement Unicode 
> features. Nobody can agree on anything. Thus no work can be started on 
> the RTL and FCL.
> 
> In the meantime many projects keep implementing their own Unicode 
> workarounds. Not a good sign, but all we can do.

IMO UTF-8 is supported by all platforms, so there is no urgent need to 
add UTF-16 support. More problematic is the break in the default 
"String" type between older (AnsiString) and newer (UnicodeString) 
Delphi versions. The consequence of following *that* decision was 
incompatible FCL (and LCL) classes, resulting in double maintenance 
effort. This duplication can be avoided by using the implicit string 
conversions offered by the new string types. This also applies to the 
handling of resource strings. The runtime impact depends on the string 
model used in a *program*, where the use of UTF-16 strings would require 
many conversions when *interfacing* with UTF-8 components/libraries.

It's unclear whether UTF-16 strings really allow for faster string 
handling, since *full* Unicode support still has to take UTF-16 
*surrogate pairs* into account, which is not really different from 
handling UTF-8 multibyte sequences.
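
To illustrate the point about surrogate pairs, here is a rough sketch in 
Python (not FPC code, and not taken from any RTL). The function names 
and the sample string are made up for illustration, and both counters 
assume well-formed input. It only shows that counting code points needs 
a variable-length scan in either encoding.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units of text (helper for this sketch only)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def count_code_points_utf16(units):
    """Count code points in UTF-16 code units (assumes well-formed input)."""
    count = 0
    i = 0
    while i < len(units):
        # A high surrogate (0xD800..0xDBFF) opens a two-unit pair.
        i += 2 if 0xD800 <= units[i] <= 0xDBFF else 1
        count += 1
    return count


def count_code_points_utf8(data):
    """Count code points in UTF-8 bytes (assumes well-formed input)."""
    # Continuation bytes have the form 10xxxxxx; any other byte starts
    # a new code point.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# 'a', e-acute, and MUSICAL SYMBOL G CLEF (U+1D11E, outside the BMP).
sample = "a\u00e9\U0001D11E"
units16 = utf16_units(sample)
bytes8 = sample.encode("utf-8")

print(len(units16), count_code_points_utf16(units16))  # 4 units, 3 code points
print(len(bytes8), count_code_points_utf8(bytes8))     # 7 bytes, 3 code points
```

In both cases the number of storage units differs from the number of 
code points, so indexing by unit is not indexing by character in either 
encoding.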

> So the BIG question remains: When will the FPC team sit down and hash 
> out the details of implementing Unicode support? Please note, I'm not 
> saying "implement it", just saying... "agree on how it should be 
> implemented". If the FPC team stays in a dead-lock, then maybe the 
> better option would be to allow the public to vote on it.

What special support do you expect?
Which of these features are essentially different for UTF-8 and UTF-16?

DoDi