[fpc-devel] Unicode in the RTL (my ideas)

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Aug 21 14:13:32 CEST 2012


Graeme Geldenhuys wrote:
> On 20 August 2012 23:18, Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
>> The Delphi developers wanted to implement what you suggest, but later
>> dropped that approach.
> 
> When Embarcadero implemented Unicode support, Delphi was a pure
> Windows application. They had no need to think of anything other than
> what Windows supports.

So what? The poor performance of a variable char-size string type is 
not related to any platform.
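
To illustrate where that cost comes from, here is a minimal sketch, 
assuming UTF-8 as the variable char-size encoding. The function name 
Utf8CodePointOffset is hypothetical, not an RTL routine. Finding the 
N-th code point means scanning from the start of the string, on any 
platform, whereas a fixed char-size string is indexed directly:

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper, not part of the RTL: returns the 1-based byte
  offset of the N-th code point in a UTF-8 encoded string, or 0 when
  the string holds fewer than N code points. }
function Utf8CodePointOffset(const S: AnsiString; N: SizeInt): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    { Bytes of the form 10xxxxxx continue a code point; every other
      byte starts a new one. }
    if (Ord(S[I]) and $C0) <> $80 then
    begin
      Dec(N);
      if N = 0 then
        Exit(I);
    end;
end;

var
  S: AnsiString;
begin
  { 'a', U+00E4 (two bytes in UTF-8: C3 A4), 'b'; filled byte by byte
    so that no code page conversion is involved. }
  SetLength(S, 4);
  S[1] := 'a';
  S[2] := #$C3;
  S[3] := #$A4;
  S[4] := 'b';
  WriteLn(Utf8CodePointOffset(S, 1)); { 1 }
  WriteLn(Utf8CodePointOffset(S, 2)); { 2 }
  WriteLn(Utf8CodePointOffset(S, 3)); { 4: the third code point starts at byte 4 }
end.
```

The loop is linear in the string length for every indexed access, 
which is the point: the overhead follows from the encoding, not from 
the OS.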


>> A character type is somewhat useless, unless all strings are UTF-32 (which
>> is quite unlikely now). Instead substrings should be used, which can contain
>> any number of bytes or characters.
> 
> I guess that depends on how you define the Char type. Is it meant to
> hold a single Unicode codepoint, or a single printable character? If
> the latter, then probably a bigger Char type is required.

A string can contain any number of characters, including zero. Why make 
a distinction between handling a single character and handling multiple 
characters? A UTF-32 Char type will require implicit conversion into a 
string before it can be used with strings of any other encoding. Not 
very efficient, indeed :-(


>> You also have to explain what String[4] means in a Unicode environment.
> 
> The String[] syntax in Object Pascal means you are defining a
> shortstring type (irrespective of compiler mode), thus an array of
> bytes. In this case 4 bytes are used to hold any Unicode codepoint.

Why abuse a ShortString type, when any ordinal 4-byte value will do the 
same? Did you consider that ShortStrings deserve special handling, WRT 
e.g. their Length field? The 5 bytes in memory also don't fit nicely 
into an aligned memory layout, and the compiler may insert range 
checking and other useless code. If ordinary ShortStrings have their 
own fixed encoding (CP_ACP?), you'll have to tell the compiler to ignore 
all that when dealing with your Char=String[4] type :-(
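
The size difference can be checked with a small sketch. The type 
names TShortChar and TCodePoint are hypothetical, chosen only for 
this illustration:

```pascal
program CharSizeDemo;
{$mode objfpc}

type
  TShortChar = String[4];  { hypothetical Char=String[4]: length byte + 4 data bytes }
  TCodePoint = LongWord;   { hypothetical alias: a plain 4-byte ordinal }

begin
  WriteLn('SizeOf(TShortChar) = ', SizeOf(TShortChar)); { prints 5 }
  WriteLn('SizeOf(TCodePoint) = ', SizeOf(TCodePoint)); { prints 4 }
end.
```

The extra byte is the ShortString Length field, which the compiler 
maintains and checks on every assignment. The ordinal type carries no 
such overhead.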

>> Q: Did you ever read about the new string implementation of FPC?
> 
> I have read some of the message threads that went around in fpc-devel,
> I also worked on the cp branch before it was merged with Trunk. If you
> have any other "documentation" in mind, please post the URL and I'll
> happily take a look.

Then read it again, you seem to have missed essential points.

DoDi



