[fpc-devel] Unicode and UTF8String

Jonas Maebe jonas.maebe at elis.ugent.be
Mon Dec 1 21:09:52 CET 2008


On 01 Dec 2008, at 20:57, Felipe Monteiro de Carvalho wrote:

> On Mon, Dec 1, 2008 at 5:40 PM, Florian Klaempfl
> <florian at freepascal.org> wrote:
>>> What string type will be TStrings.Items and the many other strings
>>> in the classes.pp?
>>
>> Not yet decided though I'd make them RTLString as well.
>
> I think you can't change TStrings because that would break all code
> using it (a huge amount of code).
>
> I would recommend adding a similar class with a different name.

In that case, I would recommend giving it the "string with attached  
encoding" style type so you don't need 5 tstrings variants.

Regarding how to deal with file system representations, conversions
etc., it may also be interesting to look at Apple's NSString class
(http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html)
or, if you prefer a procedural approach, CFStrings
(http://developer.apple.com/documentation/CoreFoundation/Reference/CFStringRef/index.html).

I'm not suggesting mimicking that exact API, only looking at what kind
of APIs they support (and are deprecating). NSString/CFString (one is
just an OOP version of the other) are also a "universal string
container" type, with embedded encoding.

For example, there are routines such as
* CFStringGetCharacterAtIndex() (and more optimised approaches as  
documented there, such as CFStringGetRangeOfComposedCharactersAtIndex())
* CFStringGetFileSystemRepresentation() (basically the "rtlstring"  
version of the string)
* CFStringConvertWindowsCodepageToEncoding() ("Returns the Core  
Foundation encoding constant that is the closest mapping to a given  
Windows codepage identifier.")
* ...

The advantage of such a type is that you only need to convert the
string (internally, hidden from the user) on demand, or when some
helper routine requires it (e.g. case-insensitive comparisons).
Otherwise, no conversion whatsoever is necessary.
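
To make the "convert only on demand" point concrete, here is a minimal
sketch of a string that carries its encoding with it. It is written in
Python purely for brevity. The class name EncodedString and all of its
methods are hypothetical, invented for this illustration; they are
neither a proposed RTL interface nor a rendering of the CFString API.
The conversions counter exists only to show when a conversion happens.

```python
class EncodedString:
    """Hypothetical 'string with attached encoding': bytes plus a codec name."""

    def __init__(self, data: bytes, encoding: str):
        self._data = data          # payload stored exactly as received
        self._encoding = encoding  # encoding tag that travels with the payload
        self._decoded = None       # cache, filled only when a routine needs it
        self.conversions = 0       # illustration only: counts decode operations

    def raw(self) -> bytes:
        """Return the stored bytes unchanged; no conversion takes place."""
        return self._data

    def _text(self) -> str:
        """Decode on first use and cache the result."""
        if self._decoded is None:
            self._decoded = self._data.decode(self._encoding)
            self.conversions += 1
        return self._decoded

    def as_encoding(self, encoding: str) -> bytes:
        """Return the bytes in the requested encoding, converting only if it differs."""
        if encoding == self._encoding:
            return self._data
        return self._text().encode(encoding)

    def equals_ignore_case(self, other: "EncodedString") -> bool:
        """A helper that genuinely needs decoded text, so it triggers conversion."""
        return self._text().casefold() == other._text().casefold()


# Two strings in different encodings can live in the same container type.
a = EncodedString("Straße".encode("latin-1"), "latin-1")
b = EncodedString("STRASSE".encode("utf-8"), "utf-8")

a.raw()                   # pass-through: nothing is decoded
a.as_encoding("latin-1")  # same encoding requested: still nothing decoded
a.equals_ignore_case(b)   # first operation that forces a conversion
```

Storing, copying or passing such a value along costs nothing beyond
the bytes themselves; only helpers that need a common representation
pay for a conversion, and with the cache they pay once.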


Jonas


