[fpc-devel] Unicode and UTF8String

Martin Friebe fpc at mfriebe.de
Mon Dec 1 14:28:21 CET 2008


Marco van de Voort wrote:
> In our previous episode, Martin Friebe said:
>   
>>> Why would you do this and not
>>> MyString := SomeRTLRoutine;
>>> ?  
>>>       
>> If I understand that right, this may cause some overhead that, in 
>> some (few) cases, is not needed.
>>     
> Correct.  
>   
>> If I write an application using  stringtype "X" (WideString for 
>> example), then in the above "MyString" would be WideString.
>>     
> Correct
>> The input/output for SomeRTLRoutine are RTLString, which is OS dependent. If 
>> I compile for an OS using "UTF8", then that means each and every call 
>> needs a string conversion.
>>     
> Correct.
>   
>> Of course I understand, *if* some RTLFunction calls the OS, then the 
>> string must be converted. But if I simply want to extract the drive 
>> letter, or trim the path, and get the file name, without actually 
>> accessing the file or OS? Should it be possible to skip converting?
>>     
>
> Use rtlstring. Do the conversion to widestring after.
>
> IOW, you should do it the other way around. Use the OS dependent stringtype
> for mostly encoding independent operations, and only for the few things where
> you need a specific encoding force a certain encoding (using utf8string or
> widestring)
>
>   
I agree, using RTLString will probably help fpc to optimize your exe for 
each OS.

But using RTLString means you do not know whether you have UTF8 or not. 
Because UTF8 behaves slightly differently from other strings, many 
operations cannot be performed on RTLString:

foo[1], copy, pos ... simply because you do not know whether the result is a 
char, a codepoint or a partial codepoint (a single utf8 byte).
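
To illustrate the foo[1] problem, here is a minimal sketch. It assumes an 
AnsiString that happens to hold UTF8 bytes; CodePointCount is a made-up 
helper for this example, not an RTL routine.

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points by skipping
// UTF8 continuation bytes (bit pattern 10xxxxxx).
function CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  // "cafe" with an acute accent on the e; the accented letter is written
  // as its two UTF8 bytes so the source file encoding does not matter.
  S := 'caf'#$C3#$A9;
  WriteLn(Length(S));          // 5: Length counts bytes
  WriteLn(CodePointCount(S));  // 4: the reader sees four characters
  WriteLn(Ord(S[4]));          // 195: only the lead byte, not a char
end.
```

With a single-byte encoding the same S[4] would be a complete char, which 
is exactly what you cannot know when all you have is an RTLString.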

RTLString is, or will be, great if you simply need to store an OS 
dependent string in order to give it back to the OS later (e.g. open a file, 
remember the file name but do not process it, and save the file back under 
the same name; displaying it would go via the OS).

For this you could also use ByteString, if there is such a thing and if 
it behaves as "not converting when assigned to any string".


Best Regards
Martin


---
Disclaimer: Just to keep this discussion where it was:
- I do understand why the above is as it is (string index not being utf8 
char access).
- I also do not believe that this is correct (and any discussion should 
be a new thread)



