[fpc-devel] Unicode and UTF8String
Martin Friebe
fpc at mfriebe.de
Mon Dec 1 14:28:21 CET 2008
Marco van de Voort wrote:
> In our previous episode, Martin Friebe said:
>
>>> Why would you do this and not
>>> MyString := SomeRTLRoutine;
>>> ?
>>>
>> If I understand that right, this may cause some overhead that in
>> some (few) cases is not needed.
>>
> Correct.
>
>> If I write an application using stringtype "X" (WideString for
>> example), then in the above "MyString" would be WideString.
>>
> Correct
>> The in/output for SomeRTLRoutine are RtlString, they are OS dependent. If
>> I compile for an OS using "UTF8" then that means for each and every call,
>> it needs a string conversion.
>>
> Correct.
>
>> Of course I understand, *if* some RTLFunction calls the OS, then the
>> string must be converted. But if I simply want to extract the drive
>> letter, or trim the path, and get the file name, without actually
>> accessing the file or OS? Should it be possible to skip converting?
>>
>
> Use rtlstring. Do the conversion to widestring after.
>
> IOW, you should do it the other way around. Use the OS dependent stringtype
> for mostly encoding independent operations, and only for the few things where
> you need specific encodings force a certain encoding (using utf8string or
> widestring)
>
>
I agree, using RTLString will probably help fpc to optimize your exe for
each OS.
But using RTLString means you do not know whether you have UTF8 or not.
Because UTF8 behaves slightly differently from other string types, many
operations cannot be performed safely on an RTLString:
foo[1], copy, pos ... simply because you do not know whether the result is a
char, a codepoint, or a sub-codepoint (a single UTF8 byte).
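
To make the ambiguity concrete, here is a small sketch. It is written in Python, not FPC, purely because Python keeps byte strings and code point strings as separate types; the helper names byte_pos and codepoint_pos are made up for this illustration and do not correspond to any RTL routine.

```python
# Illustration only: Python bytes stand in for a byte-indexed UTF-8 string,
# Python str stands in for a code-point-indexed string.
text = "\u00e4bc"              # "a-umlaut", "b", "c": three code points
utf8 = text.encode("utf-8")    # four bytes: C3 A4 62 63

first_byte = utf8[0]           # 0xC3, only the first half of the umlaut
first_codepoint = text[0]      # the whole umlaut character


def byte_pos(haystack: bytes, needle: bytes) -> int:
    """Hypothetical 1-based pos() counting bytes; 0 if not found."""
    return haystack.find(needle) + 1


def codepoint_pos(haystack: str, needle: str) -> int:
    """Hypothetical 1-based pos() counting code points; 0 if not found."""
    return haystack.find(needle) + 1


# The same logical question ("where is 'b'?") has two different answers.
pos_in_bytes = byte_pos(utf8, b"b")
pos_in_codepoints = codepoint_pos(text, "b")
```

The two positions differ as soon as a multi-byte character precedes the match, so code that does not know which kind of string it holds cannot interpret an index or a pos result reliably.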
RTLString is, or will be, great if you simply need to store an OS
dependent string in order to give it back to the OS later (e.g. open a file,
remember the file name, but do not process it (displaying it would be via
the OS), and save the file back under the same name).
For this you could also use ByteString, if there is such a thing and if
it behaves as "not converting if assigned to any string".
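
A rough sketch of that "store it, do not touch it" usage, again in Python rather than FPC and under the assumption that the name is kept as raw bytes. The remember helper and the sample file name are invented for the illustration.

```python
# Illustration only: a file name kept as opaque bytes and never decoded.
raw_name = b"caf\xe9.txt"      # Latin-1 bytes; not a valid UTF-8 sequence


def remember(name: bytes) -> bytes:
    """Hypothetical helper: keep the name untouched to hand back later."""
    return bytes(name)


stored = remember(raw_name)
unchanged = stored == raw_name  # no conversion happened, bytes are identical

# Forcing an encoding on the way in is where things can go wrong.
try:
    raw_name.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

As long as the name is only stored and passed back, its encoding never needs to be known; the question only arises once the program starts indexing or searching inside it.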
Best Regards
Martin
---
Disclaimer: Just to keep this discussion where it was:
- I do understand why the above is as it is (string index not being utf8
char access).
- I also do not believe that this is correct (and any discussion of that
should be a new thread)