Summary on Re: [fpc-pascal] Unicode file routines proposal

Tue Jul 1 18:38:57 CEST 2008

Mattias Gärtner wrote:
> Quoting Florian Klaempfl <florian at freepascal.org>:
> 
>> Michael Van Canneyt wrote:
>>> On Tue, 1 Jul 2008, Paul Ishenin wrote:
>>>
>>>> Michael Van Canneyt wrote:
>>>>> You can still do C:=S[i]. What you cannot do is
>>>>>
>>>>>   P:=PChar(S);
>>>>>   While (P^<>#0) do
>>>>>    SomeByteSizedOperation;
>>>>>
>>>> Why can't you? PChar(S) should represent S as raw bytes. If you know what
>>>> you are doing, it will do no harm. Otherwise, if you corrupt the string,
>>>> you are responsible for all the problems you get.
>>> Obviously you can :-)
>>> But what I meant was that you shouldn't expect old code
>>> that relied on 1-byte characters to work.
>> It is supposed to break on utf-xx or whatever anyways.
> 
> The above normally works for UTF-8. UTF-8 was designed for this. That's why most
> ansistring code works with UTF-8. Switching to UTF-8 was easy. Switching to
> UTF-16 needs more work.
> And a multi-encoded string will break even more things, which means more work.
> The question is: how much more?

What will break? As I said, the tflorianstring manager will get some
variables that let you control the behaviour of this string type. For
example, you could tell it that all strings should be UTF-8 encoded. Of
course, you get into trouble if some user plays unfair, but you could
still protect your code with some EnforceUTF8Encoding. It's exactly the
same as with the current Lazarus solution: if the user messes with the
abused ansistrings, you're in trouble, but with the tflorianstring you
have a runtime means of detecting the mess (wrong encoding).
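
To illustrate the idea, here is a minimal sketch of a string that carries
an encoding tag, with a guard that rejects wrongly tagged input at run time
before a byte-wise PChar loop runs. Everything in it is an assumption made
for illustration: TSketchString, TSketchEncoding, EWrongEncoding and the
body of EnforceUTF8Encoding are hypothetical and are not the actual
tflorianstring design or any existing RTL API. The guard only checks the
tag; it does not validate the bytes themselves.

```pascal
program EncodingGuardSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical encoding tag, for illustration only.
  TSketchEncoding = (seAnsi, seUTF8, seUTF16);

  // Hypothetical stand-in for a multi-encoding string: raw bytes plus a tag.
  TSketchString = record
    Encoding: TSketchEncoding;
    Data: AnsiString;
  end;

  EWrongEncoding = class(Exception);

// Hypothetical guard: raises if the string is not tagged as UTF-8.
// It inspects the tag only, not the byte sequence.
procedure EnforceUTF8Encoding(const S: TSketchString);
begin
  if S.Encoding <> seUTF8 then
    raise EWrongEncoding.Create('string is not tagged as UTF-8');
end;

// Byte-wise scan for an ASCII character. This is safe on UTF-8 data
// because bytes below $80 never occur inside a multi-byte sequence.
function CountAsciiChar(const S: TSketchString; C: AnsiChar): Integer;
var
  P: PAnsiChar;
begin
  EnforceUTF8Encoding(S);
  Result := 0;
  P := PAnsiChar(S.Data);
  while P^ <> #0 do
  begin
    if P^ = C then
      Inc(Result);
    Inc(P);
  end;
end;

var
  S: TSketchString;
begin
  // UTF-8 bytes written explicitly; #$C3#$A4 encodes a-umlaut.
  S.Encoding := seUTF8;
  S.Data := 'G'#$C3#$A4'rtner/Kl'#$C3#$A4'mpfl';
  WriteLn(CountAsciiChar(S, '/'));

  // A wrongly tagged string is caught by the guard before the loop runs.
  S.Encoding := seUTF16;
  try
    WriteLn(CountAsciiChar(S, '/'));
  except
    on E: EWrongEncoding do
      WriteLn('detected: ', E.Message);
  end;
end.
```

With a plain ansistring that is merely assumed to hold UTF-8, the second
call would scan the bytes anyway and return a meaningless count. The tag
is what makes the mismatch detectable.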