[fpc-devel] Unicode support in RTL - Roadmap

Fri Nov 21 19:19:33 CET 2008

Felipe Monteiro de Carvalho wrote:
> On Fri, Nov 21, 2008 at 2:42 PM, Michael Schnell <mschnell at lumino.de> wrote:
>   
>> And thus forces all users to "understand the full UTF-8 spec" and to rewrite
>> their programs, even though the old code perfectly compiles and up to a
>> certain extent seems to work.
>>
>> This is what I think is "not at all desirable" :( .
>>     
> Your comments are absolutely vague and meaningless. Not to mention
> they also don't propose an alternative.
>
> Sorry to be blunt, but so were your comments

I must agree with the "FPC cannot do it all automatically" line (much as 
I regret it, and I admit it would be beautiful if FPC could).

What I mean is:

1) Any application/program that currently compiles and works (using 
non-UTF-8 strings, whether ASCII or ANSI) will keep working if compiled 
in *non*-UTF-8 mode.

2) If such a program is to be extended to support UTF-8, decisions are 
needed that cannot be made without knowing what the program does, or 
even, within the same program, in which context a given operation takes 
place.
Only the programmer of the application has that knowledge, so the 
application must be changed to encode these decisions. FPC simply 
cannot make them. (And even a {$SWITCH} would not solve the issue.)

An example is the composed and decomposed "ü":

- If you edit or search human-readable text, you certainly want to 
treat both representations as equal (a Find dialog must find both).
- When the same text editor saves the file, it must treat them as not 
equal. Assume the user has two files named "wünsche.txt" in the same 
folder. The filesystem allows this because one name is decomposed and 
the other is composed. If the user opened the text from the composed 
file, it must be written back to the composed file; if it was opened 
from the decomposed file, it must be written back to the decomposed 
file. Otherwise a completely unrelated file would be overwritten and 
its contents lost. (The same applies if the application iterates 
through a directory and compares file names: here the very comparison 
that the Find dialog uses must behave differently.)
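To make the two comparisons concrete, here is a small sketch. It is 
written in Python rather than Pascal only because Python's standard 
unicodedata module provides normalization out of the box; the names 
text_equal and filename_equal are hypothetical, and treating file names 
as exact byte strings is an assumption that holds for filesystems which 
store names as opaque bytes (not all do).

```python
import unicodedata

# "ü" as one precomposed code point (U+00FC) ...
composed = "w\u00fcnsche.txt"
# ... and as "u" followed by COMBINING DIAERESIS (U+0308)
decomposed = "wu\u0308nsche.txt"

def text_equal(a: str, b: str) -> bool:
    """Find-dialog comparison: canonically equivalent text counts as equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def filename_equal(a: str, b: str) -> bool:
    """File-name comparison (assumed byte-exact): the UTF-8 bytes must match."""
    return a.encode("utf-8") == b.encode("utf-8")

print(composed.encode("utf-8"))              # b'w\xc3\xbcnsche.txt'
print(decomposed.encode("utf-8"))            # b'wu\xcc\x88nsche.txt'
print(text_equal(composed, decomposed))      # True
print(filename_equal(composed, decomposed))  # False
```

The same pair of strings is "equal" for one purpose and "different" 
for the other, which is exactly why a single global rule cannot work.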

FPC simply cannot know whether a string contains a file name, which 
must be kept exactly as it is, or human-readable text, which would 
benefit from "normalisation".

If you have to put a compiler switch in front of each statement to 
indicate what it needs, you may as well change the statements 
themselves. There is no single setting for the whole application, as 
both of the examples above occur within one application.

You could use two different UTF8String types that behave differently 
on decomposed characters (I am *not* proposing this as a solution). But 
then you cannot simply recompile your app by declaring that "string" 
now means UTF8String throughout the whole application. You would again 
have to go through all of the source code and edit it. So you may as 
well go through the source code and add the appropriate UTF-8 clean-up 
calls to those parts of the code that need them.
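Roughly, "adding the appropriate calls" could look like the following 
sketch (again Python for brevity; find_in_text and OpenedFile are 
hypothetical names, not an existing FPC or Lazarus API). The Find path 
normalizes explicitly, while the save path deliberately reuses the file 
name exactly as it was opened.

```python
import unicodedata

def nfc(s: str) -> str:
    """Canonical composition (NFC): 'u' + U+0308 becomes the single code point U+00FC."""
    return unicodedata.normalize("NFC", s)

def find_in_text(haystack: str, needle: str) -> int:
    """Find dialog: compare human-readable text after normalizing both sides.

    Returns an index into the *normalized* haystack, or -1 if not found.
    """
    return nfc(haystack).find(nfc(needle))

class OpenedFile:
    """Hypothetical editor buffer that remembers its file name exactly as read."""

    def __init__(self, path: str, text: str) -> None:
        self.path = path  # never normalized: it identifies one specific file
        self.text = text

    def save_target(self) -> str:
        # Write back to exactly the name that was opened, so the other
        # "wünsche.txt" in the same folder is never overwritten.
        return self.path

doc = OpenedFile("wu\u0308nsche.txt", "Viele Gr\u00fc\u00dfe und W\u00fcnsche")
print(find_in_text(doc.text, "Wu\u0308nsche"))  # 16: decomposed needle still matches
print(doc.save_target() == "w\u00fcnsche.txt")  # False: composed name is another file
```

The decision sits at each call site, which is the point made above: 
only the programmer knows which call is handling text and which is 
handling a file name.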

In the end, switching an application to Unicode means that different 
parts of the same app will need different handling of Unicode (where no 
such difference existed for ASCII/ANSI). And no compiler can figure out 
which part needs which behaviour.