[fpc-devel] Unicode support in RTL - Roadmap
Martin Friebe
fpc at mfriebe.de
Fri Nov 21 19:19:33 CET 2008
Felipe Monteiro de Carvalho wrote:
> On Fri, Nov 21, 2008 at 2:42 PM, Michael Schnell <mschnell at lumino.de> wrote:
>
>> And thus forces all users to "understand the full UTF-8 spec" and to rewrite
>> their programs, even though the old code perfectly compiles and up to a
>> certain extent seems to work.
>>
>> This is what I think is "not at all desirable" :( .
>>
> Your comments are absolutely vague and meaningless. Not to mention
> they also don't propose an alternative.
>
> Sorry to be blunt, but so were your comments
I must agree with the "FPC cannot do it all automatically" line (much as
I regret it, and much as I admit how beautiful it would be if FPC could).
What I mean is:
1) Any application/program that currently compiles and works (using
non-UTF-8 strings, never mind whether ASCII or ANSI) will keep working
if compiled in *non*-UTF-8 mode.
2) If such a program is to be extended to support UTF-8, then decisions
are needed that cannot be made without knowing what the program is
doing, or even, within the same program, in which context an operation
takes place.
Such knowledge is only available to the programmer of the application,
therefore the application must be changed to include these decisions. FPC
simply cannot make them. (And even a {$SWITCH} would not solve the issue.)
An example is the composed and decomposed "ü":
- If you edit a text (human-readable text), or search in a text, you
certainly want to treat both representations as equal (a Find
dialog must find both).
- If the same text editor saves the file, it must treat them as not
equal. Assume the user has 2 files "wünsche.txt" in the same folder.
The filesystem allows this, because one name is decomposed and the other
is composed. If the user opened the text from the composed version, it
must be written back to the composed version. If the user opened
it from the decomposed version, it must be written back to the decomposed
version. Otherwise a completely unrelated file would simply be
overwritten, and its contents lost. (The same applies if the application
iterates through the directory contents and compares file names. So here
the same comparison that the "Find dialog" uses must
behave differently.)
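To make the two cases concrete, here is a minimal sketch in Python
(chosen only because its standard library ships Unicode normalisation;
this is not FPC code). The function names same_text and same_file_name
are made up for illustration, and the sketch assumes a filesystem that
stores names exactly as given, without normalising them itself.

```python
import unicodedata

# The same visible name in two different encodings of "ü".
composed = "w\u00fcnsche.txt"      # "ü" as the single code point U+00FC
decomposed = "wu\u0308nsche.txt"   # "u" followed by combining diaeresis U+0308


def same_text(a, b):
    """Hypothetical comparison for human-readable text (e.g. a Find dialog).

    Both sides are normalised to NFC first, so composed and decomposed
    forms compare as equal.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def same_file_name(a, b):
    """Hypothetical comparison for file names.

    Compares code point for code point, with no normalisation, so the two
    names stay distinct.
    """
    return a == b
```

With these definitions, same_text(composed, decomposed) is true while
same_file_name(composed, decomposed) is false. Which of the two a given
call site needs is exactly the decision only the programmer can make.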
FPC simply cannot know whether a string contains a file name, which must
be kept exactly as it is, or some human-readable text, which
would benefit from a "normalisation".
If you are going to put a compiler switch in front of each statement to
indicate what it needs, you may as well change the statements. There is
no single setting for the whole application, as both of the above
examples occur within a single application.
You could use two different UTF8String types which behave differently on
decomposed chars (I am *not* proposing this as a solution). But then you
cannot just recompile your app by saying "string" now means UTF8String
throughout the whole application. You again have to go through all of
the source code and edit the app. So you may as well just go through the
source code and add the appropriate UTF-8 clean-up calls to those parts
of the code that need them.
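As a rough illustration of "clean-up calls only where they are needed",
here is a small sketch, again in Python rather than Pascal. The Document
class and the clean_up helper are hypothetical names, and NFC is just one
possible choice of normalisation.

```python
import unicodedata


def clean_up(text):
    # Hypothetical "UTF-8 clean-up" call: normalise to the composed form (NFC).
    return unicodedata.normalize("NFC", text)


class Document:
    """Toy model of an open file in a text editor."""

    def __init__(self, path, text):
        self.path = path   # file name: kept exactly as it was opened
        self.text = text   # human-readable text: may be normalised for search

    def find(self, needle):
        # Text search: clean up both sides, so either form of "ü" matches.
        # The returned index refers to the normalised text.
        return clean_up(self.text).find(clean_up(needle))

    def save_target(self):
        # Saving: deliberately no clean-up, the name goes back untouched.
        return self.path


# Opened from the decomposed file name, with decomposed text inside.
doc = Document("wu\u0308nsche.txt", "Gute Wu\u0308nsche")
```

Here doc.find("W\u00fcnsche") finds the word although the search string is
composed and the stored text is decomposed, while doc.save_target() still
returns the decomposed name. The clean-up call appears in one method and
is left out of the other on purpose.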
In the end, switching an application to Unicode means that within the
same app different parts are going to need different handling of Unicode
(where no such difference existed for ASCII/ANSI). And no compiler can
figure out which part will need which behaviour.