[fpc-devel] Trying to understand the wiki-Page "FPC Unicodesupport"
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Sat Nov 29 18:43:33 CET 2014
Marco van de Voort wrote:
> In our previous episode, Hans-Peter Diettrich said:
>>>> storage, we'll have to take that into account.
>>> (16-bit codepages were designed into OS/2 and Windows NT before utf-8 even
>>> existed)
>> Right, both systems were developed by Microsoft :-]
>
> A cooperation between IBM and Microsoft starting in 1984 to somewhere in the
> early nineties, yes. (or Micro Soft, I can't remember
> when they dropped the space).
AFAIK MS let IBM foot the bill for the OS/2 development, and used that
experience in the development of Windows (NT).
>> No problem, as long as proper host/network byteorder conversion is
>> applied in reading/writing such files.
>
> I don't see that as something evident.
It's evident in the case of reading/writing words on byte-based media,
where the byte order is important.
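For illustration, a quick sketch (program name made up, and it assumes
the BEtoN helper from the System unit): reading one UTF-16BE code unit
from a byte stream into host order looks like this:

  program ReadBEWord;
  {$mode objfpc}
  uses
    SysUtils;
  var
    B: array[0..1] of Byte;
    W: Word;
  begin
    // Two bytes of a UTF-16BE stream: U+20AC EURO SIGN
    B[0] := $20;
    B[1] := $AC;
    // Explicit decode: high byte first, independent of host endianness
    W := (Word(B[0]) shl 8) or B[1];
    // Equivalent using the RTL helper (assumed available):
    // W := BEtoN(PWord(@B)^);
    WriteLn(IntToHex(W, 4));  // prints 20AC on both LE and BE hosts
  end.

Writing goes the other way round (convert to big endian before the
bytes hit the file).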
> crlf vs lf is not fully transparent
> either, just open an lf file with notepad. Many unix editors show crs etc.
> There isn't even an universal marker to signal it (like BOMs)
The handling of Carriage Return (CR) and Line Feed (LF) was essential on
mechanical (teletype-style) terminals. A Teletype terminal had no input
buffer, and couldn't perform a full Carriage Return within the
transmission time of the following code. That's why most protocols sent
a CR first, to start the carriage movement, followed by an LF, which was
processed before the arrival of the next code. CR and LF had different
purposes, and could be used individually for special printing effects
(overwrite, form feed).
Newer devices (and computers) had no such timing requirements, so that a
single character code was sufficient to indicate a (logical)
end-of-line. Unfortunately some companies used CR for that purpose,
others used LF, and MS used CR+LF as an EOL indicator. WRT text output
on printing devices, the CR+LF convention certainly was the correct
solution. Problems arose only in data exchange between different
systems, which had to cope with all three conventions. Unicode provided
no improvement; on the contrary, the same mess was continued with
de/composed accented characters and umlauts :-(
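A tiny sketch of what I mean (program name made up, no normalization
routines involved): the same "é" can be stored precomposed as U+00E9 or
decomposed as 'e' plus the combining acute U+0301, and a plain string
comparison treats the two as different:

  program ComposedVsDecomposed;
  {$mode objfpc}
  var
    Composed, Decomposed: UnicodeString;
  begin
    Composed   := #$00E9;          // U+00E9 LATIN SMALL LETTER E WITH ACUTE
    Decomposed := #$0065#$0301;    // 'e' + U+0301 COMBINING ACUTE ACCENT
    // Both render identically, but compare as different strings
    WriteLn(Composed = Decomposed);                         // FALSE
    WriteLn(Length(Composed), ' <> ', Length(Decomposed));  // 1 <> 2 code units
  end.

Without a normalization step somewhere, every consumer of such text has
to cope with both forms.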
> Putting layer upon layer in a misguided attempt to make anything accept
> anything transparent is IMHO a waste of both time resources and computing. Better
> intensively maintain a few good converters, and strengthen metadata
> processing and retention to make it automatic in a few places where it
> really matters. I'm no security expert, but I guess from a security
> viewpoint that is better too.
I don't know of any text processing model really *superior* to
Unicode, do you?
And OOP is perfectly suited to implement multi-layer models.
DoDi