[fpc-devel] Trying to understand the wiki-Page "FPC Unicodesupport"

Hans-Peter Diettrich DrDiettrich1 at aol.com
Sat Nov 29 18:43:33 CET 2014


Marco van de Voort wrote:
> In our previous episode, Hans-Peter Diettrich said:
>>>> storage, we'll have to take that into account.
>>> (16-bit codepages were designed into OS/2 and Windows NT before utf-8 even
>>> existed)
>> Right, both systems were developed by Microsoft :-]
> 
> A cooperation between IBM and Microsoft starting in 1984 to somewhere in the
> early nineties, yes. (or Micro Soft, I can't remember
> when they dropped the space).

AFAIK MS let IBM pay the bill for the OS/2 development, and used that 
experience in the development of Windows (NT).


>> No problem, as long as proper host/network byteorder conversion is 
>> applied in reading/writing such files. 
> 
> I don't see that as something evident.

It's evident in the case of reading/writing words on byte-based media, 
where the byte order is important.
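
For illustration, here is a minimal FPC sketch (the stream usage and 
helper routines are just an example): the NtoLE/LEtoN functions from 
the System unit pin the on-disk byte order of 16-bit code units to 
little endian, independent of the host CPU.

program byteorderdemo;

{$mode objfpc}{$H+}

uses
  Classes;

{ Write the UTF-16 code units of S in little-endian order. }
procedure WriteWideString(AStream: TStream; const S: UnicodeString);
var
  i: Integer;
  w: Word;
begin
  for i := 1 to Length(S) do
  begin
    w := NtoLE(Word(S[i]));           // host order -> little endian
    AStream.WriteBuffer(w, SizeOf(w));
  end;
end;

{ Read ACount little-endian code units back into host byte order. }
function ReadWideString(AStream: TStream; ACount: Integer): UnicodeString;
var
  i: Integer;
  w: Word;
begin
  SetLength(Result, ACount);
  for i := 1 to ACount do
  begin
    AStream.ReadBuffer(w, SizeOf(w));
    Result[i] := WideChar(LEtoN(w));  // little endian -> host order
  end;
end;

var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    WriteWideString(ms, 'Hello');
    ms.Position := 0;
    WriteLn(Utf8Encode(ReadWideString(ms, 5)));
  finally
    ms.Free;
  end;
end.

A UTF-16BE file would use NtoBE/BEtoN instead, ideally selected 
according to the BOM at the start of the file.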

> crlf vs lf is not fully transparent
> either, just open an lf file with notepad. Many unix editors show crs etc.
> There isn't even a universal marker to signal it (like BOMs)

The handling of Carriage Return (CR) and Line Feed (LF) was essential on 
mechanical (teletype-style) terminals. A Teletype terminal had no input 
buffer and couldn't perform a full carriage return within the 
transmission time of the following code. That's why most protocols sent 
a CR first, to start the carriage movement, followed by an LF, which was 
processed before the next code arrived. CR and LF served different 
purposes and could be used individually for special printing effects 
(overprinting, form feed).

Newer devices (and computers) had no such timing requirements, so a 
single character code was sufficient to indicate a (logical) 
end-of-line. Unfortunately, some companies used CR for that purpose, 
others used LF, and MS used CR+LF as the EOL indicator. With respect to 
text output on printing devices, the CR+LF convention certainly was the 
correct solution. Problems arose only in data exchange between 
different systems, which had to cope with all three conventions. Unicode 
brought no improvement; on the contrary, the same mess continued with 
de/composed accented characters and umlauts :-(
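
For illustration, a minimal FPC sketch of the EOL part (the sample text 
is made up): AdjustLineBreaks from SysUtils converts any mix of the 
three conventions to a single target style.

program eoldemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Mixed, Normalized: AnsiString;
begin
  // a fragment that mixes all three end-of-line conventions
  Mixed := 'mac line'#13'unix line'#10'dos line'#13#10'last line';

  // convert every line break to LF; tlbsCRLF and tlbsCR also exist
  Normalized := AdjustLineBreaks(Mixed, tlbsLF);

  WriteLn(Normalized);
end.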


> Putting layer upon layer in a misguided attempt to make anything accept
> anything transparent is IMHO a waste of both time resources and computing.  Better
> intensively maintain a few good converters, and strengthen metadata
> processing and retention to make it automatic in a few places where it
> really matters. I'm no security expert, but I guess from a security
> viewpoint that is better too.

I don't know of any text-processing model really *superior* to 
Unicode, do you?

And OOP is perfectly suited to implement multi-layer models.

DoDi



