cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)

Tomas Hajny XHajT03 at mbox.vol.cz
Wed Nov 11 18:10:22 CET 2009


On Wed, November 11, 2009 16:31, Marco van de Voort wrote:
> In our previous episode, Tomas Hajny said:
>> >
>> > We might implement 1 or 2 of those. Most of them will however be
>> > handled by libiconv, the Windows code page conversion APIs, or some
>> > other external library (just like with the current widestring
>> > manager).
>>
>> Nevertheless: is, e.g., the ISO 8859-2 character set referenced the same
>> way on different platforms (in the new concept), or would the new
>> codepage number contain different values depending on the host platform?
>> Does libiconv allow referencing the character sets using some numeric
>> identifier at all? If yes, where are these identifiers defined? As an
>> example, MS Windows addresses ISO 8859-2 as codepage number 28592,
>> whereas OS/2 uses codepage number 912.
>
> Yes, this is a problem. When I wrote the Unicode document I thought about
> this too, and no solution is perfect. (Using Windows numbers everywhere is
> strange for users, but you don't want to break Delphi compatibility per se.)
>
> So I came up with a compromise (solution 3 below)
>
> There are three solutions:
>
> 1 Delphi compatible: always use Windows encodings.
> 2 Define a handful of constants that map to the encoding on the given
>   platform.  FPC_CODEPAGE_8859_2 =...
> 3 A mix of 1 and a handful of our own constants:
>
> Take a range of, say, 30-50 values that are free in the Windows range.
> Have a per-platform table that maps these 50 values to native codepage
> numbers. The indexes into this table get nice names like in option 2.
>
> This way in the encoding translate routine you can do
>
> if (encoding>fpc_encoding_low) and (encoding<fpc_encoding_high) then
>    begin
>      encoding:=fpc_encodingtable[encoding-FPC_encoding_low]; // cheap lookup
>    end
> else
>    begin
>      encoding:=windowsencoding2nativeencoding[encoding];
>    end;
>
> Delphi users would only have to define the FPC constants of (2) as their
> respective Windows codepages to keep the code working.
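
A minimal Free Pascal sketch of solution 3 follows. It is only an illustration under stated assumptions and is not part of any FPC release. The following are all hypothetical:

- the reserved range starting at 65400;
- the constant names FPC_CP_ISO8859_2 and FPC_CP_UTF8;
- the function TranslateEncoding.

The ISO 8859-2 numbers are the ones quoted above (28592 on Windows, 912 on OS/2). The UTF-8 numbers (65001 on Windows, IBM-1208 on OS/2) are added for illustration. Values outside the reserved range are passed through unchanged as a stand-in for the windowsencoding2nativeencoding lookup.

```pascal
program CodepageMapSketch;

{$mode objfpc}{$H+}

const
  { Hypothetical reserved range; assumes Windows never uses these numbers. }
  FPC_ENCODING_LOW  = 65400;
  FPC_CP_ISO8859_2  = FPC_ENCODING_LOW + 0;
  FPC_CP_UTF8       = FPC_ENCODING_LOW + 1;
  FPC_ENCODING_HIGH = FPC_ENCODING_LOW + 1;  { last reserved value, inclusive }

type
  TFpcEncodingTable = array[0..FPC_ENCODING_HIGH - FPC_ENCODING_LOW] of Word;

const
  { Per-platform table: index = reserved value - FPC_ENCODING_LOW. }
{$ifdef OS2}
  FpcEncodingTable: TFpcEncodingTable = (912, 1208);    { OS/2 numbers }
{$else}
  FpcEncodingTable: TFpcEncodingTable = (28592, 65001); { Windows numbers }
{$endif}

function TranslateEncoding(Encoding: Word): Word;
begin
  if (Encoding >= FPC_ENCODING_LOW) and (Encoding <= FPC_ENCODING_HIGH) then
    Result := FpcEncodingTable[Encoding - FPC_ENCODING_LOW]  { cheap lookup }
  else
    Result := Encoding;  { placeholder for a Windows-to-native mapping table }
end;

begin
  WriteLn(TranslateEncoding(FPC_CP_ISO8859_2));
  WriteLn(TranslateEncoding(1250));
end.
```

On a non-OS/2 target this prints 28592 and then 1250. The bounds check is inclusive here, so every reserved value, including the lowest one, is looked up in the table.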

Well... How do you make sure that MS doesn't extend the list of supported
codepages in the next version of MS Windows (or support a different list in
some special version, like a version for the Chinese market), breaking your
selection of "50 free values in the Windows range"? How does your list of
"50 values" compare to the 280 lines listed by (GNU) recode -l (which
presumably match, to a high extent, the values supported by the underlying
libiconv library)? Isn't it also necessary to keep the character set names
under Unix (as far as I understand it, at least the console character set
is given there as a charset name in an environment variable)?

Tomas
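
On the Unix point, the following Free Pascal sketch shows that the console charset arrives as a name rather than a number. It makes three assumptions:

- The charset is taken from the locale environment variables in the usual POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
- The charset is written after a dot, as in cs_CZ.ISO-8859-2.
- Any @modifier follows the charset and is dropped.

The helper names are hypothetical. A real implementation would more likely call setlocale and nl_langinfo(CODESET) from the C library, which also returns a name.

```pascal
program CharsetFromEnv;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Extracts the charset part of a locale string such as 'cs_CZ.ISO-8859-2'
  or 'de_DE.UTF-8@euro'; returns '' when no charset is present. }
function CharsetFromLocale(const Locale: string): string;
var
  DotPos, AtPos: Integer;
begin
  Result := '';
  DotPos := Pos('.', Locale);
  if DotPos = 0 then
    Exit;
  Result := Copy(Locale, DotPos + 1, Length(Locale) - DotPos);
  AtPos := Pos('@', Result);
  if AtPos > 0 then
    Result := Copy(Result, 1, AtPos - 1);  { drop a modifier like '@euro' }
end;

{ Returns the effective locale string: LC_ALL, then LC_CTYPE, then LANG. }
function ConsoleLocale: string;
begin
  Result := GetEnvironmentVariable('LC_ALL');
  if Result = '' then
    Result := GetEnvironmentVariable('LC_CTYPE');
  if Result = '' then
    Result := GetEnvironmentVariable('LANG');
end;

begin
  WriteLn('Locale:  ', ConsoleLocale);
  WriteLn('Charset: ', CharsetFromLocale(ConsoleLocale));
end.
```

A program would still need a table from such names (for example ISO-8859-2) to whatever numeric identifiers the new string type uses.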
