[fpc-devel] RTL Unicode support

Andrew Brunner andrew.t.brunner at gmail.com
Fri Aug 24 14:43:23 CEST 2012


I've been keeping up with this topic for a while now and I haven't
read any suggestions similar to how I envision encoding support.

I think it's best to keep ansi strings intact.  I also think it's best
to create a string encoding class factory for people to draw upon for
conversions.

I don't think seamless conversion between types is required at present.
Looking at all the different technologies out there, I think it would
be wise for FPC to just support encoding ANSI strings into the popular
encodings.

A "featured" encoding system would include character sets for
Internet-related apps:

            'UTF-8',
            'UTF-16',
            'UTF-16BE',
            'UTF-16LE',

            'ISO-8859-1',
            'ISO-8859-2',
            'ISO-8859-3',
            'ISO-8859-4',
            'ISO-8859-5',
            'ISO-8859-6',
            'ISO-8859-7',
            'ISO-8859-8',
            'ISO-8859-8-I',
            'ISO-8859-9',
            'ISO-8859-10',
            'ISO-8859-11',
            'ISO-8859-12',
            'ISO-8859-13',
            'ISO-8859-14',
            'ISO-8859-15',
            'ISO-8859-16',
            'ISO-2022-KR',
            'ISO-2022-JP',
            'ISO-2022-CN',

            'csISO_IR_111',

            'Windows-874',
            'Windows-1250',
            'Windows-1251',
            'Windows-1252',
            'Windows-1253',
            'Windows-1254',
            'Windows-1255',
            'Windows-1256',
            'Windows-1257',
            'Windows-1258',

            'EUC-KR',
            'EUC-JP',
            'EUC-TW',

            'TIS-620',
            'UHC',
            'JOHAB',
            'TCVN',

            'VPS',
            'CP-866',

            'ARMSCII-8',
            'USASCII',
            'VISCII',

            'HZ',
            'GBK',
            'Big5',
            'Big5_HKSCS',

            'GB2312',
            'GB18030',

            'KOI8-R',
            'KOI8-U',

            'IBM-850',
            'IBM-852',
            'IBM-855',
            'IBM-857',
            'IBM-864',
            'IBM-862',

            'MacCE',
            'MacRoman',
            'MacRomanian',
            'MacTurkish',
            'MacIcelandic',
            'Shift-JIS',
            'MacCyrillic',
            'MacCroatian',
            'MacDevanagari',
            'MacGurmukhi',
            'MacGujarati'

Converting between some or most of these, via streams or in memory,
would be the ideal.

I'm thinking: borrow the design from the image class factory system.
Using the fpImage class system I can go from PNG to JPG knowing only
the file extensions: I look up the matching classes and create
instances of the converters from them.
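To make the factory idea concrete, here is a minimal sketch of a
name-to-class registry in the spirit of the image handlers.  Every
identifier in it (TStringCodec, RegisterCodec, GetHandler and the two
codec classes) is made up for illustration; nothing like this exists
in the RTL today, and a real design would need more than a name
lookup.

```pascal
program CodecRegistrySketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Hypothetical base class; all names in this sketch are assumptions.
  TStringCodec = class
  public
    constructor Create; virtual;
    function EncodingName: string; virtual; abstract;
  end;
  TStringCodecClass = class of TStringCodec;

  TUtf8Codec = class(TStringCodec)
  public
    function EncodingName: string; override;
  end;

  TLatin1Codec = class(TStringCodec)
  public
    function EncodingName: string; override;
  end;

var
  Registry: TStringList;  // maps encoding name -> codec class reference

constructor TStringCodec.Create;
begin
  inherited Create;
end;

function TUtf8Codec.EncodingName: string;
begin
  Result := 'UTF-8';
end;

function TLatin1Codec.EncodingName: string;
begin
  Result := 'ISO-8859-1';
end;

procedure RegisterCodec(const AName: string; AClass: TStringCodecClass);
begin
  // Store the class reference in the Objects slot of the list.
  Registry.AddObject(AName, TObject(Pointer(AClass)));
end;

// Returns a new codec instance, or nil when the name is not registered.
function GetHandler(const AName: string): TStringCodec;
var
  Idx: Integer;
begin
  Result := nil;
  Idx := Registry.IndexOf(AName);  // case-insensitive by default
  if Idx >= 0 then
    Result := TStringCodecClass(Pointer(Registry.Objects[Idx])).Create;
end;

var
  Codec: TStringCodec;
begin
  Registry := TStringList.Create;
  try
    RegisterCodec('UTF-8', TUtf8Codec);
    RegisterCodec('ISO-8859-1', TLatin1Codec);

    Codec := GetHandler('UTF-8');
    if Codec <> nil then
    begin
      WriteLn('Got handler for ', Codec.EncodingName);
      Codec.Free;
    end;

    // Nothing registered under this name, so the factory returns nil.
    Codec := GetHandler('KOI8-R');
    if Codec = nil then
      WriteLn('KOI8-R is not supported');
  finally
    Registry.Free;
  end;
end.
```

Adding an encoding then means writing one class and one RegisterCodec
call, and a nil result is the signal that an encoding is unsupported.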

In the Internet app development realm we have blocks of text whose
encoding we already know.  Take for example a generic string in the
ID3 tag of an MP3 music file.  It would be declared as ANSI, UTF8 or
UTF16.
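As an illustration of "already knowing" the encoding: an ID3v2 text
frame starts with a one-byte encoding flag, and that flag could be
turned into a name to hand to a codec factory.  The function name
below is hypothetical; the flag values are the ones ID3v2 defines
($02 and $03 were added in ID3v2.4).

```pascal
program Id3EncodingSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: maps the ID3v2 text-encoding byte to an
// encoding name from the list above.  Returns '' for unknown flags.
function Id3EncodingName(AFlag: Byte): string;
begin
  case AFlag of
    $00: Result := 'ISO-8859-1';
    $01: Result := 'UTF-16';    // with byte order mark
    $02: Result := 'UTF-16BE';  // ID3v2.4 only, no byte order mark
    $03: Result := 'UTF-8';     // ID3v2.4 only
  else
    Result := '';               // caller treats this as unsupported
  end;
end;

begin
  WriteLn(Id3EncodingName($03));
end.
```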


Codec:=StringCodecFactory.getHandler(ANSI)
Codec:=StringCodecFactory.getHandler(UTF8)
Codec:=StringCodecFactory.getHandler(UTF16)
Codec:=StringCodecFactory.getHandler(UTF32)

Codec.Pos(..)
Codec.PosEx()
Codec.Replace()
Codec.Copy()
Codec.Delete()
Codec.Read(ContentType,Stream/string) overload;
Codec.Write(ContentType,Stream/string) overload;
Codec.AsString
Codec.Encode() // e.g. ANSI or UTF8
Codec.Decode() // e.g. ISO-8859-1 will remove all the =20 and = for
               // word-wrapping etc.
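A rough sketch of why per-encoding classes would help with something
like Pos: the same bytes give different answers depending on whether
one byte is one character.  This assumes an FPC version that has
RawByteString, and it assumes Pos should return a character index
rather than a byte offset; both the class names and that semantic are
my guess at what such an API would look like, not a settled design.

```pascal
program CodecPosSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical codec that wraps a buffer of raw bytes.
  TStringCodec = class
  protected
    FData: RawByteString;
  public
    constructor Create(const AData: RawByteString);
    // 1-based character index of ASub, or 0 when it is absent.
    function Pos(const ASub: RawByteString): Integer; virtual; abstract;
  end;

  // Single-byte encodings: byte offset and character index coincide.
  TAnsiCodec = class(TStringCodec)
  public
    function Pos(const ASub: RawByteString): Integer; override;
  end;

  // UTF-8: the byte offset is converted into a code point index.
  TUtf8Codec = class(TStringCodec)
  public
    function Pos(const ASub: RawByteString): Integer; override;
  end;

constructor TStringCodec.Create(const AData: RawByteString);
begin
  inherited Create;
  FData := AData;
end;

function TAnsiCodec.Pos(const ASub: RawByteString): Integer;
begin
  Result := System.Pos(ASub, FData);
end;

function TUtf8Codec.Pos(const ASub: RawByteString): Integer;
var
  ByteIdx, I: Integer;
begin
  Result := 0;
  ByteIdx := System.Pos(ASub, FData);
  if ByteIdx = 0 then
    Exit;
  // Every byte that is not a continuation byte (10xxxxxx) starts a
  // new code point, so count those up to and including the match.
  for I := 1 to ByteIdx do
    if (Ord(FData[I]) and $C0) <> $80 then
      Inc(Result);
end;

const
  // "cafe au lait" with an e-acute stored as the two UTF-8 bytes C3 A9.
  Sample = 'caf'#$C3#$A9' au lait';

var
  AnsiCodec, Utf8Codec: TStringCodec;
begin
  AnsiCodec := TAnsiCodec.Create(Sample);
  Utf8Codec := TUtf8Codec.Create(Sample);
  try
    // 'au' starts at byte 7, which is the 6th code point in UTF-8.
    WriteLn('ANSI Pos:  ', AnsiCodec.Pos('au'));
    WriteLn('UTF-8 Pos: ', Utf8Codec.Pos('au'));
  finally
    Utf8Codec.Free;
    AnsiCodec.Free;
  end;
end.
```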

Ideally, during design time, I could "case" all 3 types and just
reference the desired class for POS, Replace, PosEx.

The benefits of isolating all the desired encoding types would be ease
of debugging and ease of growth, and teams could target specific
methods.  And if the factory returns no handler for an encoding, that
encoding is simply not supported.

I'm at the point where I have widely disparate codecs for various
forms of communication.  A system like this would be my ideal.


