[fpc-devel] utf8 reading

DrDiettrich drdiettrich at compuserve.de
Fri Mar 11 08:12:32 CET 2005


Uberto Barbini wrote:

> Using natively utf-8 I think is impossible, because the encoding.

Support might be implemented like, or as part of, MBCS support.

> Please note that at every Borland conference there is someone asking for
> Unicode support since Delphi2...

Not only for Delphi ;-)

> There are several opensource library for managing unicode strings in delphi
> but they are implemented as standard classes, not refcounted first class
> citizen as long-string.

It's not easy to find a solution suitable for everybody. There are so
many character encodings that a single class or data type will hardly
cover all of them. Windows users may be happy with utf-16
(WideChar/WideString) because it is supported by the OS and some of its
standard controls, but other operating systems have different models
and support.

As long as strings are created, used, and stored by the application
itself, I'd suggest using utf-8 for the external (disk file)
representation and WideString inside the application. Then only two
procedures are required, to convert between utf-8 and WideString.
Strings from other sources have to be converted into WideString by the
appropriate procedure, and the coder is responsible for selecting the
right conversion. A general library of such conversion procedures can
then be created and maintained for use in Pascal programs, or the coder
can use his preferred open source library.
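
A minimal sketch of those two procedures, assuming a Free Pascal RTL
that provides UTF8Encode and UTF8Decode in the System unit. The wrapper
names FromDisk and ToDisk are made up for illustration, and the sketch
keeps the utf-8 bytes in a UTF8String variable rather than a plain
AnsiString so that no code page conversion gets in the way:

```pascal
program Utf8RoundTrip;
{$mode objfpc}{$H+}

{ FromDisk and ToDisk are hypothetical wrapper names; they only
  delegate to the RTL routines UTF8Decode and UTF8Encode. }

function FromDisk(const Raw: UTF8String): WideString;
begin
  { utf-8 bytes (external form) -> utf-16 (internal form) }
  Result := UTF8Decode(Raw);
end;

function ToDisk(const S: WideString): UTF8String;
begin
  { utf-16 (internal form) -> utf-8 bytes (external form) }
  Result := UTF8Encode(S);
end;

var
  Internal, Back: WideString;
  OnDisk: UTF8String;
begin
  { Build a short string containing one non-ASCII character (U+00FC). }
  Internal := 'Gr';
  Internal := Internal + WideChar($00FC);
  Internal := Internal + 'n';

  OnDisk := ToDisk(Internal);
  Back := FromDisk(OnDisk);

  WriteLn('WideChars in memory: ', Length(Internal));
  WriteLn('Bytes on disk:       ', Length(OnDisk));
  if Back = Internal then
    WriteLn('Round trip preserved the string')
  else
    WriteLn('Round trip changed the string');
end.
```

Everything else in the application would then see only WideString, and
only the file reading and writing code would call the two wrappers.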

Some coders may prefer WideString in disk files as well, if utf-8 files
would be bigger in their natural or preferred language (most CJK
characters, for example, take three bytes in utf-8 but only two in
utf-16). Of course an application can also continue to use AnsiString
instead of WideString if Unicode support is not required. All these
choices are up to the coder; the required data types and conversions
are already supported.
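
To make the size argument concrete, here is a small sketch that
compares the storage needed for the same text in both encodings. It
assumes UTF8Encode from the Free Pascal System unit; the two sample
characters (U+65E5 and U+672C) are just an arbitrary choice of CJK
characters:

```pascal
program EncodedSize;
{$mode objfpc}{$H+}

var
  S: WideString;
  Utf8: UTF8String;
begin
  { Two CJK characters, chosen only as an example. }
  S := WideChar($65E5);
  S := S + WideChar($672C);

  Utf8 := UTF8Encode(S);

  { Length of a WideString counts WideChars, so multiply by the
    element size to get bytes; Length of a UTF8String counts bytes. }
  WriteLn('utf-16 bytes: ', Length(S) * SizeOf(WideChar));
  WriteLn('utf-8 bytes:  ', Length(Utf8));
end.
```

For mostly ASCII text the comparison goes the other way, which is why
the choice is best left to the coder.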

A compiler may support a Unicode switch that maps the general data
types Char and String to either AnsiChar/AnsiString or
WideChar/WideString, to make portable code easier to write. The
switch may be extended to map WideChar to utf-16, utf-32, or
whatever else becomes available in future compiler versions. A similar
effect can be achieved with user-defined data types TChar and TString in
portable code, with the possible problem that there is no standard
unit where these data types can be defined unambiguously for
whole projects.
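
The user-defined variant could look roughly like the following sketch.
The unit name PortableStrings and the define USE_UNICODE are both
invented here; no such standard unit or switch exists, which is
exactly the problem mentioned above:

```pascal
unit PortableStrings;  { hypothetical unit name }

{$mode objfpc}{$H+}

interface

type
{$IFDEF USE_UNICODE}   { hypothetical define, e.g. set with -dUSE_UNICODE }
  TChar   = WideChar;
  TString = WideString;
{$ELSE}
  TChar   = AnsiChar;
  TString = AnsiString;
{$ENDIF}

implementation

end.
```

Portable code would then declare its variables as TChar and TString
only, and every unit of a project would have to use this same unit and
be compiled with the same setting of the define.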

DoDi





