[fpc-pascal] Warning not to use the "String" type with FPC 3.x

Mark Morgan Lloyd markMLl.fpc-pascal at telemetry.co.uk
Mon May 9 21:06:28 CEST 2016


Graeme Geldenhuys wrote:
> On 2016-05-09 17:40, Mark Morgan Lloyd wrote:
>> What, /exactly/, are you saying can be lost, and under what
>> circumstances?
>
> You lose “data” because a codepage-based AnsiString (aka the String
> type) does not always support all the code points of UTF8String or
> UnicodeString data.
>
> E.g.: I write a program that assigns a UnicodeString value to an
> AnsiString variable. My program uses compiler mode OBJFPC and {$H+}.
> I run that same executable on two different Linux systems. NOTE: it's
> the same executable.
>
>   Linux Box #1:
>     The default codepage is UTF-8, thus String equals
>     AnsiString(65001). No data is lost when converting from
>     UnicodeString to String on this system. Essentially it’s a
>     conversion of UTF-16 to UTF-8 - both support the full Unicode
>     range.
>
>   Linux Box #2:
>     Here Linux has been set up with a default codepage of ISO-8859-1
>     (Latin 1). I have a UnicodeString variable which contains BMP and
>     Planes 1-12 code points. The program assigns that to my String
>     type variable, which is actually AnsiString(<latin1>). Only the
>     first 256 code points (U+0000 to U+00FF) of the roughly 1.1
>     million Unicode code points will be converted. All the others
>     will be replaced by a '?' symbol. A massive data loss, and that
>     data could be critical.
>
> What does FPC do about this? It only gives you a compiler warning when
> the application is compiled, but still generates the executable as
> normal.
>
> I now fully understand why Delphi 2009 and later use UnicodeString as
> the default type, with String = UnicodeString = UTF-16. It defaults
> to UTF-16 on all its supported platforms (granted, Delphi supports far
> fewer platforms than FPC does). At least Delphi protects the
> developers who still use the String type everywhere (remember String
> = UnicodeString there). Much safer than what FPC 3.x does now!
> 
> Now some would say, simply switch your compiler mode to DelphiUnicode.
> But I don't want to do that, because I like the stricter ObjFPC mode,
> and prefer ObjFPC's syntax.

So which of these are you complaining about:

a) AnsiString doesn't support codepoints > 0xff ?

b) AnsiString doesn't support codepoints > 0x7f ?

c) AnsiString might apply an inappropriate translation for a codepoint 
<= 0x7f ?
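
To make cases (a) and (b) concrete, here is a rough sketch of what a lossy codepage conversion does to code points the target cannot represent. It is written in Python rather than Pascal, and Python's codecs only stand in for the conversion being discussed: it does not call or model FPC's RTL, and the helper name `convert` and the sample string are made up for illustration. How FPC itself treats a code point outside the BMP (one '?' or two) is not something this sketch shows.

```python
# Illustration only: Python's codecs stand in for a codepage conversion.
# This does not call or model FPC's RTL; `convert` is a hypothetical helper.

def convert(text: str, codepage: str) -> str:
    """Encode text into the target codepage, substituting '?' for every
    code point the codepage cannot represent, then decode it back."""
    return text.encode(codepage, errors="replace").decode(codepage)

# U+0041 (ASCII), U+00E9 (Latin-1 range), U+20AC (BMP), U+1F600 (Plane 1)
sample = "A\u00e9\u20ac\U0001F600"

utf8_result = convert(sample, "utf-8")      # UTF-8 covers all of Unicode
latin1_result = convert(sample, "latin-1")  # case (a): > 0xff is lost
ascii_result = convert(sample, "ascii")     # case (b): > 0x7f is lost

print(utf8_result == sample)
print(ascii(latin1_result))
print(ascii(ascii_result))
```

With a UTF-8 target the round trip is lossless; with Latin-1 only the first two characters survive, and with plain ASCII only the first one does.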

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


