[fpc-pascal] Warning not to use the "String" type with FPC 3.x
Mark Morgan Lloyd
markMLl.fpc-pascal at telemetry.co.uk
Mon May 9 21:06:28 CEST 2016
Graeme Geldenhuys wrote:
> On 2016-05-09 17:40, Mark Morgan Lloyd wrote:
> > What, /exactly/, are you saying can be lost, and under what circumstances?
> You lose “data” due to the codepage-based AnsiString (aka the String type) not always supporting all code points of UTF8String or UnicodeString data.
> e.g. I write a program that assigns a UnicodeString value to an AnsiString variable. My program uses compiler mode OBJFPC and {$H+}. I run that same executable on two different Linux systems. NOTE: it's the same executable.
> Linux Box #1: The default codepage is UTF-8, thus String equals AnsiString(65001). No data is lost when converting from UnicodeString to String on this system. Essentially it’s a conversion of UTF-16 to UTF-8 - both support the full Unicode range.
> Linux Box #2: Here Linux has been set up with a default codepage of ISO-8859-1 (Latin 1). I have a UnicodeString variable which contains BMP and Planes 1-12 code points. The program assigns that to my String type variable, which is actually AnsiString(<latin1>). Only the first 256 code points of the roughly 1.1 million Unicode code points will be converted. All the others will be replaced by a '?' symbol. A massive data loss, and that data could be critical.
> What does FPC do about this? It only gives you a compiler warning when the application is compiled, but still generates the executable as normal.
> I now fully understand why Delphi 2009 and later use UnicodeString as the default type, with String = UnicodeString = UTF-16. It defaults to UTF-16 on all its supported platforms (granted, Delphi supports far fewer platforms than FPC does). At least Delphi protects developers who still use the String type everywhere (remember, String = UnicodeString there). Much safer than what FPC 3.x does now!
>
> Now some would say, simply switch your compiler mode to DelphiUnicode. But I don't want to do that, because I like the stricter ObjFPC mode, and prefer ObjFPC's syntax.
So which of these are you complaining about:
a) AnsiString doesn't support codepoints > 0xff ?
b) AnsiString doesn't support codepoints > 0x7f ?
c) AnsiString might apply an inappropriate translation for a codepoint <= 0x7f ?
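
To make the three cases concrete, here is a minimal sketch in Python rather than FPC. It assumes, as described in the quoted text, that a code point with no mapping in the target codepage is replaced by '?'. The helper name to_codepage and the sample string are made up for illustration, and the sketch says nothing about what the FPC RTL actually does on a given system.

```python
# Sketch (Python, not FPC): which code points survive a lossy round trip
# through a given codepage when unmappable characters become '?'.

def to_codepage(text: str, codepage: str) -> str:
    """Encode text into codepage, replacing unmappable characters
    with '?', then decode the bytes back into a string."""
    return text.encode(codepage, errors="replace").decode(codepage)

# 'A' (U+0041), e-acute (U+00E9), euro sign (U+20AC), and U+1F600 (Plane 1)
sample = "A\u00e9\u20ac\U0001F600"

utf8_result = to_codepage(sample, "utf-8")      # full Unicode range
latin1_result = to_codepage(sample, "latin-1")  # only U+0000..U+00FF
ascii_result = to_codepage(sample, "ascii")     # only U+0000..U+007F

print(utf8_result == sample)                 # True: nothing lost
print(latin1_result.encode("unicode_escape"))  # b'A\\xe9??'
print(ascii_result)                          # A???
```

With a UTF-8 target nothing is lost; with Latin-1 everything above 0xff is replaced (case a); with plain ASCII everything above 0x7f is replaced (case b). Case c) would need a codepage that is not ASCII-compatible in its lower half, which this sketch does not cover.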
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]