[fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

Wed May 11 16:46:01 CEST 2016

Andreas Dorn wrote on Wed, 11 May 2016:

> All in all Graeme is right. FPC looks pretty much broken to me, too.
> For my projects I pulled the emergency brake on anything FPC.
>  
> The most serious flaws for me of FPC 3.0 are:
> - assuming that it's possible to assign an encoding to every string
> - using an (unsafe) guess about the encoding for auto-conversions

Do you have code that works correctly in FPC 2.6.x, but not in FPC
3.0? If so, can you please post it or file bug reports? Again: the
main focus when designing all of this new functionality was backward
compatibility: existing code that uses plain
string/shortstring/ansistring/unicodestring/char/widechar/unicodechar/pchar/pwidechar/punicodechar
should have the same behaviour in FPC 3.0 as in previous FPC versions
if you don't make any changes. And in virtually all cases it does (the
utf8string type being a notable exception).

> Some examples:
> 1) String-Buffers
> Split a UTF-8 String into chunks of 1024 bytes. Trying to assign an
> encoding to those chunks, and allowing auto-conversions will just lead
> to corruption.
>  
> Where is the string-type for string-buffers gone?

There never was any, but as long as you don't try to convert strings  
containing such arbitrary data from one code page to another (by  
either calling setcodepage() or by assigning them from a string with  
declared code page X to a string with declared code page Y), no  
conversions will happen.
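
As a minimal sketch of the above (the program and variable names are
made up for illustration, and the chunk size is 3 bytes rather than
1024 so that a chunk boundary falls in the middle of a character):
every variable has the same declared string type, so the chunks are
copied and concatenated byte for byte.

```pascal
program ChunkDemo;
{$mode objfpc}{$H+}

const
  ChunkSize = 3;  // stands in for the 1024 bytes of the quoted example

var
  Source, Chunk, Rebuilt: AnsiString;  // one string type throughout
  Offset: SizeInt;
begin
  // UTF-8 bytes of a-umlaut, o-umlaut and the euro sign (2 + 2 + 3 bytes),
  // written byte by byte so the source file encoding plays no role
  SetLength(Source, 7);
  Source[1] := #$C3; Source[2] := #$A4;
  Source[3] := #$C3; Source[4] := #$B6;
  Source[5] := #$E2; Source[6] := #$82; Source[7] := #$AC;

  Rebuilt := '';
  Offset := 1;
  while Offset <= Length(Source) do
  begin
    // A chunk may end in the middle of a multi-byte sequence; that is
    // harmless as long as nothing asks for a code page conversion
    Chunk := Copy(Source, Offset, ChunkSize);
    Rebuilt := Rebuilt + Chunk;
    Inc(Offset, ChunkSize);
  end;

  // Exit with an error code if the reassembled data differs
  if Rebuilt <> Source then
    Halt(1);
  WriteLn('chunks reassembled unchanged');
end.
```

The corruption described in the quoted example should only come into
play if one of these variables were declared with a different code page
(for example UTF8String next to AnsiString), or if SetCodePage() were
asked to convert a partial chunk.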

> 2) Most programming languages out there use something like "sequence of
> UTF-16 codepoints" as a string-type.
> (That's not the same as UTF-16 string !!!!!)
> It's a proper string type for "UTF-16 buffer" - pretty much nobody out
> there uses a low-level string-type that assumes
> that the content is a complete UTF-16 string.

The meaning of UnicodeString has not changed in FPC 3.0 compared to  
previous FPC versions, nor the way they are converted to/from other  
string types. You can argue it was broken from the start, but that's  
unrelated to the present animosity that's getting vented about FPC 3.0.

> 3) Filenames on Windows
> You can't convert any random filename on Windows to UTF8 and back without
> data loss.
> There simply isn't any encoding that correctly fits to all possible
> filenames.

We only auto-convert Windows file names from UTF-16 to anything else  
if you use non-unicodestring/widestring variables with the file name  
APIs. If you consistently use unicodestring/widestring, no conversion  
will happen (except with not yet converted APIs, such as classes).
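
A small sketch of what "consistently use unicodestring" can look like
with the SysUtils file routines. The file name is hypothetical, and the
program assumes the current directory is writable. Because Name is a
UnicodeString, the UnicodeString overloads of FileCreate, FileExists
and DeleteFile are selected.

```pascal
program FileNameDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Name: UnicodeString;  // the file name never passes through an ansistring
  Handle: THandle;
begin
  Name := 'fpc_demo';
  {$ifdef windows}
  // Non-ASCII character, only added on Windows where the name is handed
  // to the OS as UTF-16
  Name := Name + WideChar($00E4);
  {$endif}
  Name := Name + '.txt';

  Handle := FileCreate(Name);
  if Handle = feInvalidHandle then
    Halt(1);  // could not create the file (e.g. directory not writable)
  FileClose(Handle);

  if not FileExists(Name) then
    Halt(2);
  if not DeleteFile(Name) then
    Halt(3);
  WriteLn('file created and deleted using a UnicodeString name');
end.
```

The conversion would only come back in if Name were assigned to, or
built from, an ansistring variable somewhere along the way.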

> A lot of APIs use buffers. You can try to assign an encoding to a buffer,
> but if you use that encoding to auto-convert anything you made a blatant
> mistake. Assuming that anything from the outside world
> (WindowsAPI, C#, Java...) is UTF-16 is yet another blatant mistake...

Maybe we should add support for "WTF-8" like in Rust:  
https://github.com/rust-lang/rust/issues/12056

> 4) some Barcodes,

I would not consider these to be strings, but other than that the same  
holds as for String Buffers above.

> 5) Various File-Format-Standards,

Idem.

> 6) anything that uses ASCII + some Control-Bytes for communication,

Idem.

> 7) some encodings used in databases, ...
> all that won't fit into the FPC scheme of 'known encodings'.
>  
>
> The most obvious showstoppers for FPC 3.0 are:
> FPC 3.0 doesn't have a useful type for string-buffers.

Use arrays, like in any other programming language. If you insist on  
using strings, simply stick to consistently using a single string type.
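
For illustration, a sketch of the array approach using TBytes from
SysUtils (names and the tiny chunk size are again made up). An array of
bytes has no code page attached, so there is nothing that could be
converted.

```pascal
program BufferDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;  // declares TBytes

const
  ChunkSize = 3;  // small on purpose; a real buffer would be e.g. 1024

var
  Data, Chunk, Rebuilt: TBytes;
  Offset, Count: SizeInt;
begin
  // Arbitrary bytes; here the same UTF-8 sequence as any other payload
  SetLength(Data, 7);
  Data[0] := $C3; Data[1] := $A4;
  Data[2] := $C3; Data[3] := $B6;
  Data[4] := $E2; Data[5] := $82; Data[6] := $AC;

  SetLength(Rebuilt, 0);
  Offset := 0;
  while Offset < Length(Data) do
  begin
    // Size of this chunk: ChunkSize, or whatever is left at the end
    Count := Length(Data) - Offset;
    if Count > ChunkSize then
      Count := ChunkSize;

    Chunk := Copy(Data, Offset, Count);

    // Append the chunk to the output buffer
    SetLength(Rebuilt, Length(Rebuilt) + Count);
    Move(Chunk[0], Rebuilt[Length(Rebuilt) - Count], Count);
    Inc(Offset, Count);
  end;

  // Exit with an error code if the reassembled data differs
  if (Length(Rebuilt) <> Length(Data)) or
     (CompareByte(Rebuilt[0], Data[0], Length(Data)) <> 0) then
    Halt(1);
  WriteLn('buffer reassembled unchanged');
end.
```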

> FPC 3.0 doesn't have a useful type for Filenames

Use UnicodeString: as long as you do not assign it to another string  
type, it won't get converted.

> FPC 3.0 adds unsafe auto-conversions

Where/when?

Jonas