[fpc-devel] Save the current FPC UnicodeString!

Martin Schreiber fpmse at bluewin.ch
Thu Nov 12 18:13:43 CET 2009


On Tuesday 10 November 2009 10:33:07 Florian Klaempfl wrote:
> >
> > So please don't destroy this ideal solution by dropping current FPC
> > UnicodeString in favour of the Delphi string which is complicated,
>
> Who says that? If you don't mess with code pages, the only different
> you'll might see is that UnicodeString gets two new fields: encoding and
> char size. However, this information is usually only used if you pass
> the string to a RawString parameters. Normal Unicodestring routines
> initialize these fields and that's it.
>

I can confirm there is not much overhead for the new UnicodeString. I was 
misled by the Delphi {$stringchecks on} option and by a comment from an FPC 
developer, which I misinterpreted as saying that codepage compatibility 
cannot be checked at compile time; sorry for that.
Some guesswork gained from my experiments with the cpstrnew branch (Win32, 
Russian locale, source in utf-8, {$codepage utf8}); please correct me if I am 
wrong:

UnicodeString
- always utf-16 encoded.
- str:= 'abc'; length(str) = 6, stringcodepage(str) = 1200.
- str:= 'abä'; length(str) = 6, stringcodepage(str) = 1200.
- no encoding checks on concatenation; concatenation does not work because of 
the wrong length() value.
- setlength() of empty string creates CP 1200.

UTF8String
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 4, stringcodepage(str) = 65001.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- encoding checked on concatenation.
- utf8string:= utf8string + '123' needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.

String<1251>
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 3, stringcodepage(str) = 1251.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.

AnsiString
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 0, stringcodepage(str) = 1251.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.

RawByteString
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 0, stringcodepage(str) = 1251.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.

- utf8str1:= 'abc'; cp1251str1:= utf8str1; needs conversion to UnicodeString 
and back.
- utf8str1:= 'abc'; ansistr1:= utf8str1; no conversion. CP ansistr1 = 65001.
- ansistr1:= 'abc'; utf8str1:= ansistr1; no conversion. CP utf8str1 = 1251.
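
The observations above could be reproduced with a small probe program along 
the following lines. This is only a sketch: it assumes the 
"type AnsiString(1251)" declaration syntax for the String<1251> case, the 
names TCP1251String and Report are made up for illustration, and the output 
depends on the compiler version, the locale and the installed widestring 
manager (on Unix, cwstring would have to be added to the uses clause), so no 
expected values are given.

```pascal
program cpprobe;
{$mode objfpc}{$H+}
{$codepage utf8}   // the source file itself is saved as UTF-8

type
  // Hypothetical name standing in for the String<1251> case above.
  TCP1251String = type AnsiString(1251);

// Prints byte length and dynamic code page of any single-byte string.
// A RawByteString parameter accepts the string without converting it.
procedure Report(const Name: string; const S: RawByteString);
begin
  WriteLn(Name, ': length=', Length(S), ' codepage=', StringCodePage(S));
end;

var
  u: UnicodeString;
  s8: UTF8String;
  s1251: TCP1251String;
  a: AnsiString;
  r: RawByteString;
begin
  // Literal assignments, one per string type from the lists above.
  u := 'abä';
  WriteLn('UnicodeString: length=', Length(u));
  s8 := 'abä';
  Report('UTF8String', s8);
  s1251 := 'abä';
  Report('String<1251>', s1251);
  a := 'abä';
  Report('AnsiString', a);
  r := 'abä';
  Report('RawByteString', r);

  // Cross-type assignments from the last three list items.
  s8 := 'abc';
  s1251 := s8;
  Report('cp1251str1 := utf8str1', s1251);
  a := s8;
  Report('ansistr1 := utf8str1', a);
  a := 'abc';
  s8 := a;
  Report('utf8str1 := ansistr1', s8);
end.
```

Note that 'ä' has no representation in CP 1251, so the CP 1251 cases exercise 
the lossy path of the conversion routine.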

What are the differences between AnsiString and RawByteString?

Please let me know when you think the cpstrnew branch is stable enough to be 
tested with MSEgui.

Thanks, Martin


