[fpc-devel] Save the current FPC UnicodeString!
Martin Schreiber
fpmse at bluewin.ch
Thu Nov 12 18:13:43 CET 2009
On Tuesday 10 November 2009 10:33:07 Florian Klaempfl wrote:
> >
> > So please don't destroy this ideal solution by dropping current FPC
> > UnicodeString in favour of the Delphi string which is complicated,
>
> Who says that? If you don't mess with code pages, the only difference
> you might see is that UnicodeString gets two new fields: encoding and
> char size. However, this information is usually only used if you pass
> the string to RawString parameters. Normal UnicodeString routines
> initialize these fields and that's it.
>
I can confirm there is not much overhead for the new UnicodeString. I was
misled by the Delphi {$stringchecks on} option and by a misinterpreted comment
from an FPC developer that it is not possible to check codepage compatibility
at compile time, sorry for that.
Some guesswork gained from my experiments with the cpstrnew branch, Win32,
Russian locale, source in utf-8, {$codepage utf8}, please correct me if I am
wrong:
UnicodeString
- always utf-16 encoded.
- str:= 'abc'; length(str) = 6, stringcodepage(str) = 1200.
- str:= 'abä'; length(str) = 6, stringcodepage(str) = 1200.
- no encoding checks on concatenation; concatenation does not work because of
the wrong length() value.
- setlength() of empty string creates CP 1200.
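The reported length(str) = 6 for the three-character 'abc' matches the byte size of its UTF-16 payload rather than its count of 2-byte code units (3), which is consistent with the wrong length() value noted above. A small cross-check of that arithmetic in Python (an illustration, not FPC code; utf16_code_units and utf16_bytes are hypothetical helper names):

```python
# Illustration only (Python, not FPC): compare the number of UTF-16 code
# units with the number of bytes in the UTF-16 payload.

def utf16_code_units(s: str) -> int:
    """Number of 2-byte UTF-16 code units, i.e. what a UTF-16 length should report."""
    return len(s.encode("utf-16-le")) // 2

def utf16_bytes(s: str) -> int:
    """Number of bytes in the UTF-16LE payload."""
    return len(s.encode("utf-16-le"))

for literal in ("abc", "abä"):
    # 'ä' (U+00E4) is a single UTF-16 code unit, so both literals give 3 and 6
    print(literal, utf16_code_units(literal), utf16_bytes(literal))
```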
UTF8String
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 4, stringcodepage(str) = 65001 (runtime
conversion via widestringmanager.Wide2AnsiMoveProc()).
- encoding is checked on concatenation.
- utf8string:= utf8string + '123' needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.
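The UTF8String lengths above are byte counts: 'ä' takes two bytes in UTF-8, so 'abä' has length 4. A quick Python check of those byte counts (illustration only; utf8_length is a hypothetical helper name):

```python
# Illustration only (Python, not FPC): UTF-8 byte counts, which is what
# length() counts for a byte-oriented string such as UTF8String.

def utf8_length(s: str) -> int:
    """Number of bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))

for literal in ("abc", "abä"):
    # 'ä' (U+00E4) is encoded as the two bytes 0xC3 0xA4
    print(literal, utf8_length(literal))
```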
String<1251>
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 3, stringcodepage(str) = 1251 (runtime
conversion via widestringmanager.Wide2AnsiMoveProc()).
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.
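'ä' (U+00E4) has no mapping in Windows code page 1251, so storing 'abä' in a String<1251> can only be lossy, which fits the length of 3 reported above. A Python sketch of that check (illustration only; fits_codepage is a hypothetical helper, and Python substitutes '?' for the unmappable character, while the replacement FPC or the Windows API actually uses may differ):

```python
# Illustration only (Python, not FPC): can 'abä' be stored in code page 1251?

def fits_codepage(s: str, codepage: str) -> bool:
    """True if every character of s has a mapping in the given code page."""
    try:
        s.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

text = "abä"
print(fits_codepage(text, "cp1251"))  # False: 'ä' is not in CP1251

# Lossy conversion: the unmappable character is replaced by '?'
lossy = text.encode("cp1251", errors="replace")
print(lossy, len(lossy))  # b'ab?' 3
```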
AnsiString
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 0, stringcodepage(str) = 1251 (runtime
conversion via widestringmanager.Wide2AnsiMoveProc()).
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.
RawByteString
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 0, stringcodepage(str) = 1251 (runtime
conversion via widestringmanager.Wide2AnsiMoveProc()).
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString and back.
- setlength() of empty string creates CP 1251.
- utf8str1:= 'abc'; cp1251str1:= utf8str1; needs conversion to UnicodeString
and back.
- utf8str1:= 'abc'; ansistr1:= utf8str1; no conversion. CP ansistr1 = 65001.
- ansistr1:= 'abc'; utf8str1:= ansistr1; no conversion. CP utf8str1 = 1251.
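For 'abc' a round trip through UnicodeString cannot change any bytes, because ASCII is encoded identically in UTF-8 and CP1251; the conversion only matters for non-ASCII text. A Python sketch of such a pivot conversion (illustration only; convert_codepage is a hypothetical helper, and Python's str stands in for the UnicodeString intermediate):

```python
# Illustration only (Python, not FPC): convert between two byte code pages
# by decoding to Unicode and re-encoding, analogous to the UnicodeString
# round trip observed for cp1251str1:= utf8str1.

def convert_codepage(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from code page src, then encode them into code page dst."""
    return data.decode(src).encode(dst)

# Pure ASCII: the bytes are identical in UTF-8 and CP1251.
utf8str1 = "abc".encode("utf-8")
print(convert_codepage(utf8str1, "utf-8", "cp1251") == utf8str1)  # True

# Cyrillic text: 12 bytes in UTF-8, 6 bytes in CP1251.
russian = "привет".encode("utf-8")
print(len(russian), len(convert_codepage(russian, "utf-8", "cp1251")))  # 12 6
```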
What are the differences between AnsiString and RawByteString?
Please report when you think cpstrnew branch is stable enough to be tested
with MSEgui.
Thanks, Martin