[fpc-devel] Possible issue with 2.7.1 string encodings
Martin
lazarus at mfriebe.de
Sun Aug 25 15:35:22 CEST 2013
On 25/08/2013 14:11, Paul Ishenin wrote:
> 25.08.13, 20:44, Martin wrote:
>
>> To find some info I added debugln as follows.
>> Note the part
>> PInteger(ASource)[0], // just some part of the string, for verification
>> PInteger(ASource)[1], // on 2.7.1 Encoding ? // on 2.6.2 length
Argh, the index 1 is a mistake: the minus sign is missing.
Anyway, the encoding would be stored before that...
>> PInteger(ASource)[-2] // on 2.7.1 length // on 2.6.2 ref count.
>
> Don't guess, just look at astrings.inc TAnsiRec.
You are right, I assumed wrong. But it does not make a big difference.
Something still goes wrong in TStringList.IndexOf.
Hm, yes. Then there is something even stranger.
  PAnsiRec = ^TAnsiRec;
  TAnsiRec = Record
    CodePage    : TSystemCodePage;
    ElementSize : Word;
  {$ifdef CPU64}
    { align fields }
    Dummy       : DWord;
  {$endif CPU64}
    Ref         : SizeInt;
    Len         : SizeInt;
  end;
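
To avoid guessing at PInteger offsets (which depend on the compiler version and on whether SizeInt is 4 or 8 bytes), the same header fields can be read through the public helpers in the System unit. This is only a minimal sketch; it assumes a 2.7.1 or later compiler where StringCodePage, StringElementSize and StringRefCount are available, and the program and variable names are made up for illustration:

```pascal
program StrHeaderInfo;
{$mode objfpc}{$H+}

var
  s: AnsiString;
begin
  s := '/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc';
  // Constant strings are not reference counted; force a heap copy so that
  // the reference count is a meaningful number.
  UniqueString(s);

  // Each helper reads the corresponding field of the string header.
  WriteLn('CodePage    = ', StringCodePage(s));
  WriteLn('ElementSize = ', StringElementSize(s));
  WriteLn('RefCount    = ', StringRefCount(s));
  WriteLn('Length      = ', Length(s));
end.
```

Logging these four values for both the lookup string and the list entries would show directly whether the code pages differ, without depending on the record layout.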
>
>> Only something changed its encoding. I have no idea what...
>
> Please look whether you have UTF8String type somewhere or you use a
> constant from a unit with BOM. In this case UTF-8 will be converted to
> DefaultSystemCodePage encoding.
That would only apply when assigning to a string, wouldn't it?
  var a: string;
  if a = 'foo' then ...
This should not change the encoding of a, should it?
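
One way to check this would be to log the code page of the variable before and after the comparison. The following is a rough sketch, again assuming StringCodePage from the System unit of a 2.7.1 or later compiler; the names are hypothetical:

```pascal
program CompareKeepsCodePage;
{$mode objfpc}{$H+}

var
  a: string;
  before, after: TSystemCodePage;
begin
  a := 'bar';
  UniqueString(a);              // work on a heap copy, not on the constant

  before := StringCodePage(a);
  if a = 'foo' then             // comparison only, nothing is assigned to a
    WriteLn('equal');
  after := StringCodePage(a);

  WriteLn('code page before: ', before, '  after: ', after);
  if before <> after then
    WriteLn('the comparison changed the code page of a');
end.
```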
----------------------------
In any case, I cannot reproduce it on my system.
All I know is that at one point:
if list.IndexOf(s) >= 0 then exit;
list.Add(s);
gave a duplicate string exception.
I do not know whether it was related to the same issue.
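
For anyone who wants to try this on an affected system, a stand-alone test could look roughly like the sketch below. It assumes that the list is sorted, not case sensitive, and uses Duplicates = dupError (which would explain the exception); AddOnce is a hypothetical helper that just wraps the two lines above. On Unix the cwstring unit is included so that a widestring manager is installed; whether the problem also needs a particular locale (e.g. a Turkish one) is not known.

```pascal
program SortedIndexOfCheck;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}
  Classes, SysUtils;

// Same pattern as in the original code: look the string up first,
// and only add it when it is not yet in the list.
procedure AddOnce(list: TStringList; const s: string);
begin
  if list.IndexOf(s) >= 0 then
  begin
    WriteLn('already present: ', s);
    Exit;
  end;
  list.Add(s);
  WriteLn('added: ', s);
end;

var
  list: TStringList;
begin
  list := TStringList.Create;
  try
    list.Sorted := True;           // assumption: the real list is sorted
    list.CaseSensitive := False;   // the setting that triggers the problem
    list.Duplicates := dupError;   // assumption: explains the exception
    AddOnce(list, '/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc');
    AddOnce(list, '/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc');
  finally
    list.Free;
  end;
end.
```

Where the comparison works, the second call should take the early exit. If IndexOf wrongly returns -1, the second Add would raise the duplicate string exception instead.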
---------------------------
Also, the log shows that the string is in the list
(and this is all local to one tiny procedure):
TGDBMILineInfo.IndexOf (A) res=-1
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc //
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc, #1836017711,
#1634479973, #50
TGDBMILineInfo.IndexOf (B) pos=0
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc //
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc, #1836017711, #0, #50
TGDBMILineInfo.IndexOf (B) pos=1
/home/lazarus/projeler/TiB5651/UGS_tib5651.lpr //
/home/lazarus/projeler/TiB5651/UGS_tib5651.lpr, #1836017711, #0, #46
TGDBMILineInfo.IndexOf (C) res=-1
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc //
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc
PInteger(ASource)[-2] is the ref count, and it is 50 for the string in
the list, and for the local string for which I do the lookup.
So in all likelihood it is the same string (and it probably has the same
encoding too / so here is a difference).
Yet the stringlist returns -1 = not found.
And more:
- Changing StringList.CaseSensitive from False to True makes everything work.
So somewhere a sorted, non-case-sensitive stringlist has an issue with
IndexOf.
- It depends on the content of the string (and it does not happen on
every system)
So I still guess it is encoding related.
When not case sensitive, the string compare must use the
widestringmanager to deal with letter case... But even if that is broken
on the particular system, using identical strings....
Also note that the 2nd copy of the string on each line was produced by
"DbgStr()", which loops over all bytes and replaces any byte
outside '.'..#126 with its numeric representation.
So the string contains nothing special (except maybe the "i": I think
Turkish has an uppercase dotted I).
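
To illustrate what that byte-wise escaping does, here is a small sketch. It is not the actual DbgStr code; EscapeBytes is a hypothetical stand-in that follows the description above (range '.'..#126, everything else written as '#' plus the decimal byte value, the '#' format being an assumption):

```pascal
program EscapeDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical stand-in for DbgStr: keeps bytes in '.'..#126 as they are
// and writes every other byte as '#' followed by its decimal value.
// RawByteString is used so that no code page conversion happens on the way in.
function EscapeBytes(const s: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    if (s[i] >= '.') and (s[i] <= #126) then
      Result := Result + s[i]
    else
      Result := Result + '#' + IntToStr(Ord(s[i]));
end;

begin
  WriteLn(EscapeBytes('/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc'));
end.
```

A path made up only of plain ASCII letters, digits, '/', '_' and '.' comes out unchanged, which is what the second copy on each log line shows.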
--------------------------------
That's all I have.