[fpc-devel] Possible issue with 2.7.1 string encodings

Martin lazarus at mfriebe.de
Sun Aug 25 15:35:22 CEST 2013


On 25/08/2013 14:11, Paul Ishenin wrote:
> 25.08.13, 20:44, Martin wrote:
>
>> To find some info I added debugln as follows.
>> Note the part
>> PInteger(ASource)[0],  // just some part of the string, for verification
>> PInteger(ASource)[1],  // on 2.7.1 Encoding ?  // on 2.6.2 length

Argh, the 1 is a mistake: the minus sign is missing.
Anyway, the encoding would come before that...

>> PInteger(ASource)[-2] // on 2.7.1 length      // on 2.6.2 ref count.
>
> Don't guess, just look at astrings.inc TAnsiRec.
You are right, I assumed wrong. But it does not make a big difference.
Something still goes wrong in TStringList.IndexOf.


Hm, yes. Then there is something even stranger.
   PAnsiRec = ^TAnsiRec;
   TAnsiRec = Record
     CodePage    : TSystemCodePage;
     ElementSize : Word;
{$ifdef CPU64}
     { align fields  }
     Dummy       : DWord;
{$endif CPU64}
     Ref         : SizeInt;
     Len         : SizeInt;
   end;
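
A minimal sketch of how the header fields could be read without pointer
arithmetic, assuming an FPC version with code-page-aware strings (2.7.1 or
later), where StringCodePage, StringElementSize and StringRefCount are
available in the System unit. The program name is made up for illustration.
It also prints the sizes of Integer and SizeInt: with the record as quoted
above, a PInteger index steps in 4-byte units, so on a 64-bit target
PInteger(p)[-2] would land on the low half of Len rather than on Ref.

```pascal
program StrHeaderInfo;
{$mode objfpc}{$H+}

var
  S: AnsiString;
begin
  // Build the string at run time so that it lives on the heap and is
  // reference counted; a string that still points at a constant would
  // not have a meaningful reference count.
  S := 'foo';
  S := S + 'bar';

  // Public accessors for the header fields, instead of PInteger offsets.
  WriteLn('CodePage    = ', StringCodePage(S));
  WriteLn('ElementSize = ', StringElementSize(S));
  WriteLn('RefCount    = ', StringRefCount(S));
  WriteLn('Length      = ', Length(S));

  // If these two sizes differ, a PInteger index covers only half of a
  // SizeInt field, and negative indices no longer map 1:1 onto fields.
  WriteLn('SizeOf(Integer) = ', SizeOf(Integer),
          ', SizeOf(SizeInt) = ', SizeOf(SizeInt));
end.
```

The accessors keep working if the record layout changes between compiler
versions, which raw offsets do not.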

>
>> Only something changed its encoding. I have no idea what...
>
> Please look whether you have UTF8String type somewhere or you use a 
> constant from a unit with BOM. In this case UTF-8 will be converted to 
> DefaultSystemCodePage encoding.

That would only apply when assigning to a string, wouldn't it?
var a: string;
   if a = 'foo' then ...
A comparison like this should not change the encoding of a?
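
One way to check this would be to record the code page of the variable
before and after the comparison. This is only a sketch: the program and
variable names are made up, and it assumes StringCodePage from the System
unit (FPC 2.7.1 or later). It does not claim any particular result on the
affected system.

```pascal
program CompareKeepsCodePage;
{$mode objfpc}{$H+}

var
  A: AnsiString;
  CpBefore, CpAfter: TSystemCodePage;
begin
  // Build the string at run time so it is a real heap string.
  A := 'fo';
  A := A + 'o';

  CpBefore := StringCodePage(A);
  if A = 'foo' then
    WriteLn('strings compare equal');
  CpAfter := StringCodePage(A);

  // If the comparison left A untouched, both values are the same.
  WriteLn('code page before: ', CpBefore, ', after: ', CpAfter);
end.
```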

----------------------------
In any case, I cannot reproduce it on my system.

All I know is that at one time:
   if list.IndexOf(s) >= 0 then exit;
   list.Add(s);
gave a duplicate string exception.

I do not know whether it was related to the same issue...

---------------------------
Also, the log shows that the string is in the list
(and this is all local to one tiny procedure):

   TGDBMILineInfo.IndexOf (A)  res=-1 
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc // 
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc, #1836017711, 
#1634479973, #50
   TGDBMILineInfo.IndexOf (B) pos=0 
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc // 
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc, #1836017711, #0, #50
   TGDBMILineInfo.IndexOf (B) pos=1 
/home/lazarus/projeler/TiB5651/UGS_tib5651.lpr // 
/home/lazarus/projeler/TiB5651/UGS_tib5651.lpr, #1836017711, #0, #46
   TGDBMILineInfo.IndexOf (C)  res=-1 
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc // 
/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc

PInteger(ASource)[-2] is the ref count, and it is 50 both for the string in
the list and for the local string for which I do the lookup.

So in all likelihood it is the same string (and it probably has the same
encoding too, so there should be no difference here).
Yet the stringlist returns -1 = not found.

And more:
- Changing StringList.CaseSensitive from False to True makes everything
  work. So somewhere a sorted, non-case-sensitive stringlist has an issue
  with IndexOf.

- It depends on the content of the string  (and it does not happen on 
every system)
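
The two observations above could be turned into a small test program along
these lines. This is a sketch, not the code from the IDE: the helper
LookupAfterAdd and the program name are made up, and pulling in cwstring on
Unix is an assumption about how the widestring manager gets installed in
the affected build. On a working system a non-negative index is expected in
both cases; -1 in the first line would reproduce the problem.

```pascal
program SortedIndexOf;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // assumption: widestring manager via libc
  Classes, SysUtils;

// Adds S to a fresh sorted list and looks it up again.
function LookupAfterAdd(const S: string; ACaseSensitive: Boolean): Integer;
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.CaseSensitive := ACaseSensitive;
    List.Sorted := True;
    List.Duplicates := dupError;  // Add raises on a duplicate entry
    List.Add('/home/lazarus/projeler/TiB5651/UGS_tib5651.lpr');
    List.Add(S);
    Result := List.IndexOf(S);
  finally
    List.Free;
  end;
end;

const
  Path = '/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc';
begin
  WriteLn('case-insensitive: ', LookupAfterAdd(Path, False));
  WriteLn('case-sensitive:   ', LookupAfterAdd(Path, True));
end.
```

Running it under the locale of the affected system would show whether the
failure depends on the case-insensitive compare alone.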

So I still guess it is encoding related.

When not case sensitive, the string compare must use the
widestringmanager to deal with letter case... But even if that is broken
on the particular system, we are comparing identical strings....

Also note that the 2nd copy of the string on each line was produced by
"DbgStr()", which loops over all bytes and replaces any byte
outside '.'..#126 with its numeric representation.
So the string contains nothing special (except maybe the "i": I think
Turkish has an uppercase dotted i).
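
For illustration, the escaping described above could look roughly like
this. EscapeBytes is a hypothetical stand-in written from the description
in this mail, not the actual DbgStr() from Lazarus; the byte range is taken
literally from the text above.

```pascal
program EscapeDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical stand-in for the described DbgStr() behaviour:
// bytes in '.'..#126 are copied, all others become #<decimal value>.
function EscapeBytes(const S: AnsiString): AnsiString;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    if S[I] in ['.'..#126] then
      Result := Result + S[I]
    else
      Result := Result + '#' + IntToStr(Ord(S[I]));
end;

begin
  // Plain ASCII file name: comes back unchanged.
  WriteLn(EscapeBytes('Gunici_biriktir.inc'));
  // Two bytes outside the range (here the UTF-8 bytes of u-umlaut)
  // show up as numbers.
  WriteLn(EscapeBytes('G'#195#188'n'));
end.
```

Since the logged path comes back without any #nnn escapes, every byte in it
is plain ASCII. What remains locale dependent is the case mapping: in a
Turkish locale, "i" uppercases to the dotted capital I rather than to "I".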

--------------------------------

That's all I have.



