[fpc-devel] Possible issue with 2.7.1 string encodings

Martin lazarus at mfriebe.de
Sun Aug 25 14:44:34 CEST 2013

I suspect this to be an issue with the new 2.7.1 encoding. If someone 
could please review...

Some background first.
I was looking into a report from a user, where the IDE (Lazarus) would 
not show the debug-line-info (blue dots in the gutter) for some files 
(but would work for others).
> fpc svn 25364
> lazarus svn 42490
> kubuntu 13.04,
and, from what I can deduce, a Turkish locale.

I then narrowed it down as follows:
- to get the blue dots, the filename for which they are needed is stored 
in a stringlist
- then the info is added to list.Objects
- then the info is looked up via list.IndexOf(filename)

The list is Sorted, and CaseSensitive = False.
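
The setup described above can be sketched as a small standalone program. 
This is only an illustration, not the Lazarus code: the file names and 
the stored values are made up, and Duplicates := dupError is an 
assumption suggested by the "duplicates not allowed" exception mentioned 
further down.

```pascal
program SortedListLookup;
{$mode objfpc}{$H+}
uses
  Classes;
var
  List: TStringList;
  Idx: Integer;
begin
  List := TStringList.Create;
  try
    // Same configuration as described: sorted, not case sensitive.
    List.CaseSensitive := False;
    List.Sorted := True;
    // Assumption: duplicates raise an exception instead of being ignored.
    List.Duplicates := dupError;

    // Hypothetical file names; the "info" is stored as an integer
    // cast into the Objects slot, as in the function shown later.
    List.AddObject('/tmp/demo/beta.pas', TObject(PtrInt(2)));
    List.AddObject('/tmp/demo/alpha.pas', TObject(PtrInt(1)));

    // Look the name up again, with different letter case.
    Idx := List.IndexOf('/TMP/DEMO/ALPHA.PAS');
    if Idx <> -1 then
      WriteLn('index=', Idx, ' value=', PtrInt(List.Objects[Idx]))
    else
      WriteLn('not found');
  finally
    List.Free;
  end;
end.
```

On a sorted list both Add and IndexOf locate entries by comparing 
strings, so a lookup for a name that was added earlier would normally be 
expected to succeed.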

*** The problem: IndexOf returns -1 for strings that are in the list.

At one time even the following happened:
   if list.IndexOf(s) >= 0 then exit;
would give an exception: duplicates not allowed (I do seriously doubt, 
that the above code has much potential to be wrong)
- However that was no longer reproducible, so I collected evidence 

What I found while debugging:

There is the following function

function TGDBMILineInfo.IndexOf(const ASource: String): integer;
begin
   Result := FSourceIndex.IndexOf(ASource);
   if Result <> -1
   then Result := PtrInt(FSourceIndex.Objects[Result]);
end;

Only the first line is of interest. It already returns -1 for existing 
strings, as far as I can tell.

To find some info I added debugln as follows.
Note the part
PInteger(ASource)[0],  // just some part of the string, for verification
PInteger(ASource)[1],  // on 2.7.1 Encoding ?  // on 2.6.2 length
PInteger(ASource)[-2] // on 2.7.1 length      // on 2.6.2 ref count.

function TGDBMILineInfo.IndexOf(const ASource: String): integer;
var
   i: Integer;
begin
   Result := FSourceIndex.IndexOf(ASource);
   debugln(['TGDBMILineInfo.IndexOf (A)  res=', Result, ' ', ASource, ' // ',DbgStr(ASource),
      ', #',PInteger(ASource)[0],', #',PInteger(ASource)[1],', #',PInteger(ASource)[-2] ]);
   for i := 0 to FSourceIndex.Count -1 do
    debugln(['TGDBMILineInfo.IndexOf (B) pos=', i, ' ', FSourceIndex[i], ' // ',DbgStr(FSourceIndex[i]),
      ', #',PInteger(FSourceIndex[i])[0],', #',PInteger(FSourceIndex[i])[-1],', #',PInteger(FSourceIndex[i])[-2] ]);

   if Result <> -1
     then Result := PtrInt(FSourceIndex.Objects[Result]);
   debugln(['TGDBMILineInfo.IndexOf (C)  res=', Result, ' ', ASource, ' // ',DbgStr(ASource)]);
end;

And the result

   TGDBMILineInfo.IndexOf (A)  res=-1 /home/lazarus/projeler/TiB5651/Gunici_biriktir.inc // /home/lazarus/projeler/TiB5651/Gunici_biriktir.inc, #1836017711, #1634479973, #50
   TGDBMILineInfo.IndexOf (B) pos=0 /home/lazarus/projeler/TiB5651/Gunici_biriktir.inc // /home/lazarus/projeler/TiB5651/Gunici_biriktir.inc, #1836017711, #0, #50
   TGDBMILineInfo.IndexOf (B) pos=1 /home/lazarus/projeler/TiB5651/UGS_tib5651.lpr // /home/lazarus/projeler/TiB5651/UGS_tib5651.lpr, #1836017711, #0, #46
   TGDBMILineInfo.IndexOf (C)  res=-1 /home/lazarus/projeler/TiB5651/Gunici_biriktir.inc // 

Result of
   Result := FSourceIndex.IndexOf(ASource);
is -1

but the string is at index 0.

Only something changed its encoding. I have no idea what...
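
As a cross-check on the numbers in the debug output, the sketch below 
splits a 32-bit value into its four bytes and prints them as characters. 
It assumes a little-endian machine and a 4-byte Integer; the helper name 
IntToChars is made up for this illustration.

```pascal
program DecodeDebugInts;
{$mode objfpc}{$H+}

// Return the four bytes of Value, lowest byte first, as a 4-char string.
// On a little-endian machine this is the order in which they sit in memory.
function IntToChars(Value: LongWord): String;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to 3 do
  begin
    Result := Result + Chr(Byte(Value and $FF));
    Value := Value shr 8;
  end;
end;

begin
  WriteLn(IntToChars(1836017711));  // prints /hom
  WriteLn(IntToChars(1634479973));  // prints e/la
end.
```

Under these assumptions the first number on every line is the start of 
the path ("/hom"), and the second number on line (A) decodes to "e/la", 
the next four characters of the same path. Line (A) prints index [1] 
where the (B) lines print index [-1], so that value may be string 
content rather than a header field.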

All strings come from one and the same variable in one and the same 
object. They are passed as parameters, or stored temporarily in other 
fields (all variables and fields are "String"; all units have {$mode 
objfpc}{$H+}; some functions take "const name: string").
Running on 2.6.2 shows that the string passed as argument to the above 
function and the string in the list have the same ref-count 
(ref-count=10). This makes it very likely that it is indeed the same 
string (not just the same content).

So I have no idea what could change the encoding, since the string is 
(for all I can tell) NOT edited in any way.
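
One way to check this without depending on the internal string header, 
whose layout differs between compiler versions and pointer sizes, would 
be to ask the RTL directly. The following is a minimal sketch, assuming 
a compiler with code-page-aware strings (2.7.1 or later) where 
StringCodePage and StringRefCount are available in the System unit. The 
variable S merely stands in for the strings discussed above.

```pascal
program InspectString;
{$mode objfpc}{$H+}
var
  S: String;
begin
  // Stand-in for the string passed to IndexOf in the report.
  S := '/home/lazarus/projeler/TiB5651/Gunici_biriktir.inc';
  // Make S a heap-allocated copy, so it has a regular reference count
  // rather than that of a string constant.
  UniqueString(S);

  WriteLn('length    = ', Length(S));
  WriteLn('code page = ', StringCodePage(S));
  WriteLn('ref count = ', StringRefCount(S));
end.
```

Printing the same three values for the argument and for each list entry 
would show whether the code page really differs between them.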

Most scary is that IndexOf and Add seem to have different opinions on 
string equality.
