<div dir="ltr">So here's some more diagnostic output from the point of the SEGV:<br><pre style="margin-left:40px">(gdb) disass
Dump of assembler code for function _$SYSTEM$_Ll1637:
=> 0x0118ace1 <+0>: cmpl $0x0,(%edx)
End of assembler dump.
(gdb) i reg
eax 0xb6c77158 -1228443304
ecx 0xb6c76c04 -1228444668
edx 0xfffffff8 -8
ebx 0x12adbf8 19586040
esp 0xb6c75f5c 0xb6c75f5c
ebp 0xb6c75f70 0xb6c75f70
esi 0xb6c77020 -1228443616
edi 0xb6c77020 -1228443616
eip 0x118ace1 0x118ace1 <_$SYSTEM$_Ll1637>
eflags 0x210293 [ CF AF SF IF RF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) p $eax^
$4 = 0<br></pre>This tells me that the test at the top of fpc_AnsiStr_Decr_Ref:<br><pre style="margin-left:40px"> cmpl $0,(%eax)
jne .Ldecr_ref_continue
ret
.Ldecr_ref_continue:</pre>passed (i.e. (%eax) was NOT nil), but at some point during the execution of the following code:<br><pre style="margin-left:40px">// Temps allocated between ebp-24 and ebp+0
subl $4,%esp
// Var S located in register
// Var l located in register
movl %eax,(%esp)
// [101] l:=@PAnsiRec(S-FirstOff)^.Ref;
movl (%eax),%edx
subl $8,%edx
// [102] If l^<0 then exit;
cmpl $0,(%edx)</pre>the variable (%eax) MUST have been changed (to nil) BY ANOTHER THREAD.<br><div><div><div><div><div><div><br></div><div>Is there any other plausible explanation I may have missed?<br><br></div>
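<div>The window between the two reads of (%eax) in the listings above can be replayed in a small model. This is a hypothetical Python sketch, not FPC source: the names decr_ref_model, interfere and clear_from_other_thread and the payload address 0x1000 are made up for illustration, while the address of S (eax) and the offset 8 (subl $8,%edx) are taken from the gdb output and the listing above.<br></div><pre style="margin-left:40px">
# Hypothetical model (not FPC source): replays the two reads of S^ performed
# by the listing, with an optional write by "another thread" between them.

WORD = 2**32      # 32-bit address space, as on the i386 target
FIRST_OFF = 8     # matches "subl $8,%edx" in the listing


def decr_ref_model(memory, s_addr, interfere=None):
    """Return the address that 'cmpl $0,(%edx)' would dereference, or None
    if the routine returns early at the nil check."""
    # First read: cmpl $0,(%eax) / jne .Ldecr_ref_continue / ret
    if memory[s_addr] == 0:
        return None
    # Window between the two reads: another thread may store nil here.
    if interfere is not None:
        interfere(memory, s_addr)
    # Second read: movl (%eax),%edx / subl $8,%edx
    return (memory[s_addr] - FIRST_OFF) % WORD


def clear_from_other_thread(memory, s_addr):
    # Stand-in for another thread assigning nil to the same string variable.
    memory[s_addr] = 0


# 0xb6c77158 is eax from the register dump; 0x1000 is an arbitrary
# non-nil payload address chosen only for this illustration.
mem = {0xb6c77158: 0x1000}
print(hex(decr_ref_model(mem, 0xb6c77158, clear_from_other_thread)))  # 0xfffffff8
</pre><div>When nil is stored between the two reads, the model produces 0xfffffff8, the value gdb reports in edx. Without the store it produces the payload address minus 8, and a string that is already nil returns at the first check.<br><br></div>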
<div>If there is no other explanation, then I need to find out how the string variable referred to by (%eax) could have been accessed (or even known to exist) by any other thread in the same address space.<br>
<br>If that variable is local to a function (i.e. foo's Result, with the SEGV occurring on its assignment as soon as it first comes into scope, per my earlier email) then, absent a bug in FPC's handling of string references and allocation, it seems impossible that it could be known to or referenced by any other thread.<br>
<br>I'm reasonably confident there's no other way it could be overwritten by another thread (i.e. I don't think there are any range or buffer pointer errors anywhere else), so logic tells me that either I have the wrong thesis or there's a string handling error in FPC.<br>
<br>Any clues or insight gratefully received :-)<br><br></div>Cheers, Bruce.<br><div><br><div>PS: I can't use valgrind in practice for a variety of reasons, not the least of which is that I'm unlikely to see the error for an extraordinarily long time: the slight changes made so far to the code (and hence to its execution time) have had a dramatic effect on how likely the problem is to occur at all. It's clearly some sort of race condition over unprotected memory somewhere.<br>
</div><br>
</div></div></div></div></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 9, 2013 at 9:47 AM, Bruce Tulloch <span dir="ltr"><<a href="mailto:pascal@causal.com" target="_blank">pascal@causal.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I've not managed to trap it again, but based on the information I have from the last time it occurred I can say the error happened here:<br>
<div><br>--- a/rtl/i386/i386.inc<br>+++ b/rtl/i386/i386.inc<br>
@@ -1523,7 +1523,7 @@<br> movl (%eax),%edx<br> subl $8,%edx<br> // [102] If l^<0 then exit;<br> cmpl $0,(%edx) <-- SEGV OCCURS HERE<br> jl .Lj3596<br> .Lj3603:<br> // [104] If declocked(l^) then<br>
<br></div><div>That is, when testing the string's reference count (l points at the Ref field), the address of that field appears to be duff.<br></div><div><br>I don't know what %edx was pointing to at the time (I hope to know next time I trap it) but it was obviously wrong.<span class="HOEnZb"><font color="#888888"><br>
</font></span></div><span class="HOEnZb"><font color="#888888"><div><br></div><div>-b<br></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 9, 2013 at 9:33 AM, Bruce Tulloch <span dir="ltr"><<a href="mailto:pascal@causal.com" target="_blank">pascal@causal.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Thanks Jonas, that confirms what I suspected. Next time I trap an instance of this (rare) fault I will inspect exactly which CPU instruction raised the SEGV inside fpc_AnsiStr_Decr_Ref in search of a source of memory corruption.<span><font color="#888888"><br>
<br></font></span></div></div></div><span><font color="#888888"><div><br></div>Bruce.<br></font></span></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, May 8, 2013 at 11:49 PM, Jonas Maebe <span dir="ltr"><<a href="mailto:jonas.maebe@elis.ugent.be" target="_blank">jonas.maebe@elis.ugent.be</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br>
On 08 May 2013, at 08:13, Bruce Tulloch wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
After a random but very long period of time (i.e. very many successful<br>
calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.<br>
<br>
GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose<br>
reference is to be decremented) is nil (i.e. 0x0).<br>
<br>
Prima facie, that's the reason for the SEGV, but how is it possible that<br>
the compiler would pass a nil pointer to this function in the first place?<br>
</blockquote>
<br></div>
The first thing fpc_AnsiStr_Decr_Ref does is check whether its parameter is nil, and if so it immediately exits. It can be nil in case the ansistring contains an empty string.<br>
<br>
That routine itself also sets its argument to nil if it was not nil initially (it's a var-parameter), and I assume your crash happens after this has been done.<div><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
To put this into context, I'm running FPC 2.6.2 on a 32-bit Linux system<br>
executing in a multi-threaded application (which uses python threads and<br>
fpc threads). I have not found obvious evidence of memory corruption from<br>
other execution contexts or shared memory handling problems.<br>
</blockquote>
<br></div>
It's nevertheless most likely memory corruption. You can try compiling with -gv and running your program under valgrind to see whether it finds anything (you will probably get some false positives about certain RTL pchar routines such as strscan and strlen, but you can ignore those).<br>
<br>
<br>
Jonas<br>
_______________________________________________<br>
fpc-pascal maillist - <a href="mailto:fpc-pascal@lists.freepascal.org" target="_blank">fpc-pascal@lists.freepascal.org</a><br>
<a href="http://lists.freepascal.org/mailman/listinfo/fpc-pascal" target="_blank">http://lists.freepascal.org/mailman/listinfo/fpc-pascal</a><br>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>