[fpc-devel] Unit loading overhead

Jonas Maebe jonas.maebe at elis.ugent.be
Fri Jul 16 16:44:42 CEST 2010


Florian Klaempfl wrote on Fri, 16 Jul 2010:

> One of the bottlenecks the common user encounters, is unit loading:
> especially projects like the lazarus suffer from the time spent into
> unit loading while I suspect that it narrows down also to procedures
> like fillchar which consume a lot of time.

The main slowdown when recompiling projects is that FPC often
recompiles or re-resolves the same unit multiple times when a unit in
its uses clause has changed. The ppu loading itself is quite fast.
Recompiling Lazarus without changing any unit takes just 2.2 seconds
on my machine (without assembling/linking). Compiling a program that
uses all units from the packages dir (910 units) takes 4.4 seconds
(without assembling/linking).

The following result is from compiling a program that uses 348  
(precompiled) units from the packages tree on darwin/x86-64 and lists  
all functions taking up 1% or more of the total execution time  
(sample-based). I didn't use all units here because then my laptop  
does not keep all ppu files in the disk cache during the profiling and  
that obviously skews the results.

7.6%	mach_kernel	vm_map_enter
4.0%	ppcx48	FPC_MOVE
3.9%	ppcx48	CCLASSES_FPHASH$SHORTSTRING$$LONGWORD
3.6%	mach_kernel	blkclr
3.1%	mach_kernel	vm_map_lookup_entry
2.4%	ppcx48	SYSTEM_SYSGETMEM_FIXED$QWORD$$POINTER
1.9%	ppcx48	SYSTEM_SYSFREEMEM_FIXED$PFREELISTS$PMEMCHUNK_FIXED$$QWORD
1.8%	mach_kernel	ml_set_interrupts_enabled
1.7%	ppcx48	SYSTEM_ALLOC_OSCHUNK$PFREELISTS$QWORD$QWORD$$POINTER
1.7%	mach_kernel	lo_alltraps
1.6%	ppcx48	FPC_ANSISTR_DECR_REF
1.5%	libSystem.B.dylib	__bzero
1.4%	ppcx48	SYSTEM_SYSFREEMEM$POINTER$$QWORD
1.4%	ppcx48	fpc_pushexceptaddr
1.2%	ppcx48	SYSTEM_REMOVE_FREED_FIXED_CHUNKS$POSCHUNK
1.1%	ppcx48	CCLASSES_TDYNAMICARRAY_$__READ$formal$LONGWORD$$LONGWORD
1.1%	ppcx48	SYMTYPE_TDEREF_$__RESOLVE$$TOBJECT
1.1%	mach_kernel	pmap_enter
1.1%	ppcx48	fpc_popaddrstack
1.1%	ppcx48	SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT
1.1%	mach_kernel	pmap_remove_range
1.0%	mach_kernel	cache_lookup_path

vm_map_enter comes from mmap. Its cost can be reduced by increasing
the block size used to initialise pools for small blocks from 32 KB to
256 KB. To support this on 32 bit systems, fixedoffsetshift in
rtl/inc/heap.inc has to be changed from 16 to 12, which is no problem
since only the 4 lowest bits are currently used for flags.
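
The arithmetic behind that change can be sketched as follows. This is an illustration only, not code from rtl/inc/heap.inc: it assumes the offset is stored in the bits above fixedoffsetshift of a 32 bit word, and the 64 MiB workload and all function names are hypothetical.

```python
# Illustrative arithmetic only; this is not code from rtl/inc/heap.inc.

KIB = 1024


def max_encodable_offset(shift, word_bits=32):
    """Largest offset range (in bytes) that fits in the bits above `shift`.

    Assumption: the offset shares one machine word with the size/flag bits
    and occupies the bits from `shift` upwards.
    """
    return 1 << (word_bits - shift)


def os_chunks_needed(total_bytes, pool_bytes):
    """Number of pool-sized OS allocations needed for total_bytes (ceiling)."""
    return -(-total_bytes // pool_bytes)


old_pool, new_pool = 32 * KIB, 256 * KIB

# Compare how far an offset can reach with the old and the new shift.
for shift in (16, 12):
    limit = max_encodable_offset(shift)
    print(f"shift {shift}: offsets up to {limit // KIB} KiB, "
          f"256 KiB pool fits: {new_pool <= limit}")

# Hypothetical workload: 64 MiB of small blocks that are never freed.
total = 64 * KIB * KIB
print(os_chunks_needed(total, old_pool), "pools at 32 KiB")
print(os_chunks_needed(total, new_pool), "pools at 256 KiB")
```

Under these assumptions a shift of 16 leaves room for offsets within 64 KiB only, which is why a 256 KiB pool needs the smaller shift, and the larger pool cuts the number of OS-level allocations by a factor of 8.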

With the pool block size raised to 256 KB, the same profile looks like
this:

5.1%	ppcx49	FPC_MOVE  // source: 1.3% fpc_shortstr_to_shortstr, 1.1% ppufile.readdata, 0.5% fpc_ansistr_copy
3.7%	mach_kernel	blkclr  // kernel zeroing pages when we mmap memory and it has no reserve zeroed pages
3.6%	ppcx49	SYSTEM_SYSGETMEM_FIXED$QWORD$$POINTER
3.5%	ppcx49	CCLASSES_FPHASH$SHORTSTRING$$LONGWORD
2.2%	ppcx49	SYSTEM_SYSFREEMEM_FIXED$PFREELISTS$PMEMCHUNK_FIXED$$QWORD
2.1%	libSystem.B.dylib	__bzero  // fillchar(0)
2.0%	ppcx49	SYSTEM_REMOVE_FREED_FIXED_CHUNKS$POSCHUNK
1.9%	ppcx49	SYSTEM_ALLOC_OSCHUNK$PFREELISTS$QWORD$QWORD$$POINTER
1.8%	ppcx49	FPC_ANSISTR_DECR_REF
1.8%	ppcx49	SYSTEM_SYSFREEMEM$POINTER$$QWORD
1.7%	mach_kernel	lo_alltraps
1.6%	mach_kernel	ml_set_interrupts_enabled
1.4%	ppcx49	SYMTYPE_TDEREF_$__RESOLVE$$TOBJECT
1.4%	mach_kernel	pmap_enter // page fault
1.4%	ppcx49	fpc_pushexceptaddr
1.4%	ppcx49	SYSUTILS_COMPARETEXT$ANSISTRING$ANSISTRING$$LONGINT
1.2%	mach_kernel	pmap_remove_range // munmap
1.1%	ppcx49	PPU_TPPUFILE_$__GETBYTE$$BYTE
1.1%	ppcx49	CCLASSES_TFPHASHLIST_$__INTERNALFIND$LONGWORD$SHORTSTRING$LONGINT$$LONGINT
1.1%	mach_kernel	vm_page_lookup // page fault
1.1%	ppcx49	SYSTEM_SETJMP$JMP_BUF$$LONGINT
1.0%	ppcx49	FPC_MOVE
1.0%	mach_kernel	vm_map_enter // mmap
1.0%	ppcx49	SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT
1.0%	ppcx49	SYSTEM_SYSGETMEM_VAR$QWORD$$POINTER
1.0%	ppcx49	fpc_varset_add_sets
1.0%	ppcx49	FPC_SHORTSTR_COMPARE_EQUAL
1.0%	ppcx49	fpc_ansistr_setlength

In real time (without assembling/linking):

        Before      After
user    0m1.621s    0m1.636s
sys     0m0.791s    0m0.492s
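
To put those timings in relative terms, here is a small sketch that parses the values as printed and computes the change per row and for user+sys combined. The helper names are made up for this illustration; only the four timing values come from the measurements above.

```python
def parse_time(text):
    """Convert a `time`-style value such as '0m1.621s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = faster)."""
    return (new - old) / old * 100.0


# Values copied from the Before/After measurements.
before = {"user": parse_time("0m1.621s"), "sys": parse_time("0m0.791s")}
after = {"user": parse_time("0m1.636s"), "sys": parse_time("0m0.492s")}

for key in ("user", "sys"):
    print(f"{key}: {percent_change(before[key], after[key]):+.1f}%")

total_before = sum(before.values())
total_after = sum(after.values())
print(f"user+sys: {total_before:.3f}s -> {total_after:.3f}s "
      f"({percent_change(total_before, total_after):+.1f}%)")
```

On these numbers the gain is almost entirely in system time, which fits the reduced share of mmap-related kernel functions in the second profile; user time is essentially unchanged.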

Total memory usage barely changes (from 297MB to 299MB). I guess it's
no problem to commit this, but in most cases it probably won't change
much, if anything, performance-wise unless you do almost nothing but
allocate tons of small memory blocks without ever freeing any in
between.


Jonas
