[fpc-devel] Unit loading overhead
Jonas Maebe
jonas.maebe at elis.ugent.be
Fri Jul 16 16:44:42 CEST 2010
Florian Klaempfl wrote on Fri, 16 Jul 2010:
> One of the bottlenecks the common user encounters, is unit loading:
> especially projects like the lazarus suffer from the time spent into
> unit loading while I suspect that it narrows down also to procedures
> like fillchar which consume a lot of time.
The main slowdown when recompiling projects is that FPC often
recompiles or re-resolves the same unit multiple times when a unit in
its uses clause has changed. The ppu loading itself is quite fast.
Recompiling Lazarus without changing any unit just takes 2.2 seconds
on my machine (without assembling/linking). Compiling a program that uses
all units from the packages dir (910 units) takes 4.4 seconds (without
assembling/linking).
The following result is from compiling a program that uses 348
(precompiled) units from the packages tree on darwin/x86-64 and lists
all functions taking up 1% or more of the total execution time
(sample-based). I didn't use all units here because then my laptop
does not keep all ppu files in the disk cache during the profiling and
that obviously skews the results.
7.6% mach_kernel vm_map_enter
4.0% ppcx48 FPC_MOVE
3.9% ppcx48 CCLASSES_FPHASH$SHORTSTRING$$LONGWORD
3.6% mach_kernel blkclr
3.1% mach_kernel vm_map_lookup_entry
2.4% ppcx48 SYSTEM_SYSGETMEM_FIXED$QWORD$$POINTER
1.9% ppcx48 SYSTEM_SYSFREEMEM_FIXED$PFREELISTS$PMEMCHUNK_FIXED$$QWORD
1.8% mach_kernel ml_set_interrupts_enabled
1.7% ppcx48 SYSTEM_ALLOC_OSCHUNK$PFREELISTS$QWORD$QWORD$$POINTER
1.7% mach_kernel lo_alltraps
1.6% ppcx48 FPC_ANSISTR_DECR_REF
1.5% libSystem.B.dylib __bzero
1.4% ppcx48 SYSTEM_SYSFREEMEM$POINTER$$QWORD
1.4% ppcx48 fpc_pushexceptaddr
1.2% ppcx48 SYSTEM_REMOVE_FREED_FIXED_CHUNKS$POSCHUNK
1.1% ppcx48 CCLASSES_TDYNAMICARRAY_$__READ$formal$LONGWORD$$LONGWORD
1.1% ppcx48 SYMTYPE_TDEREF_$__RESOLVE$$TOBJECT
1.1% mach_kernel pmap_enter
1.1% ppcx48 fpc_popaddrstack
1.1% ppcx48 SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT
1.1% mach_kernel pmap_remove_range
1.0% mach_kernel cache_lookup_path
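To see how such a listing splits between the kernel and the compiler binary, the percentages can be summed per image. This is only a rough sketch: the function name share_by_module is made up for illustration, and it assumes every profile line has the form "<pct>% <image> <symbol>" as in the listing above. Since the listing only includes entries of 1% or more, the sums are lower bounds.

```python
from collections import defaultdict


def share_by_module(profile_text):
    """Sum the sampled time percentages per binary image (second column)."""
    totals = defaultdict(float)
    for line in profile_text.splitlines():
        parts = line.split(None, 2)
        # Expect "<pct>% <image> <symbol>"; skip wrapped comments and prose.
        if len(parts) < 3 or not parts[0].endswith("%"):
            continue
        try:
            pct = float(parts[0].rstrip("%"))
        except ValueError:
            continue
        totals[parts[1]] += pct
    return dict(totals)


if __name__ == "__main__":
    # First three lines of the listing above, used as sample input.
    sample = (
        "7.6% mach_kernel vm_map_enter\n"
        "4.0% ppcx48 FPC_MOVE\n"
        "3.6% mach_kernel blkclr\n"
    )
    for image, pct in sorted(share_by_module(sample).items()):
        print(f"{image}: {pct:.1f}%")
```

Feeding the complete listing through it gives the share of sampled time attributed to mach_kernel, which is where the mmap-related entries show up.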
vm_map_enter is from mmap. This can be improved by increasing the
block size used to initialise pools for small blocks from 32 KB to 256 KB
(to support this on 32-bit systems, fixedoffsetshift in
rtl/inc/heap.inc has to be changed from 16 to 12, which is no problem
since only the 4 lowest bits are currently used for flags).
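The arithmetic behind the shift change can be sketched as follows. This is not the heap.inc code: it assumes, for illustration only, that a small-block header is one machine word holding the block's offset inside its pool in the bits at and above fixedoffsetshift, with the size and the 4 flag bits below it. All function names here are hypothetical.

```python
def max_offset(word_bits, offset_shift):
    """Largest offset that fits in the bits at and above offset_shift."""
    return (1 << (word_bits - offset_shift)) - 1


def pool_fits(pool_bytes, word_bits, offset_shift):
    """True if every byte offset inside a pool of pool_bytes is representable."""
    return pool_bytes - 1 <= max_offset(word_bits, offset_shift)


def max_size_field(offset_shift, flag_bits=4):
    """Largest size storable below the offset once the flag bits are masked out."""
    return ((1 << offset_shift) - 1) & ~((1 << flag_bits) - 1)


def os_chunks_needed(total_bytes, pool_bytes):
    """Number of pools (one OS request each) needed to cover total_bytes."""
    return -(-total_bytes // pool_bytes)  # ceiling division


if __name__ == "__main__":
    KB = 1024
    for shift in (16, 12):
        print(
            f"shift {shift}: 32-bit max offset {max_offset(32, shift)}, "
            f"256 KB pool fits: {pool_fits(256 * KB, 32, shift)}, "
            f"max size field {max_size_field(shift)}"
        )
    # Hypothetical workload: 64 MB of small blocks that are never freed.
    total = 64 * KB * KB
    print(os_chunks_needed(total, 32 * KB), "pools of 32 KB")
    print(os_chunks_needed(total, 256 * KB), "pools of 256 KB")
```

Under that assumed layout, a shift of 16 leaves 16 offset bits in a 32-bit word, which is too few for a 256 KB pool, while a shift of 12 leaves 20. The larger pool also means eight times fewer requests to the OS for the same amount of small-block memory.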
Profile with the larger pool block size:
5.1% ppcx49 FPC_MOVE // source: 1.3% fpc_shortstr_to_shortstr, 1.1% ppufile.readdata, 0.5% fpc_ansistr_copy
3.7% mach_kernel blkclr // kernel zeroing pages when we mmap memory and it has no reserve zeroed pages
3.6% ppcx49 SYSTEM_SYSGETMEM_FIXED$QWORD$$POINTER
3.5% ppcx49 CCLASSES_FPHASH$SHORTSTRING$$LONGWORD
2.2% ppcx49 SYSTEM_SYSFREEMEM_FIXED$PFREELISTS$PMEMCHUNK_FIXED$$QWORD
2.1% libSystem.B.dylib __bzero // fillchar(0)
2.0% ppcx49 SYSTEM_REMOVE_FREED_FIXED_CHUNKS$POSCHUNK
1.9% ppcx49 SYSTEM_ALLOC_OSCHUNK$PFREELISTS$QWORD$QWORD$$POINTER
1.8% ppcx49 FPC_ANSISTR_DECR_REF
1.8% ppcx49 SYSTEM_SYSFREEMEM$POINTER$$QWORD
1.7% mach_kernel lo_alltraps
1.6% mach_kernel ml_set_interrupts_enabled
1.4% ppcx49 SYMTYPE_TDEREF_$__RESOLVE$$TOBJECT
1.4% mach_kernel pmap_enter // page fault
1.4% ppcx49 fpc_pushexceptaddr
1.4% ppcx49 SYSUTILS_COMPARETEXT$ANSISTRING$ANSISTRING$$LONGINT
1.2% mach_kernel pmap_remove_range // munmap
1.1% ppcx49 PPU_TPPUFILE_$__GETBYTE$$BYTE
1.1% ppcx49 CCLASSES_TFPHASHLIST_$__INTERNALFIND$LONGWORD$SHORTSTRING$LONGINT$$LONGINT
1.1% mach_kernel vm_page_lookup // page fault
1.1% ppcx49 SYSTEM_SETJMP$JMP_BUF$$LONGINT
1.0% ppcx49 FPC_MOVE
1.0% mach_kernel vm_map_enter // mmap
1.0% ppcx49 SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT
1.0% ppcx49 SYSTEM_SYSGETMEM_VAR$QWORD$$POINTER
1.0% ppcx49 fpc_varset_add_sets
1.0% ppcx49 FPC_SHORTSTR_COMPARE_EQUAL
1.0% ppcx49 fpc_ansistr_setlength
In real time (without assembling/linking):
        Before      After
user    0m1.621s    0m1.636s
sys     0m0.791s    0m0.492s
Total memory usage barely changes (from 297MB to 299MB). I guess it's
no problem to commit this, but in most cases it probably won't change
much if anything performance-wise unless you do almost nothing but
allocate tons of small memory blocks without ever freeing any in
between.
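For reference, the relative change in the timings above can be worked out with a few lines. This is a small sketch; parse_time and relative_change are illustrative helper names, and the input format is assumed to be the "XmY.YYYs" form printed by the shell's time command.

```python
import re


def parse_time(text):
    """Convert a value such as '0m1.621s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def relative_change(before, after):
    """Fractional change from before to after (negative means faster)."""
    return (after - before) / before


if __name__ == "__main__":
    user_before, user_after = parse_time("0m1.621s"), parse_time("0m1.636s")
    sys_before, sys_after = parse_time("0m0.791s"), parse_time("0m0.492s")
    print(f"user:     {relative_change(user_before, user_after):+.1%}")
    print(f"sys:      {relative_change(sys_before, sys_after):+.1%}")
    print(
        "user+sys: "
        f"{relative_change(user_before + sys_before, user_after + sys_after):+.1%}"
    )
```

System time drops by about 38%, user time is essentially unchanged, and user+sys drops by about 12%.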
Jonas