[fpc-devel] Modernising Pascal

Fri Feb 25 17:29:52 CET 2005

> 1. This is not "entirely true" (the cost is at best logarithmic on the 
> number of objects or your allocator has terrible fragmentation) 

Not really, you can use a memory map and achieve much lower
fragmentation than classic memory managers (cf. "FastMM" submissions
in the FastCode project in the b.p.d.basm borland ng).
In practice, you're memory usage will be a lot lower than that
of a GC.

> 2. You have to initialise your allocated data anyway.

That's not a practical issue because:
- initialization may be a task of the user's code, thus doesn't
   fall to the memory manager
- you can use virtual memory facilities and have the initialization
   to zeroes happen automagically, no need to trash the CPU cache
   by explicitly filling with zeroes.

> 3. In real life often you have virtual memory system underneath and there is 
> no guarantee whatosoever.

The management costs of the VM system are typically negligible if you
don't enter the realm of disk swapping, at which point the performance
of GC schemes completely collapses during garbage collections and compactions,
while a classic allocator can live perfectly happily with part of its
memory swapped out.

> GC allocation (in case of compacting collecotr) is constant time trivially, 

Allocation is alway constant time whatever the allocator if you do it right,
that's not a comparison point, however, GC linear allocation isn't CPU-cache
friendly, meaning that if you get that memory fast, the first actions you'll
do on that memory will typically be an order of magnitude slower than if you
had been returned cached memory (particularly noticeable for allocations of
a few hundreds of kB of temporaries, string concatenations, growing arrays, etc.).

Another drawback of linear allocation is interleaving, which often reduces
the cache efficiency by placing heterogeneous data together (depends on
the application of course, but it can hit you rather severely on some
composite data structures).

 > all the cost is in compacting which is linear.

Compacting might be linear, but it's cache unfriendly, which on a modern CPU
means an order of magnitude slower memory operations, and that happens
practically on the whole allocated memory. If some of it was swapped out,
you can then expect hell (I've seen it happen), and as compaction is a
necessity of linear allocation, you can never be 100% sure to avoid it
with a linearly allocating GC.

Finally garbage collection isn't a linear phase, and it's also intrinsically
not cache friendly (lots of random accesses all over the allocated memory,
most of which are going to be cache misses).

Nowadays, cache inefficiency can result in a lot of hidden perf hits,
with uncached accesses being 10 or more times slower (from CPU ratios,
RAM latencies, controler/chipset latencies, that all add to the CPU's
own L1/L2 latencies).
10 CPU cycles doesn't sound like much, but that's enough time for a dozen+
instructions under adequate circumstances, have it happen too often,
and that's enough to make the difference between smooth and sluggish.

Eric