[fpc-devel] Modernising Pascal

Ales Katona ales at chello.sk
Sat Feb 26 08:39:12 CET 2005


Jan Ruzicka  wrote / napĂ­sal (a):

> Enough guys
>
> each camp can make distinct implementation.
> Use this forum to discuss an interface.
>
> Let the results speak for themselves.
> Lets discuss test code.
> Lets discuss benchmark code.
>
> Instead of discussing bunch of what-ifs
> let's see how the implementation does.
>
> Is there some wiki[1] where we can collect mentioned ideas?
>
> Jan
>
> [1] (like www.twiki.org or http://c2.com/cgi/wiki)
>
> On Feb 25, 2005, at 13:11, Sebastian Kaliszewski wrote:
>
>> On Friday 25 February 2005 17:29, Eric Grange wrote:
>>
>>>> 1. This is not "entirely true" (the cost is at best logarithmic on the
>>>> number of objects or your allocator has terrible fragmentation)
>>>
>>>
>>> Not really, you can use a memory map and achieve much lower
>>> fragmentation than classic memory managers
>>
>>
>> 1. Memory map brings no quarantees of const time.
>> 2. Same applies to GC managers. BTW GC can use the very same
>> allocation/deallocation algorithm.
>>
>>> (cf. "FastMM" submissions
>>> in the FastCode project in the b.p.d.basm borland ng).
>>> In practice, you're memory usage will be a lot lower than that
>>> of a GC.
>>
>>
>> You're forgetting internal fragmentation. It typical allocator 
>> allocating
>> block with power of 2 sizes (4, 8, 16, 32, up to page size or small
>> multiply of page size) assuming flat size distribution you have ~33%
>> fragmentation.
>>
>>>
>>>> 2. You have to initialise your allocated data anyway.
>>>
>>>
>>> That's not a practical issue because:
>>> - initialization may be a task of the user's code, thus doesn't
>>>    fall to the memory manager
>>
>>
>> But must happen anyway.
>>
>>> - you can use virtual memory facilities and have the initialization
>>>    to zeroes happen automagically, no need to trash the CPU cache
>>>    by explicitly filling with zeroes.
>>
>>
>> No way. It does not. At best it's delayed up to the point when memory 
>> gets
>> used actually rather than being just allocated. But then OS has to clear
>> the page.
>>
>>>
>>>> 3. In real life often you have virtual memory system underneath and
>>>> there is no guarantee whatosoever.
>>>
>>>
>>> The management costs of the VM system are typically negligible
>>
>>
>> Neglibile but not const (they're mostly linear -- as memory must be 
>> cleared
>> before porcess might use it -- due to basic security requirements).
>>
>>> if you
>>> don't enter the realm of disk swapping, at which point the performance
>>> of GC schemes completely collapses during garbage collections
>>
>>
>> Modern generational GC's are less suspectible to such problems.
>>
>>> and
>>> compactions,
>>
>>
>> Not every GC is compacting GC.
>>
>>> while a classic allocator can live perfectly happily with
>>> part of its memory swapped out.
>>>
>>>> GC allocation (in case of compacting collecotr) is constant time
>>>> trivially,
>>>
>>>
>>> Allocation is alway constant time whatever the allocator if you do it
>>> right,
>>
>>
>> Nope. See above & below.
>> And even of you artificially restrict yourself to small span of small 
>> object
>> sizes yuo have to check quite a bunch of conditions, and conditional
>> branches are not free. It's allwayes order or two of magnitude slower 
>> than
>> adding constant to a heap pointer.
>>
>>
>>> that's not a comparison point, however, GC linear allocation isn't
>>> CPU-cache friendly, meaning that if you get that memory fast, the first
>>> actions you'll do on that memory will typically be an order of 
>>> magnitude
>>> slower than if you had been returned cached memory (particularly
>>> noticeable for allocations of a few hundreds of kB of temporaries, 
>>> string
>>> concatenations, growing arrays, etc.).
>>
>>
>> 1. ever heard of memory prefetch? You can do that explicitly, but most
>> current CPU uarchitectures does that implicitly for simple access 
>> pattersn
>> (and linear walki is one of the simplest).
>> 2. sorry but if you allocate few hundred of KB or more at once it 
>> won't be
>> cached anyway, regardless of the allocator.
>> 3. temporaries should come from stack anyway.
>>
>>
>>> Another drawback of linear allocation is interleaving, which often
>>> reduces the cache efficiency by placing heterogeneous data together
>>> (depends on the application of course, but it can hit you rather 
>>> severely
>>> on some composite data structures).
>>
>>
>> Sorry, but this goes the opposite way.  Randomly scattered 
>> allocations are
>> worse than using large compact block. Those, who have to optimise their
>> data for cache use know that, and allocate large memory pools and then
>> arrange data the way they want. Cache colloring and stuff is easier 
>> to do
>> when you do not have to deal with stuff scattered everywhere.
>>
>> This is one of the *advantages* of compacting allocators, that they 
>> alocate
>> with no internal fragmentation and that after compacting frequently used
>> parts are close together, and with just somwehat smart design of 
>> compaction
>> algorithm things used together are also placed together. This is 
>> certainly
>> good for cache not bad (it's obvious when you look at typical CPU cache
>> algorithm -- this minimises the amount of cache-line conflicts).
>>
>>
>>>
>>>> all the cost is in compacting which is linear.
>>>
>>>
>>> Compacting might be linear, but it's cache unfriendly, which on a 
>>> modern
>>> CPU means an order of magnitude slower memory operations, and that
>>> happens practically on the whole allocated memory.
>>
>>
>> Prefetch is your friend. Then GC might be made so that not every 
>> collection
>> has to hit entire memory. And collection is often incremental 
>> (interleaved
>> with user code execution).
>>
>>
>>> If some of it was
>>> swapped out, you can then expect hell (I've seen it happen), and as
>>> compaction is a necessity of linear allocation, you can never be 100%
>>> sure to avoid it with a linearly allocating GC.
>>
>>
>>
>> Well, I had once wait ~half a minute for one few hundred MB 
>> allocation to
>> finish (that was on 4BG RAM, 4 CPU Tru64 box with code compiled with 
>> one of
>> the best code-preformance-wise compilers around DEC CXX 6.x). This was
>> explicit allocation of course (so far with allocation in constant time).
>> So yes, shit happens.
>> The other thing is that especuially early Java implementations had 
>> primitive
>> & poor GC. Incremental & generational collector won't try to access 
>> entire
>> heap allmost at once.
>>
>>> Finally garbage collection isn't a linear phase, and it's also
>>> intrinsically not cache friendly (lots of random accesses all over the
>>> allocated memory, most of which are going to be cache misses).
>>>
>>> Nowadays, cache inefficiency can result in a lot of hidden perf hits,
>>> with uncached accesses being 10 or more times slower (from CPU ratios,
>>> RAM latencies, controler/chipset latencies, that all add to the CPU's
>>> own L1/L2 latencies).
>>
>>
>> Memory is 100-300 times slower latency wise than L1 cache.
>>
>>> 10 CPU cycles doesn't sound like much, but that's enough time for a
>>> dozen+ instructions under adequate circumstances,
>>
>>
>> Todays OutOfOrder CPUs can find a lot of useful work during such 
>> relatively
>> short (10 cycles is very little) periods of time -- btw even L2 cache 
>> hits
>> take longer (out of PC processors only PentiumM has 10 cycle L2 latency,
>> others are worse.
>>
>>
>>> have it happen too
>>> often, and that's enough to make the difference between smooth and
>>> sluggish.
>>
>>
>>
>> You didn't touch the other side. Non compacting allocation is either 
>> slow or
>> has significant internal fragmentation, especially heap allocating tiny
>> objects is very wasteful. You'll go out of small L1 cache quick while 
>> large
>> portions of your cache lines will contain just unused holes.
>>
>> rgds
>> -- 
>> Sebastian Kaliszewski
>>
>> _______________________________________________
>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>> http://lists.freepascal.org/mailman/listinfo/fpc-devel
>>
>
>
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> http://lists.freepascal.org/mailman/listinfo/fpc-devel
>
I don't want to hit the bleeding wound but I think the point is:
NO GC.

Each of us has better things to do than try some GCs out which in the
end won't be used anyhow.

Ales




More information about the fpc-devel mailing list