[fpc-devel] Blackfin support

Tue Jul 13 18:19:04 CEST 2010

In our previous episode, Hans-Peter Diettrich said:
> > No that has to be solved by a bigger granularity (compiling more units in
> > one go).  That avoids ppu reloading and limits directory searching (there is
> > a cache iirc) freeing up more bandwidth for source loading.
> 
> ACK. The compiler should process in one go as many units as possible - 
> but this is more a matter of the framework (Make, Lazarus...), that 
> should pass complete lists of units to the compiler (projects, packages).

Not necessarily. One could also strengthen the make capabilities of the
compiler, or think about reworking the compiler so that it stays resident,
etc.

> As a workaround a dedicated server process could hold the least recently 
> processed unit objects in RAM, for use in immediately following 
> compilation of other units. But this would only cure the symptoms, not 
> the reason for slow compiles :-(

(some random wild thinking:)

Jonas seems to indicate most of the time is due to the object model
(zeroing) and memory management in general.

One must keep in mind though that he probably measures on a *nix, and there
is a reason why the make cycle takes twice as long on Windows. I don't
think the CPU or the cache halves in speed under Windows, so it must be
more in the I/O sphere:
- ntfs is relatively slow in directory operations (seeking)
- Windows is slow starting up binaries.
- Afaik ntfs caching is optimized for fileserver use, not to speed up a 
   single application strongly. Especially if that app starts/stops
   constantly (a model that is foreign on Windows)

So one can't entirely rule out limiting I/O and the number of compiler
startups as a way to speed things up, since not all OSes are alike.

For the memory management issues, a memory manager written specifically for
the compiler is the obvious first solution. To make it worthwhile to keep a
list of zeroed blocks (and have a thread zero big blocks), the system must
somehow know when a zeroed block is needed. For objects this could maybe be
done by creating a new root object and deriving every object from it
(cclasses etc). But that would still leave dynamic arrays and manually
allocated memory.
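
To make the zeroed-block idea a bit more concrete, here is a rough sketch.
It is in Python only to keep it short, it is not FPC code, and all names
(ZeroedBlockCache, acquire, release, scrub) are made up for illustration.
The point it shows is that the caller has to say whether it needs a zeroed
block, and that the zeroing of released blocks can be moved to a separate
step, which is the work a helper thread would do.

```python
from collections import deque


class ZeroedBlockCache:
    """Hypothetical cache of equally sized blocks, kept in two lists."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.zeroed = deque()  # blocks known to contain only zero bytes
        self.dirty = deque()   # released blocks with stale contents

    def acquire(self, need_zeroed):
        """Return a block; it is guaranteed zero-filled only if need_zeroed."""
        if need_zeroed:
            if self.zeroed:
                return self.zeroed.popleft()
            if self.dirty:
                block = self.dirty.popleft()
                block[:] = bytes(self.block_size)  # zero on demand
                return block
            return bytearray(self.block_size)  # a new bytearray is zero-filled
        # The caller overwrites everything anyway, so hand out a dirty
        # block first and keep the zeroed ones for callers that need them.
        if self.dirty:
            return self.dirty.popleft()
        if self.zeroed:
            return self.zeroed.popleft()
        return bytearray(self.block_size)

    def release(self, block):
        """Give a block back; its contents are considered stale."""
        self.dirty.append(block)

    def scrub(self, limit=None):
        """Zero up to `limit` dirty blocks (the helper thread's job)."""
        count = 0
        while self.dirty and (limit is None or count < limit):
            block = self.dirty.popleft()
            block[:] = bytes(self.block_size)
            self.zeroed.append(block)
            count += 1
        return count
```

In this sketch an object constructor would call acquire(need_zeroed=True),
while code that fills the whole block itself would pass False. Whether the
bookkeeping is cheaper than simply zeroing on allocation would have to be
measured.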

For manually allocated memory of always the same size (virtual register
map?) a pooling solution could be found.
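
A pooling solution for same-sized allocations could look roughly like the
sketch below. Again Python for brevity and purely illustrative:
FixedSizePool, max_free and fresh_allocations are invented names, and the
block size is arbitrary. Released blocks go onto a free list and are handed
out again instead of going back to the general heap.

```python
class FixedSizePool:
    """Hypothetical pool for blocks that always have the same size."""

    def __init__(self, block_size, max_free=64):
        self.block_size = block_size
        self.max_free = max_free      # cap on cached blocks (assumed policy)
        self._free = []               # released blocks waiting for reuse
        self.fresh_allocations = 0    # how often the pool had to allocate

    def acquire(self):
        """Reuse a released block if one is available, else allocate."""
        if self._free:
            return self._free.pop()   # contents are stale, not zeroed
        self.fresh_allocations += 1
        return bytearray(self.block_size)

    def release(self, block):
        """Return a block to the pool; surplus blocks are simply dropped."""
        if len(block) != self.block_size:
            raise ValueError("block does not belong to this pool")
        if len(self._free) < self.max_free:
            self._free.append(block)
```

The fresh_allocations counter is only there to see how often the pool
misses; a pool that almost never reuses a block is not worth having.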

> It may be a good idea to implement different models, that either read 
> entire files or use the current (buffered) access. Depending on disk 
> fragmentation it may be faster to read entire (unfragmented) source or 
> ppu files, before requests for other files can cause disk seeks and slow 
> down continued reading of files from other places. Both models can be 
> used concurrently, when an arbitration is possible from certain system 
> (load) parameters.

Most OSes already read several tens of KB ahead. I don't really think that
will gain that much. Such approaches are so low-level that the OS could do
it, and it probably already does.