[fpc-devel] Parallel processing in the compiler

Hans-Peter Diettrich DrDiettrich1 at aol.com
Mon Sep 6 05:22:32 CEST 2010


Florian Klämpfl wrote:

>> Right, that's how it *should* be designed. But try to find out why the
>> code generation is added, when variables like current_filepos or
>> current_tokenpos are moved into TModule (current_module) :-(
> 
> Why should current_filepos and current_tokenpos go into TModule? They
> can be perfectly threadvars.

Good point. So would it be sufficient to retype all such variables as 
threadvar in order to make parallel processing (threads) possible?
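As a minimal sketch (the exact fields of tfileposinfo are not reproduced here), retyping the globals mentioned above could look like this; each thread would then get its own zero-initialized copy:

```pascal
{ Sketch: the parser-state globals redeclared as threadvars.
  tfileposinfo is FPC's source-position record; its declaration
  is only hinted at here. }
threadvar
  current_filepos,
  current_tokenpos: tfileposinfo;
```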

I have no clear idea how the initialization of the threadvars would have 
to be implemented in such a model. That's why I try to assign all 
state-related variables to a definite object, whose reference can easily 
be copied (or moved?) into any newly created thread.
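The state-object model could be sketched like this (all names here are assumed for illustration, not existing compiler types):

```pascal
{ Sketch with assumed names: all state-related variables collected into
  one object, whose reference is handed to each worker thread on
  creation, so no threadvar initialization is needed. }
uses
  Classes;

type
  TCompileState = class
  public
    FilePos: tfileposinfo;    { replaces current_filepos }
    TokenPos: tfileposinfo;   { replaces current_tokenpos }
    { further compilation state would go here }
  end;

  TCompileThread = class(TThread)
  private
    FState: TCompileState;
  public
    constructor Create(AState: TCompileState);
  end;

constructor TCompileThread.Create(AState: TCompileState);
begin
  FState := AState;           { copy the reference into the thread }
  inherited Create(False);    { start the thread immediately }
end;
```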

Then we also have to draw a line between what can run in parallel and 
what still requires sequential processing. Sequential processing must be 
used for all output, be it binary or log files. Encountered errors must 
also be reported from the originating thread to all related threads 
(main and others) when they require compilation to stop and shut down in 
an orderly manner.
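One possible shape for both requirements, as a sketch with assumed names: a critical section serializes all output, and a shared flag asks the main and sibling threads to shut down after a fatal error in any one thread.

```pascal
{ Sketch, names assumed: serialized output plus a shared stop flag. }
uses
  SyncObjs;

var
  OutputLock: TCriticalSection;
  StopRequested: boolean;       { polled periodically by every thread }

procedure ReportFatal(const Msg: string);
begin
  OutputLock.Acquire;
  try
    Writeln(StdErr, Msg);       { sequential: only one writer at a time }
    StopRequested := True;      { request an orderly shutdown }
  finally
    OutputLock.Release;
  end;
end;
```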

> Further, I don't see them fitting into
> TModule, they describe not the state of a module but are part of the
> compilation state. Even more, consider several threads compiling
> different procedure of one module: putting current_filepos and
> current_tokenpos into TModule won't work in this case.

Right, but I see no chance for such parallelism before all related 
variables have been identified. See my questions about precisely these 
variables, and the corresponding tfileposinfo values in several objects.

Parallel code generation requires that the cg be separated from parsing, 
so that the next procedure can be parsed while the previously parsed 
procedures are being compiled.
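Such a parser/cg pipeline could be sketched as a guarded queue (the types and names below are assumed, not existing compiler code): the parser pushes each finished procedure tree and immediately continues, while a cg thread pops and compiles them.

```pascal
{ Sketch with assumed names: a thread-safe hand-over queue between
  parser (producer) and code generator (consumer). }
uses
  Classes;

var
  ParsedProcs: TThreadList;     { thread-safe list from the Classes unit }

procedure EnqueueParsedProc(Tree: TObject);
begin
  ParsedProcs.Add(Tree);        { parser side: hand over, keep parsing }
end;

function DequeueParsedProc: TObject;
var
  L: TList;
begin
  Result := nil;
  L := ParsedProcs.LockList;    { cg side: take the oldest entry, if any }
  try
    if L.Count > 0 then
    begin
      Result := TObject(L[0]);
      L.Delete(0);
    end;
  finally
    ParsedProcs.UnlockList;
  end;
end;
```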

>> The last change was to remove ppudump from the Makefile, and this proved
>> that so far only ppudump is sensitive to changes in the compiler
>> internals.
> 
> Guess why I'm sceptical that it's useful to use the compiler parser
> for other purposes like code tools or documentation: probably once per
> week a simple compiler change breaks this external usage (we had this
> experience ten years ago ;) ).

I've postponed that initial motivation until after all the other 
refactoring. Apart from parallelism, I see more chances for introducing 
really new features in other places, like multiple front-ends. Such 
projects require separating the mere parser from the rest of the 
infrastructure, i.e. the handling of all symbols, creation of nodes, 
etc. have to be moved into new, commonly usable interfaces. After that 
step it would also be easy, and would break nothing in the compiler, to 
add a no-cpu target to the target-specific back-ends.

> How can we continue? I'll see if I find time within the next week (I
> was on holiday for one week) to review the noglobals changes and how we
> can split them into usable parts.

IMO the most important decision is about the general direction of the 
refactoring. Do we want more OO (encapsulation), more codegen 
separation, or something else? IMO encapsulation is the most useful 
first step towards any other goal. The current compiler "structure" is 
dictated by purely *formal* aspects (unit dependencies) and does not 
reflect the *logical* dependencies between objects, variables, 
procedures, etc. This lack of logical structure, together with the lack 
of up-to-date documentation, is the most annoying problem with *every* 
attempt to enhance the compiler.

DoDi




More information about the fpc-devel mailing list