[fpc-devel] LLVM Backend?

Wed Nov 11 20:43:08 CET 2009

On 11 Nov 2009, at 16:55, Samuel Crow wrote:

> ----- Original Message ----
>> From: Jonas Maebe <jonas.maebe at elis.ugent.be>
>> To: FPC developers' list <fpc-devel at lists.freepascal.org>
>> Sent: Wed, November 11, 2009 5:03:52 AM
>> Subject: Re: [fpc-devel] LLVM Backend?
>>
>> In a sense it's no problem if the LLVM backend doesn't support all
>> targets that the other backends support, after all it'll be just
>> another choice that people can use. I just want to avoid the LLVM
>> backend being completely disconnected from the other backends or
>> require an inordinate amount of maintenance to keep it in sync with
>> the other backends, because in that case it would probably be quite
>> short-lived (unless some dedicated maintainer would step up and
>> constantly keep it in sync with the rest).
>
> What I originally had in mind was to use the LLVM support classes  
> based on C bindings for libstdc++ and LLVM's template libraries.   
> See http://llvm.org/docs/ProgrammersManual.html for LLVM's best  
> practices.  So, as such, it wouldn't be entirely disconnected from  
> the rest of the project.  It would just be another LLVM frontend.

I think we're talking about two different things here: I meant that it  
could get disconnected from the rest of the FPC project, not from the  
LLVM project.

> As for the code maintenance, the Mattathias BASIC team would be  
> using the code for our compiler in the future as well so we'd have  
> to protect it from bitrot.  Unfortunately something so drastic would  
> be like a fork of the project and I like the fact that FPC will work  
> on my old PowerMac G4 as well as the newer systems.  I was also  
> hoping to avoid having to write our compilers as GCC frontends as  
> well since that framework is poorly documented.

Not only that: a fork without any modifications in the "regular FPC"  
would probably be quite hard to keep in sync, because new frontend  
features still get added to FPC. Sometimes these also require  
(abstract, non-cpu-specific) code generator support, and if the  
regular FPC only keeps its current code generator, then a lot of those  
changes will require porting to be integrated into the LLVM backend.

>> FPC already has a fairly abstract code generator interface, because
>> we don't use intermediate code: parse tree nodes are translated into
>> machine code by calling the appropriate code generator class
>> methods. The existing abstract code generator's interface is lower
>> level than LLVM's as far as the type info is concerned though, so we
>> need a higher level (either sitting above the current code
>> generator, or replacing existing code generator).
>
> We may want to replace the existing code generator, unfortunately.   
> The reason that LLVM requires such high-level code for the calling  
> routines is that it supports not only different calling conventions  
> but it promotes from the stack-based C calling convention to a  
> register-loaded fastcall convention whenever possible.  This may not  
> have been necessary when using the Borland Fastcall convention but  
> sometimes being able to rearrange the registers when calling  
> different functions can avoid some spillage to the stack.

I agree that the current function call infrastructure in FPC's code  
generator is way over-engineered for what LLVM requires, and that part  
would probably not be used. An LLVM call instruction handles all of  
the parameter passing and return value handling by itself, so that is  
no problem. There are, however, other things for which you may want to
keep the current code generator, because you'd have to implement
support in LLVM that is just as low level as what currently exists in
FPC.

One example is bitpacked arrays. Last I checked, LLVM did not support  
this yet (only bitpacked structs were functional), so you'd also need  
the explicit loading, masking, inserting and storing (unless you  
translate that into an array of structs with 1-byte elements  
consisting of 8 bits each, but then the debug info will suffer).
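
To make the "explicit loading, masking, inserting and storing" concrete,
here is a minimal sketch in Python of what such an access sequence
computes. The helper names (load_packed, store_packed) and the
little-endian bit order are assumptions made for illustration only; they
do not describe FPC's or LLVM's actual layout rules.

```python
def load_packed(buf: bytearray, index: int, bits: int) -> int:
    """Read element `index` of an array whose elements are `bits` wide."""
    byte_pos, shift = divmod(index * bits, 8)
    # An element may straddle a byte boundary, so load every byte it touches.
    nbytes = (shift + bits + 7) // 8
    word = int.from_bytes(buf[byte_pos:byte_pos + nbytes], "little")
    # Shift the element down to bit 0 and mask off its neighbours.
    return (word >> shift) & ((1 << bits) - 1)


def store_packed(buf: bytearray, index: int, bits: int, value: int) -> None:
    """Write `value` into element `index` without touching its neighbours."""
    byte_pos, shift = divmod(index * bits, 8)
    nbytes = (shift + bits + 7) // 8
    mask = ((1 << bits) - 1) << shift
    # Load the containing bytes, clear the old element, insert the new one.
    word = int.from_bytes(buf[byte_pos:byte_pos + nbytes], "little")
    word = (word & ~mask) | ((value << shift) & mask)
    # Store the modified bytes back.
    buf[byte_pos:byte_pos + nbytes] = word.to_bytes(nbytes, "little")
```

A backend that has no native notion of a bitpacked array has to emit
the equivalent of these four steps itself for every element access.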

Another example is the Objective-Pascal support we've added: unless  
you'd build the FPC frontend on top of Clang rather than on top of  
LLVM, you'll have to re-implement all of the metadata generation for  
Objective-C classes in an LLVM way if you'd throw away all of FPC's  
low level code generation stuff.

> I think the biggest obstacles to using LLVM at this point are that  
> Borland Fastcall isn't supported by the x86 backend for LLVM and  
> that the type information isn't high-level enough in the FPC compiler.

The code generation in FPC is implemented in basically two steps. The  
parse tree nodes are all class instances with a "generate_code"  
method. The generic implementations of these methods use the various  
(generic) code generator methods to actually emit the (cpu-specific)  
assembler instructions. If you just create descendant classes for all
of these parse tree nodes whose generate_code methods call into the
LLVM run time rather than into FPC's abstract code generator, then you
have access to all the high level information that you could want.
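
As a rough illustration of that two-step structure, here is a small
Python sketch. Every class and method name in it (LowLevelCodeGen,
ConstNode, AddNode, LLVMEmitter and so on) is hypothetical and much
simpler than the real FPC classes, and the "LLVM" variant merely
appends textual IR to a list instead of calling any real LLVM API.

```python
class LowLevelCodeGen:
    """Stand-in for a generic, register-oriented code generator."""

    def __init__(self):
        self.lines = []
        self.next_reg = 0

    def load_const(self, value):
        reg = f"r{self.next_reg}"
        self.next_reg += 1
        self.lines.append(f"mov {reg}, {value}")
        return reg

    def add(self, dst, src):
        self.lines.append(f"add {dst}, {src}")
        return dst


class ConstNode:
    def __init__(self, value):
        self.value = value

    def generate_code(self, cg):
        # Generic implementation: go through the low level code generator.
        return cg.load_const(self.value)


class AddNode:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def generate_code(self, cg):
        left = self.left.generate_code(cg)
        right = self.right.generate_code(cg)
        return cg.add(left, right)


class LLVMEmitter:
    """Collects textual IR; a placeholder for calls into LLVM."""

    def __init__(self):
        self.lines = []
        self.next_tmp = 0

    def fresh(self):
        name = f"%t{self.next_tmp}"
        self.next_tmp += 1
        return name


class LLVMConstNode(ConstNode):
    def generate_code(self, em):
        # Constants are immediate operands, so nothing is emitted here.
        return str(self.value)


class LLVMAddNode(AddNode):
    def generate_code(self, em):
        # The descendant still sees the whole node, including its operands.
        left = self.left.generate_code(em)
        right = self.right.generate_code(em)
        tmp = em.fresh()
        em.lines.append(f"{tmp} = add i32 {left}, {right}")
        return tmp
```

Note that each generate_code override duplicates the traversal logic of
its parent, which is where the porting effort discussed next comes from.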

This approach is what would correspond to the "replace the existing  
code generator" you mentioned. The big downside to this approach, as  
mentioned earlier, is that it would result in a lot of maintenance  
work to keep the LLVM version up-to-date, because newly added tree  
node types and changes to the regular FPC versions of the
generate_code methods would have to be manually ported to the LLVM
version (the language/dialects that FPC supports, and hence the parse
tree classes and/or their implementation, are still evolving -- we're
not just fixing implementation bugs or optimising).

Another approach would be to introduce a higher level code generator  
in FPC that can be implemented via either
a) the current low level code generator
b) an LLVM backend

In that case, new parse tree nodes etc would be implemented using this  
high level code generator, and no extra porting effort would be  
required for supporting both the LLVM backend and the regular code  
generators (once the initial implementation is finished, obviously).  
Since designing such a beast is hard and a lot of work, one  
possibility could be to model it on the interface that LLVM offers.  
Then we immediately know that it's complete and appropriate for  
compiler use (since LLVM was designed as a compiler backend), and that  
by definition it will fulfil the needs of the LLVM backend (and  
presumably also those of other high level backends, should anyone ever  
want to implement one).
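
A minimal sketch of this layering, again in Python and with purely
hypothetical names (HighLevelCodeGen, LoweringCodeGen, LLVMCodeGen,
HLConstNode, HLAddNode), could look as follows. It only assumes that
the high level interface hands opaque value handles back to the nodes;
the real interface would be far larger.

```python
from abc import ABC, abstractmethod


class HighLevelCodeGen(ABC):
    """The only interface that parse tree nodes are allowed to use."""

    @abstractmethod
    def const_int(self, value): ...

    @abstractmethod
    def add_int(self, a, b): ...


class LoweringCodeGen(HighLevelCodeGen):
    """(a) Implemented via a low level, register-oriented generator."""

    def __init__(self):
        self.lines = []
        self.next_reg = 0

    def const_int(self, value):
        reg = f"r{self.next_reg}"
        self.next_reg += 1
        self.lines.append(f"mov {reg}, {value}")
        return reg

    def add_int(self, a, b):
        self.lines.append(f"add {a}, {b}")
        return a


class LLVMCodeGen(HighLevelCodeGen):
    """(b) Implemented by emitting (here: textual) LLVM IR."""

    def __init__(self):
        self.lines = []
        self.next_tmp = 0

    def const_int(self, value):
        return str(value)

    def add_int(self, a, b):
        tmp = f"%t{self.next_tmp}"
        self.next_tmp += 1
        self.lines.append(f"{tmp} = add i32 {a}, {b}")
        return tmp


class HLConstNode:
    def __init__(self, value):
        self.value = value

    def generate_code(self, hlcg):
        return hlcg.const_int(self.value)


class HLAddNode:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def generate_code(self, hlcg):
        # Written once; works unchanged with either implementation.
        return hlcg.add_int(self.left.generate_code(hlcg),
                            self.right.generate_code(hlcg))
```

Here the node classes exist only once, so a newly added node type
needs no separate port for the LLVM side.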

Of course, that's probably quite a bit of work (and it would slow down  
the compiler due to the extra abstraction level), but I think it  
offers the best chances of avoiding a fork and lots of additional  
porting efforts afterwards.

> That's two problems, both fairly significant (although the latter is  
> definitely heavier than the former).  Do you think it's too soon to  
> divide out the work?  Is there more that needs discussion?

It's not entirely clear to me yet how you see the result: an FPC  
frontend added to the LLVM project, or an LLVM backend added to the  
FPC project. I favour the latter, but a lot of what you talk about  
seems to be about the former. Or am I misunderstanding things?

Jonas