[fpc-other] What makes a Compiler project (like FPC) special?

Paul Robinson paul at paul-robinson.us
Sat May 27 06:59:12 CEST 2017


Graeme Geldenhuys asked in Vol 108, Issue 27, "What makes a compiler project special?"
Well, I'm not a member of the FPC team, but I've worked on several compilers and I'll throw my 0.02 Euro into the discussion.
> Since Florian mentioned that a compiler project is "rocket science" [not his 
> direct words, but he hinted at that] and totally different to any other software 
> project... It has really bugged me... Why is it different, and What is different?
I'm going to have to disagree here, and it may simply display my own ignorance of the subject, but, then again, even a stopped clock is right twice a day.
A compiler is a "language processor," an application that converts code in one language into something else. If it's a translating compiler, it converts it to another language. If it's a language compiler, it converts it to binary code or potentially to assembly language. (I'm making a bit of a distinction here: a compiler that emits assembly code isn't a "translator," because it uses the assembler to avoid reinventing the wheel, that is, writing its own object file writer, and because compilers that generate assembly usually produce finished output requiring no manual intervention. Most translators that convert source from one high-level language to another produce results that need manual correction. Few produce "perfect" high-level to high-level conversions; they'll do the heavy lifting, but minor tweaks or checking by a person are often required.)

At its core, a language processor is a text processing application. It works from a fixed set of rules on what the programmer can and must "say" in order to specify the actions they want a program to accomplish. Given these rules, which are called "grammars," the programmer describes the program, and the compiler takes that description and turns it into the target representation.
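To make the "grammar in, target representation out" idea concrete, here is a minimal Python sketch. The three-rule expression grammar and the names tokenize and translate are invented for this illustration and have nothing to do with FPC's actual grammar or code. It turns an infix expression into postfix order, the kind of output a simple stack machine could execute.

```python
import re

# Hypothetical toy grammar (an assumption for illustration, not FPC's):
#   expr   := term   { ("+" | "-") term }
#   term   := factor { ("*" | "/") factor }
#   factor := NUMBER | "(" expr ")"

def tokenize(text):
    # Pick out numbers and single-character symbols; anything else
    # (such as spaces) is silently skipped in this sketch.
    return re.findall(r"\d+|[-+*/()]", text)

def translate(text):
    """Translate an infix expression into a list of postfix tokens."""
    tokens = tokenize(text)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif tok.isdigit():
            out.append(tok)
            advance()
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            factor()
            out.append(op)  # operator is emitted after both operands

    def expr():
        term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            term()
            out.append(op)

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return out
```

Each grammar rule becomes one function, so the structure of the code mirrors the structure of the rules. Calling translate("1 + 2 * 3") yields the tokens 1 2 3 * + in that order.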
In the case of a translator, it produces a new program in a different language. Or it may be the same language but converted to a different dialect, such as a translation from a different Pascal compiler, or a conversion from HP Cobol to IBM Mainframe Cobol, or conversion from C or Fortran to something newer.
Many language processors have gone to using parser generators in order to reduce the work involved in scanning a source language. Some may simply do language scanning directly. Most older Pascal compilers used "symbol substitution": as the source was scanned, the scanner would create a symbol identifying what had been found, whether it was an unrecognized word (which would indicate a user identifier), a symbol (like :, >, /, comma, etc.) or a keyword (USES, UNIT, BEGIN, etc.). Then the internal "current symbol" was set to the value of that symbol.
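As a rough illustration of that classification step, here is a small Python sketch. The keyword set is a tiny, incomplete sample, and the names KEYWORDS and classify are made up for this example rather than taken from any real compiler.

```python
# A small, deliberately incomplete sample of Pascal keywords (assumption).
KEYWORDS = {"USES", "UNIT", "BEGIN", "END", "PROGRAM", "VAR"}

def classify(word):
    """Return a (kind, value) pair playing the role of the 'current symbol'."""
    if not word:
        raise ValueError("empty word")
    upper = word.upper()  # Pascal identifiers and keywords ignore case
    if upper in KEYWORDS:
        return ("KEYWORD", upper)
    starts_ok = word[0].isalpha() or word[0] == "_"
    rest_ok = all(c.isalnum() or c == "_" for c in word)
    if starts_ok and rest_ok:
        # An unrecognized word is taken to be a user identifier.
        return ("IDENT", upper)
    return ("SYMBOL", word)
```

With these definitions, classify("begin") gives a KEYWORD, classify("Counter") gives an IDENT, and classify(":") gives a SYMBOL.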

Most compilers had about a 1-byte lookahead so that they could determine whether they had a single-byte symbol (comma, ^, or ') or a multibyte symbol that depends on the second byte (> vs. >=, : vs. :=, < vs. <> or <=). Okay, all of this was reasonable until object orientation came into use.
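The one-character lookahead can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (no string literals, comments or numbers with decimal points), and the names TWO_CHAR and scan_symbols are hypothetical.

```python
# Two-character symbols that need one character of lookahead to recognize.
TWO_CHAR = {":=", ">=", "<=", "<>"}

def scan_symbols(text):
    """Split text into words and symbols using one character of lookahead."""
    symbols = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        # Peek at the next character without consuming it.
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch + nxt in TWO_CHAR:
            symbols.append(ch + nxt)
            i += 2
        elif ch.isalnum() or ch == "_":
            # Collect a whole word (identifier, keyword or integer).
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            symbols.append(text[i:j])
            i = j
        else:
            symbols.append(ch)  # any other character stands alone
            i += 1
    return symbols
```

For example, scan_symbols("a := b >= c") separates := and >= as single symbols, while scan_symbols("a>b") leaves > on its own because the character after it does not complete a two-character symbol.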
When one uses a variable, or a constant, which one are you using? Well, it depends on the "scope." If you have I defined in the main program (or in the definitions of a UNIT), your reference is to that one. If you're inside a procedure, function, method or other similar construct and you define I there, it uses that one. But what if your program - or UNIT - uses several other UNITs, each having a variable I defined? Which one does it use? The first one? The last one?
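One way to picture the lookup is as a chain of tables searched from the innermost scope outward. The Python sketch below assumes that used units are searched from the last one listed back to the first, which is my understanding of Free Pascal's behavior but should be checked against the reference guide. The function name resolve and the table layout are invented for this example.

```python
def resolve(name, local_scope, unit_scope, used_units):
    """Find which declaration of `name` a reference binds to.

    local_scope / unit_scope: dicts mapping upper-case names to declarations.
    used_units: list of (unit_name, symbols) pairs in uses-clause order.
    """
    key = name.upper()  # Pascal names are case-insensitive
    if key in local_scope:
        return ("local", local_scope[key])
    if key in unit_scope:
        return ("unit", unit_scope[key])
    # Assumption: the unit listed last in the uses clause is searched first.
    for unit_name, symbols in reversed(used_units):
        if key in symbols:
            return (unit_name, symbols[key])
    raise NameError(f"identifier {name!r} not found")
```

So with units A and B both declaring I and listed in that order, a reference to I from code that declares no I of its own would bind to the one in B under this assumption.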
Now, the plot thickens if you reference an object. An identifier in that object can be fixed or virtual. If it is virtual, it may not be certain until execution time which one is being used: a variable or procedure in the base class or an overridden one in a descendant object. So a compiler has to read the tables in a unit in order to discover what items are visible and where they are in that unit, and also to know what kind of variable (or procedure, or function) each one is and what is legal to do with it (you can't assign a 64-bit integer to an 8-bit unsigned byte without risking loss of data, but the other way around is fine).
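The difference between a fixed and a virtual member can be sketched as follows. This is a simplified Python model, not how FPC stores its tables; ClassInfo, bind_call and labels such as "Base.Draw" are hypothetical names used only for illustration.

```python
class ClassInfo:
    """A toy symbol-table entry for a class."""

    def __init__(self, name, parent=None, methods=None):
        self.name = name
        self.parent = parent
        # method name -> (is_virtual, label of the implementation)
        self.methods = methods or {}

    def find(self, method):
        # Walk up the inheritance chain until the method is found.
        info = self
        while info is not None:
            if method in info.methods:
                return info.methods[method]
            info = info.parent
        raise NameError(f"method {method!r} not found")

def bind_call(declared, runtime, method):
    """Pick the implementation a call reaches.

    declared: class of the variable as the compiler sees it.
    runtime:  class of the object actually stored at execution time.
    """
    is_virtual, label = declared.find(method)
    if not is_virtual:
        return label                # fixed: decided from the declared type
    return runtime.find(method)[1]  # virtual: decided from the run-time type

base = ClassInfo("Base", methods={"Draw": (True, "Base.Draw"),
                                  "Name": (False, "Base.Name")})
child = ClassInfo("Child", parent=base,
                  methods={"Draw": (True, "Child.Draw"),
                           "Name": (False, "Child.Name")})
```

Here bind_call(base, child, "Draw") reaches Child.Draw because Draw is virtual, while bind_call(base, child, "Name") stays with Base.Name because the compiler can settle a fixed method from the declared type alone.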

But this is still the translation of symbols and assigning them attributes, including whether they are a standalone item (like a unit), a dependent item (like a variable in a program) or an internal item (like a field in a record or a member of an object). It requires that you keep information about these things, but I don't think this is any worse than the work involved in a video game in holding state information about the game map, the player character (PC), non-player characters (NPCs), enemies, and objects the player can hold (guns, Portal Device, radio) or use or consume (money, ammo, health).
The last time I compiled the full compiler, maybe a year or two ago on a reasonable machine, it was about 262,000 lines, probably not including run-time libraries, and the build took an amazingly fast 13 seconds. In the end, it's still a text processor which attempts to take the explanation of what the programmer thinks the program is to do and translate it into a means to execute that explanation.
Even so, I'm sure it does not rise to the level of complexity of other types of applications involving other fields even if those programs are smaller in size. I suspect chemical analysis or actual programs involving real "rocket science" are considerably more complicated.
Let's put it at the level of a word processor, which might have to do a lot of similar things, such as process a document and redline the misspelled words, or even "compile" the formatted document into a PDF. But maybe that's too different a comparison, as word processors do other things to documents. However, I am trying to explain why a compiler application, while having some complexity, really isn't all that different from a typical "ordinary" application such as a word processor or other application most people deal with every day.
And is probably a lot less complex, too.
 
Paul 
Paul Robinson <paul at paul-robinson.us> - http://paul-robinson.us (My blog)
"The lessons of history teach us - if they teach us anything - that no one learns the lessons that history teaches us."
