[fpc-devel] Interface to compressed files and archives

DrDiettrich drdiettrich at compuserve.de
Thu Dec 30 07:05:08 CET 2004


Hi there,

I'm new to this list and want to introduce myself and my intended
contributions to FreePascal.

My name is Dr. Hans-Peter Diettrich, and I live in Flensburg (Germany).
For brevity I sign my messages as DoDi. My main interests are
decompilers and (tools for) porting code. Usually I work with Delphi,
but this behaviour may change ;-)

Recently I came across some interesting FPC library modules that I
want to use in my own projects. Some of these modules deserve updates,
both in general and for use with Delphi, and I want to contribute that
work to the FPC community.

Currently I'm implementing an RPM clone for Windows which, in
particular, should support source RPMs better than the original RPM
does. For this I have to deal with compressed files in various formats
(gzip, bzip2) and with archive files (cpio, tar...). I've already
updated or implemented some of these modules; now I want to define a
common interface and API for compressed and archive streams, based on
TStream. The zstream unit is dedicated to a single compressor, but it
has a handy name. How should I name a more general unit? Would
"zstreams" be acceptable?

My idea of a general (de-)compression interface is as follows:

The general decompression unit maintains a list of all available
compressors. Every implemented and used compressor adds itself to this
list in the initialization section of its main unit.

A general Open or Decompress procedure can then determine which
decompressor to use for a given stream, and create the appropriate
decompressor object. For compression it may be better to create the
object for the desired format directly, so that format-specific
arguments (e.g. the compression level) can be passed to the constructor
of that class in the appropriate form.
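
A minimal sketch of such a registry with format detection, assuming each compressor contributes only a name and a signature test. `TSignatureTest`, `RegisterCompressor` and `DetectFormat` are hypothetical names, not an existing FPC API; the magic numbers are the real ones (gzip data starts with $1F $8B, bzip2 with 'BZh'). Creating the matching decompressor object is left out; a real registry would probably store a class reference or factory next to each test. DetectFormat rewinds after peeking, so it assumes a seekable source stream.

```pascal
program DetectDemo;
{$mode objfpc}{$H+}{$C+}
uses
  Classes, SysUtils;

type
  { Hypothetical signature test: True if the leading bytes of a
    stream look like this compressor's format. }
  TSignatureTest = function(const Header: array of Byte): Boolean;

  TCompressorInfo = record
    Name: string;
    Test: TSignatureTest;
  end;

var
  { The registry; in a library each compressor unit would append
    itself here from its initialization section. }
  Compressors: array of TCompressorInfo;

procedure RegisterCompressor(const AName: string; ATest: TSignatureTest);
var
  N: Integer;
begin
  N := Length(Compressors);
  SetLength(Compressors, N + 1);
  Compressors[N].Name := AName;
  Compressors[N].Test := ATest;
end;

{ gzip data starts with the bytes $1F $8B }
function IsGzip(const Header: array of Byte): Boolean;
begin
  Result := (Length(Header) >= 2) and (Header[0] = $1F) and (Header[1] = $8B);
end;

{ bzip2 data starts with the ASCII characters 'BZh' }
function IsBzip2(const Header: array of Byte): Boolean;
begin
  Result := (Length(Header) >= 3) and (Header[0] = Ord('B'))
    and (Header[1] = Ord('Z')) and (Header[2] = Ord('h'));
end;

{ Peek at the first bytes, restore the position, and return the name
  of the first registered format that matches ('' if none). }
function DetectFormat(S: TStream): string;
var
  Header: TBytes;
  N, I: Integer;
  Start: Int64;
begin
  Result := '';
  Start := S.Position;
  SetLength(Header, 4);
  N := S.Read(Header[0], Length(Header));
  SetLength(Header, N);
  S.Position := Start;          // rewind: assumes a seekable stream
  for I := 0 to High(Compressors) do
    if Compressors[I].Test(Header) then
      Exit(Compressors[I].Name);
end;

const
  GzipSample: array[0..3] of Byte = ($1F, $8B, $08, $00);
var
  S: TMemoryStream;
begin
  { In a library these calls would sit in the initialization
    sections of the gzip and bzip2 units. }
  RegisterCompressor('gzip', @IsGzip);
  RegisterCompressor('bzip2', @IsBzip2);
  S := TMemoryStream.Create;
  try
    S.WriteBuffer(GzipSample, SizeOf(GzipSample));
    S.Position := 0;
    WriteLn(DetectFormat(S));   // gzip
  finally
    S.Free;
  end;
end.
```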

The use of the de/compression stream objects should be obvious: Read or
Write is called until EOF. The legacy C code of the compressors is
based on error codes and conditions that must be checked after almost
every call to an internal function, and which are also reported as the
final result once the data has been fully processed. I want to change
that model so that errors raise the predefined stream exceptions. This
approach will simplify both the existing code and the application code,
and make them more transparent. It will also hide the compressor
specific error codes from the application. Such a change will be
incompatible with the inherited decompressor APIs, but does anybody see
a need to keep supporting alternative, specialized access to the
de/compressors beyond the stream interface?
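
To illustrate the proposed error model, here is a sketch in which a hypothetical toy run-length decoder (`TToyRleStream`, not a real compressor) stands in for gzip or bzip2. The application simply calls Read until it returns 0; corrupt input surfaces as an `EReadError` (a descendant of `EStreamError` in the Classes unit) instead of an error code that has to be checked after every call.

```pascal
program ExceptionModelDemo;
{$mode objfpc}{$H+}{$C+}
uses
  Classes, SysUtils;

type
  { Hypothetical toy decompressor standing in for gzip/bzip2: the source
    holds (count, value) byte pairs, each expanding to count copies of
    value. Only the error model matters here. }
  TToyRleStream = class(TStream)
  private
    FSource: TStream;
    FValue: Byte;
    FRemaining: Integer;  // copies of FValue still to emit
  public
    constructor Create(ASource: TStream);
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
  end;

constructor TToyRleStream.Create(ASource: TStream);
begin
  inherited Create;
  FSource := ASource;
end;

function TToyRleStream.Read(var Buffer; Count: Longint): Longint;
var
  P: PByte;
  Pair: array[0..1] of Byte;
  Got: Longint;
begin
  Result := 0;
  P := @Buffer;
  while Result < Count do
  begin
    if FRemaining = 0 then
    begin
      Got := FSource.Read(Pair, SizeOf(Pair));
      if Got = 0 then
        Break;                     // clean end of input: report EOF
      if Got <> SizeOf(Pair) then  // truncated pair: corrupt input
        raise EReadError.Create('Truncated compressed data');
      FRemaining := Pair[0];
      FValue := Pair[1];
    end
    else
    begin
      P[Result] := FValue;
      Inc(Result);
      Dec(FRemaining);
    end;
  end;
end;

function TToyRleStream.Write(const Buffer; Count: Longint): Longint;
begin
  raise EWriteError.Create('Decompression stream is read-only');
end;

{ Application code: read until EOF, no error codes to inspect. }
function DecodeAll(const Data: array of Byte): string;
var
  Src: TMemoryStream;
  Z: TToyRleStream;
  Buf: array[0..15] of Byte;
  N, I: Longint;
begin
  Result := '';
  Src := TMemoryStream.Create;
  try
    if Length(Data) > 0 then
      Src.WriteBuffer(Data[0], Length(Data));
    Src.Position := 0;
    Z := TToyRleStream.Create(Src);
    try
      repeat
        N := Z.Read(Buf, SizeOf(Buf));
        for I := 0 to N - 1 do
          Result := Result + Chr(Buf[I]);
      until N = 0;
    finally
      Z.Free;
    end;
  finally
    Src.Free;
  end;
end;

begin
  WriteLn(DecodeAll([3, Ord('a'), 2, Ord('b')]));  // aaabb
  try
    DecodeAll([3, Ord('a'), 2]);                   // second pair truncated
  except
    on E: EStreamError do
      WriteLn('Error: ', E.Message);  // Error: Truncated compressed data
  end;
end.
```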

If we can agree on the above details, I plan to convert the gzip,
bzip2 and zip modules to that common interface. I'm also willing to
update further modules to use that interface, provided that they
already exist as Pascal source code.

---

Archive files deserve a more elaborate API, so that the files in an
archive can be extracted to individual files or streams. There was
already a suggestion to define something like a virtual file system
interface for archive files. I suspect that something like this already
exists for the GUI file browsers of both Linux and Windows, so this
deserves some research before a compatible interface can be defined.
For now I'm waiting for contributions from the OS gurus before
proceeding with this approach.

A much simpler interface could be based on enumeration and callback
procedures, which allow existing archive files to be processed
sequentially. It may also be possible to build a directory tree for an
archive, but for now I will leave such an implementation to somebody
else ;-)
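
A sketch of the enumeration-and-callback approach for tar, under stated assumptions: `TEntryCallback` and `EnumerateTar` are hypothetical names; only the member name (bytes 0..99) and the octal size (bytes 124..135) of each 512-byte header are decoded; checksums, links and long-name extensions are ignored; and each Read on the source is assumed to return a full 512-byte block until the end. Each member's data is copied into a memory stream, so the callback cannot read past its member, and the enumerator itself never sets the source position.

```pascal
program TarEnumDemo;
{$mode objfpc}{$H+}{$C+}
uses
  Classes, SysUtils;

type
  { Hypothetical callback, called once per member in archive order;
    Data holds exactly Size bytes of that member's content. }
  TEntryCallback = procedure(const Name: string; Size: Int64; Data: TStream);

{ Lenient parser for tar's octal fields: non-octal bytes are skipped. }
function OctalField(const Block: array of Byte; Offset, Len: Integer): Int64;
var
  I: Integer;
begin
  Result := 0;
  for I := Offset to Offset + Len - 1 do
    if (Block[I] >= Ord('0')) and (Block[I] <= Ord('7')) then
      Result := Result * 8 + (Block[I] - Ord('0'));
end;

procedure EnumerateTar(Src: TStream; Callback: TEntryCallback);
var
  Header: array[0..511] of Byte;
  Name: string;
  Size: Int64;
  I, Pad: Integer;
  Member: TMemoryStream;
begin
  while Src.Read(Header, SizeOf(Header)) = SizeOf(Header) do
  begin
    if Header[0] = 0 then
      Break;                                  // zero block: end of archive
    Name := '';
    I := 0;
    while (I < 100) and (Header[I] <> 0) do   // name: bytes 0..99
    begin
      Name := Name + Chr(Header[I]);
      Inc(I);
    end;
    Size := OctalField(Header, 124, 12);      // size: bytes 124..135
    Member := TMemoryStream.Create;
    try
      if Size > 0 then
        Member.CopyFrom(Src, Size);
      Member.Position := 0;
      Callback(Name, Size, Member);
    finally
      Member.Free;
    end;
    Pad := Integer((512 - Size mod 512) mod 512);  // data padded to 512
    if Pad > 0 then
      Src.ReadBuffer(Header, Pad);
  end;
end;

{ Test helper: writes one minimal member (name and size only; the
  header checksum is left empty, which this reader ignores). }
procedure AddMember(Dest: TStream; const Name, Content: string);
var
  Block: array[0..511] of Byte;
  SizeField: string;
begin
  FillChar(Block, SizeOf(Block), 0);
  Move(Name[1], Block[0], Length(Name));      // assumes 1..100 chars
  SizeField := OctStr(Length(Content), 11);   // 11 octal digits + NUL
  Move(SizeField[1], Block[124], 11);
  Dest.WriteBuffer(Block, SizeOf(Block));
  if Content <> '' then
    Dest.WriteBuffer(Content[1], Length(Content));
  FillChar(Block, SizeOf(Block), 0);
  Dest.WriteBuffer(Block, (512 - Length(Content) mod 512) mod 512);
end;

var
  Listing: string;

procedure ListEntry(const Name: string; Size: Int64; Data: TStream);
begin
  Listing := Listing + Format('%s:%d;', [Name, Size]);
end;

function ListArchive: string;
var
  Tar: TMemoryStream;
  Zero: array[0..1023] of Byte;
begin
  Tar := TMemoryStream.Create;
  try
    AddMember(Tar, 'a.txt', 'hello');
    AddMember(Tar, 'b.txt', '');
    FillChar(Zero, SizeOf(Zero), 0);
    Tar.WriteBuffer(Zero, SizeOf(Zero));      // two zero blocks end it
    Tar.Position := 0;
    Listing := '';
    EnumerateTar(Tar, @ListEntry);
    Result := Listing;
  finally
    Tar.Free;
  end;
end;

begin
  WriteLn(ListArchive);  // a.txt:5;b.txt:0;
end.
```
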
For the creation of new archive files, methods are required to add
files to the archive directory. The simplest approach is based on
physical (existing) files, whose attributes the archiver can retrieve
from the file system itself. Then the application code need not care
about all the related details.
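
A sketch of adding a physical file, assuming a hypothetical `TArchiveEntry` record and `AddFile` procedure: the caller passes only a path, and the archiver gathers name, size and modification time itself via `FindFirst`/`TSearchRec` from SysUtils. The "name size" header line is a toy format chosen for brevity, not cpio or tar.

```pascal
program AddFileDemo;
{$mode objfpc}{$H+}{$C+}
uses
  Classes, SysUtils;

type
  { Hypothetical entry descriptor, filled from the file system. }
  TArchiveEntry = record
    Name: string;
    Size: Int64;
    Modified: TDateTime;  // a real archiver would store this too
  end;

{ Retrieve the attributes an archiver needs from an existing file. }
function EntryFromFile(const Path: string; out Entry: TArchiveEntry): Boolean;
var
  SR: TSearchRec;
begin
  Result := FindFirst(Path, faAnyFile, SR) = 0;
  if Result then
  begin
    Entry.Name := ExtractFileName(Path);
    Entry.Size := SR.Size;
    Entry.Modified := FileDateToDateTime(SR.Time);
  end;
  FindClose(SR);
end;

{ Hypothetical archiver method: the caller supplies only a path.
  Toy format: a "name size" line followed by the raw file data. }
procedure AddFile(Archive: TStream; const Path: string);
var
  Entry: TArchiveEntry;
  F: TFileStream;
  Header: string;
begin
  if not EntryFromFile(Path, Entry) then
    raise EFOpenError.CreateFmt('Cannot add %s', [Path]);
  Header := Format('%s %d', [Entry.Name, Entry.Size]) + #10;
  Archive.WriteBuffer(Header[1], Length(Header));
  F := TFileStream.Create(Path, fmOpenRead or fmShareDenyWrite);
  try
    if Entry.Size > 0 then
      Archive.CopyFrom(F, Entry.Size);
  finally
    F.Free;
  end;
end;

function Demo: string;
var
  Path, Content: string;
  F: TFileStream;
  Archive: TMemoryStream;
begin
  Path := IncludeTrailingPathDelimiter(GetTempDir) + 'addfile_demo.txt';
  Content := 'hello';
  F := TFileStream.Create(Path, fmCreate);
  try
    F.WriteBuffer(Content[1], Length(Content));
  finally
    F.Free;
  end;
  Archive := TMemoryStream.Create;
  try
    AddFile(Archive, Path);
    SetString(Result, PChar(Archive.Memory), Archive.Size);
  finally
    Archive.Free;
    DeleteFile(Path);
  end;
end;

begin
  WriteLn(Demo);  // "addfile_demo.txt 5", a newline, then "hello"
end.
```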

---

Now you should have the big picture of my intended activities.
Many more questions will arise as I proceed with my work. I have already
decided to replace my own "stdc" unit with the FPC "libc" unit,
hopefully without any changes to that unit. For further compatibility
it will be necessary to find compromises between my coding style and
the style used by the FPC community. E.g. I prefer to prefix all my
unit names with a "u", so that the base names remain available for
procedures or variables. I also use upper case characters in unit names,
which may not be appreciated by users from the Unix world. As a
compromise it may be possible to use a "lib" prefix, but this may
conflict with existing library names (libz...). Any ideas?


I'll stop now and thank you for your patience in reading this. Feel
free to change the subject or to open new threads to discuss details.

Happy New Year
  DoDi