[fpc-pascal] helpsystem, some numbers

Marco van de Voort marcov at stack.nl
Tue Oct 28 12:01:05 CET 2008


In our previous episode, Graeme Geldenhuys said:
> > In fact deflate/zip is 18-19 years old and there are a lot of better
> > compression algorithms, like LZX. I think there is one implemented in Pascal
> > (ABC, if memory doesn't fail).
> 
> I'm still trying to find a compression algorithm that beats whatever
> 7-zip uses. The results are orders of magnitude smaller than any other
> compression algorithm I have seen.
> 
> The important thing for TZipFile component is that the archive format
> must compress every file separately. Otherwise you can't extract a
> specific file without unpacking everything first.

But ZIP is 5-6 times larger than CHM, which can do all this too, and we have
the whole software shebang without deps.

I was somewhat surprised that bz2 was smaller by another factor of 2, and
according to Eduardo it is possible to extract bz2 blocks separately, without
changing compression parameters. You could then index the tar+bz2 (which file
is in which block, plus its offset within the block) by decompressing it fully
once, and afterwards extract single files.
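
To make the indexing idea concrete, here is a rough sketch in Python. It is
not how tar+bz2 is laid out on disk: to keep it short I assume every block is
compressed as its own independent bz2 stream, and the names pack, extract and
BLOCK_SIZE are made up for the illustration. The index maps each file name to
(block number, offset in block, length), which is the same information you
would collect by decompressing a real archive fully once.

```python
import bz2

BLOCK_SIZE = 64 * 1024  # hypothetical uncompressed block size


def pack(files, block_size=BLOCK_SIZE):
    """Concatenate (name, bytes) pairs and compress them block by block.

    Returns (blocks, index): blocks is a list of independent bz2 streams,
    index maps name -> (block number, offset in block, length).
    """
    index = {}
    pos = 0
    for name, content in files:
        index[name] = (pos // block_size, pos % block_size, len(content))
        pos += len(content)
    data = b"".join(content for _, content in files)
    blocks = [
        bz2.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
    return blocks, index


def extract(blocks, index, name, block_size=BLOCK_SIZE):
    """Extract one file, decompressing only the blocks it touches."""
    block_no, offset, length = index[name]
    out = bytearray()
    while len(out) < length:
        chunk = bz2.decompress(blocks[block_no])
        out += chunk[offset:offset + (length - len(out))]
        block_no += 1   # a file may continue in the next block
        offset = 0      # and then starts at the beginning of it
    return bytes(out)
```

The cost is what the text says: an extra index to build and maintain, plus the
code to handle files that straddle a block boundary.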

Still, that adds another index, extra handling, and a lot of work, while CHM
already works and is not too bad.

> The other thing is the algorithms need to be free and supporting
> Unicode. 

A compression algorithm is not related to unicode. That's the job of the
archive component. 
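
A small illustration of that split, using Python's standard zlib and zipfile
modules rather than anything from FPC. The helper name utf8_name_flag is made
up for the example. The compressor only ever sees bytes; it is the zip
container that records how a file name is encoded, via bit 11 (0x800) of the
general purpose flags.

```python
import io
import zipfile
import zlib

# The compression algorithm works on bytes and knows nothing about text.
payload = "naïve café".encode("utf-8")
assert zlib.decompress(zlib.compress(payload)) == payload


def utf8_name_flag(name):
    """Write one member to an in-memory zip, then read back its name and
    whether the container marked that name as UTF-8 (flag bit 0x800)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, b"data")
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & 0x800)
```

Calling utf8_name_flag with an ASCII name and with a non-ASCII name shows the
flag changing while the compressed data is handled identically.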

> 7-zip's LZMA does pass both requirements. I'm just not sure if it
> compresses files separately - I would imagine it can/does.

A portable, not overly complex implementation in Pascal is also a
requirement IMHO. Not a hard one, but the fact that it is already there for
CHM makes it one for an alternative.
