[fpc-pascal] FileIO in FPC 3.0

Marco van de Voort marcov at stack.nl
Fri Sep 25 13:47:22 CEST 2015


In our previous episode, Andreas Dorn said:
> Is it safe to pass the Filename to procedures from the RTL without risking
> corruption?

In theory, no. But since filenames will mostly be passed to functions that
make the same assumptions, I expect the only problem arises when you put such
a filename into a store that has problems with it.

That being said, converting such strings to other codepages is not a good
idea.

The logical thing is to implement a simple validation function, call it at
critical points, and throw an error if the input is inconsistent.
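A minimal sketch of such a validation function, assuming the string is available as a sequence of 16-bit UTF-16 code units (in FPC, the WideChar elements of a UnicodeString). It is written in Python only for brevity, and the name `validate_utf16` is hypothetical. It rejects unpaired high (lead) and low (trail) surrogates, which are exactly the "inconsistent" cases:

```python
def validate_utf16(units):
    """Raise ValueError if `units` (16-bit code units) is not well-formed UTF-16."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"code unit out of range at index {i}: {u:#x}")
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be immediately followed by a low surrogate.
            if i + 1 >= n or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            i += 2
            continue
        if 0xDC00 <= u <= 0xDFFF:
            # Low surrogate without a preceding high surrogate.
            raise ValueError(f"unpaired low surrogate at index {i}")
        i += 1
```

For example, `validate_utf16([0x41, 0xD83D, 0xDE00])` ("A" plus one emoji) passes silently, while `validate_utf16([0x41, 0xD800])` raises. Raising rather than returning a flag makes it hard to forget the check at a critical point.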
 
> For me this is more a general problem when dealing with external data.

Always assume external data needs to be validated.

> Should I tag raw external data as UTF-16/UTF-8 and be super-careful,
> or should I tag it as some kind of "raw" string (which one?) and handle
> any conversions manually.

I think this is only needed in special cases (like embedded devices that
sometimes spout garbage).

> For me Filenames are more a type that has some kind of affinity to an
> encoding for display, but I'd rather not tag its content as valid UTF-16
> - any magic internal conversion is potentially lossy.

Passing such a string to any subsystem that barfs on malformed surrogates
causes trouble too (e.g. msxml). The golden rule is to be liberal in what you
accept (and correct or reject appropriately), exact in what you generate, and
to always validate input.
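One way to "correct or reject" is sketched below, again in Python over 16-bit code units. The name `sanitize_utf16` and the choice of U+FFFD REPLACEMENT CHARACTER as the correction are assumptions for illustration. In lenient mode every lone surrogate becomes U+FFFD, so the output is always well-formed. With `strict=True` it rejects instead:

```python
REPLACEMENT_CHAR = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER

def sanitize_utf16(units, strict=False):
    """Return a well-formed copy of `units`.

    Lone surrogates are replaced with U+FFFD, or rejected with
    ValueError when strict=True. Valid pairs are copied unchanged.
    """
    out = []
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            out.extend((u, units[i + 1]))  # proper surrogate pair
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            if strict:
                raise ValueError(f"lone surrogate {u:#06x} at index {i}")
            out.append(REPLACEMENT_CHAR)  # correct instead of propagating garbage
            i += 1
        else:
            out.append(u)
            i += 1
    return out
```

The corrected copy is what you would hand to a strict consumer such as an XML library. For a filename, the original code units are still what you pass back to the file API, because the corrected name may no longer refer to the same file.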

So if you are really worried about this, validate strings read from the
filesystem when they enter your program (e.g. the result of findfirst),
before passing them on.
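A sketch of validating at that boundary, using Python's `os.listdir` in place of findfirst. When it is given a bytes path on POSIX, it returns the raw byte names. The helper name `scan_dir_validated` is hypothetical, and treating "valid" as "decodes as strict UTF-8" is an assumption about the platform's naming convention:

```python
import os

def scan_dir_validated(path):
    """List a directory, splitting entries into names that decode as strict
    UTF-8 and raw byte names that do not."""
    good, bad = [], []
    for raw in os.listdir(os.fsencode(path)):  # bytes path -> raw bytes names
        try:
            good.append(raw.decode("utf-8"))  # strict: raises on invalid sequences
        except UnicodeDecodeError:
            bad.append(raw)  # keep the raw bytes so the file can still be opened
    return good, bad
```

The rest of the program then only sees names from `good`. What to do with `bad` (log, skip, show escaped) is an application decision made once, at the entry point.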

> All in all I think if everything works for filenames, everything else will
> follow...

Well, at least you can recycle the validation routines you made. A quick
look at the Wikipedia UTF-16 article might give some clues.
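The core of that article is the surrogate-pair arithmetic. A code point in U+10000..U+10FFFF has 0x10000 subtracted from it, and the resulting 20 bits are split into two 10-bit halves. The function names below are hypothetical, and the code is Python for brevity:

```python
def encode_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {cp:#x}")
    v = cp - 0x10000  # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)  # high 10 bits, low 10 bits

def decode_surrogates(hi, lo):
    """Combine a high/low surrogate pair back into a code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError(f"not a surrogate pair: {hi:#x}, {lo:#x}")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
```

For example, `encode_surrogates(0x1F600)` returns `(0xD83D, 0xDE00)`, and `decode_surrogates(0xD83D, 0xDE00)` returns `0x1F600`. The range checks are the same ones a validator needs, which is where the recycling comes in.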
 
> (Now let's better not start about the encoding of Filenames on non-Windows
> OS...  :-))

BSD/Linux afaik have the same problem: the filesystem is binary, not textual.
The textual aspect is only an interpretation.
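To illustrate that point, the sketch below takes a name that is a legal POSIX filename but not valid UTF-8. It uses Python's `surrogateescape` error handler, which is Python-specific machinery and not an FPC feature. The bytes can be read as text in more than one way, and only the raw bytes are authoritative:

```python
raw = b"caf\xe9.txt"  # 0xE9 is Latin-1 'é'; a legal filename, invalid UTF-8

# Interpretation 1: UTF-8 with surrogateescape. The bad byte becomes the
# lone surrogate U+DCE9, so the round trip back to bytes is lossless.
text = raw.decode("utf-8", errors="surrogateescape")  # 'caf\udce9.txt'
assert text.encode("utf-8", errors="surrogateescape") == raw

# Interpretation 2: Latin-1. Same bytes, different (and valid) text.
as_latin1 = raw.decode("latin-1")  # 'café.txt'

# The escaped text still contains a lone surrogate, so a strict consumer
# rejects it: this raises UnicodeEncodeError.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    pass
```

Keeping the raw bytes for file operations, and only interpreting them for display, avoids the lossy "magic conversion" discussed above.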


