[fpc-devel] TStringList.LoadFromFile and SavetoFile - file encoding support

Graeme Geldenhuys graemeg.lists at gmail.com
Tue Feb 3 07:44:24 CET 2009


Hi,

I just read all the comments on the following bug report, filed
under the Lazarus project.
  http://bugs.freepascal.org/view.php?id=12676

The comments posted don't seem sufficient to me.  If a user selects
a file to be loaded, they have no clue whether that file is ANSI, UTF-8,
UTF-16, etc. encoded. The suggestion by the Lazarus developers is to
ALWAYS assume the file is in UTF-8 (just because LCL uses UTF-8
internally) and to do a UTF8Encode on each line of the file. So what
happens if you do a .SaveToFile(...)?  Must you UTF8Decode each line
again?

This supposed solution fails horribly in practice. What if the file
was UTF-16 encoded? The load will fail because UTF-8 was assumed, and I
don't even want to think about what will happen if you save the file
again.  The bug reporter mentioned his file contained special German
characters like ü, ä, ö or ß and they were not handled appropriately.
Once loaded and saved, he couldn't load the file again.
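
To make the failure mode concrete, here is a minimal sketch, written in
Python only for brevity (none of these names are FPC or LCL APIs). It
assumes Latin-1 as a stand-in for the "ANSI" code page, which is a guess
on my part. It shows that ANSI or UTF-16 bytes are not valid UTF-8, and
that text which is already UTF-8 gets mangled if it is encoded a second
time.

```python
# Illustration only: Latin-1 stands in for the "ANSI" code page (assumption).
text = "üäöß"

latin1_bytes = text.encode("latin-1")   # 4 bytes, one per character
utf8_bytes = text.encode("utf-8")       # 8 bytes, two per character
utf16_bytes = text.encode("utf-16")     # BOM followed by 2 bytes per character


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Blindly assuming UTF-8 fails for both the ANSI and the UTF-16 file.
ansi_ok = is_valid_utf8(latin1_bytes)
utf16_ok = is_valid_utf8(utf16_bytes)

# Double encoding: UTF-8 bytes are misread as Latin-1 and then
# encoded to UTF-8 again, so every character grows to four bytes.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")
```

The double-encoded data is still valid UTF-8, so nothing complains on
save; the damage only shows up when the file is displayed or loaded
again.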

I believe Delphi 2009 extended the .LoadFromFile(...) and
.SaveToFile(...) methods with an optional encoding parameter. Could
something like this be added to TStringList etc?  I guess we would
also need some automatic encoding detection in place. How do other text
editors manage to auto-detect file encodings with a reasonable degree of
accuracy?  Also, if the .LoadFromFile(...) and .SaveToFile(...)
methods were extended, then we (GUI toolkit developers) could extend
File Open and File Save dialogs like Qt has done. If the automatic
encoding detection fails, the user could use the combobox in the file
open/save dialog to specify an encoding to use. Web browsers have a
similar feature when displaying HTML.
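
As a rough sketch of what I have in mind (again in Python for brevity,
not a proposal for the actual FPC signatures): check for a byte order
mark first, then test whether the data is valid UTF-8, and only then
fall back to a legacy code page. The names detect_encoding, load_text
and save_text are hypothetical, and the Latin-1 fallback is an
assumption; a real implementation would probably use the system code
page. The optional encoding argument plays the role of the combobox in
the file dialog.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs are checked first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes, fallback: str = "latin-1") -> Tuple[str, int]:
    """Guess the encoding; return (codec name, number of BOM bytes to skip)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    try:
        data.decode("utf-8")      # no BOM: valid UTF-8 is a strong hint
        return "utf-8", 0
    except UnicodeDecodeError:
        return fallback, 0        # heuristic guess, may well be wrong


def load_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode file contents; an explicit encoding overrides detection."""
    if encoding is not None:
        return data.decode(encoding)
    name, skip = detect_encoding(data)
    return data[skip:].decode(name)


def save_text(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text for writing, using the encoding the caller asked for."""
    return text.encode(encoding)
```

This is only a heuristic. A file without a BOM that happens to be valid
UTF-8 but was written in some other code page will be misdetected, which
is exactly why the user still needs a way to override the guess.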


Regards,
  - Graeme -


_______________________________________________
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


