[fpc-devel] TStringList.LoadFromFile and SaveToFile - file encoding support

Vincent Snijders vsnijders at vodafonevast.nl
Tue Feb 3 09:27:32 CET 2009


Graeme Geldenhuys wrote:
> On Tue, Feb 3, 2009 at 9:02 AM, Vincent Snijders
> <vsnijders at vodafonevast.nl> wrote:
>> I am a Lazarus developer, and I don't think I said it like that.
> 
> I wasn't pointing fingers at you, Vincent. :-) I summarized what a few
> people have said.
> 
>> LoadFromFile in an LCL control, you need to make sure they are valid UTF8
>> strings. And honestly, only you can make sure that they are, because you
>> know the initial encoding.
> 
> The problem is as follows. Even though I am a long-time developer,
> I often have no clue what encoding a file is in when I look at the
> file in the Nautilus file manager. I often open a file in my preferred
> text editor, check whether it displays correctly, then look in the
> status bar for the encoding the editor detected (at least my editor
> does that nicely).
> 

The LCL does not have this feature. It can only handle UTF8. Period.

> So even if you are using something as simple as a TMemo in the LCL,
> and the LCL always wants UTF-8, how do you know which encoding to
> convert to UTF-8 from?

If you don't know, you cannot process it. Simple.

> If I give you various text files, each using one of the following
> schemes: UTF-16, UTF-16BE, UTF-16LE, UTF-32, and whatever else I can
> find, and you load each file into a TStringList and then call
> UTF8Decode on each line, will it display correctly in the TMemo?
> 

For each of these encodings, you would first have to translate it to UTF8 before
you give it to the LCL. Note that it is not wise to load UTF16* and UTF32 encoded files
into a byte-indexed AnsiString.
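A minimal sketch of the "translate it to UTF8 first" step, written in Python rather than Pascal purely for brevity, and assuming the file starts with a byte order mark (BOM). The BOM is the only reliable in-band hint; a file without one needs its encoding supplied by the caller, which matches the point that an unknown encoding cannot be processed. The helper names detect_bom and to_utf8 are hypothetical and are not LCL or RTL routines.

```python
import codecs
from typing import Optional, Tuple

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the 4-byte marks must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (codec_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def to_utf8(data: bytes, fallback: Optional[str] = None) -> bytes:
    """Decode raw file bytes and re-encode them as UTF-8 (without a BOM).

    Without a BOM the encoding cannot be determined from the bytes alone,
    so the caller must pass it as `fallback`; otherwise ValueError is raised.
    """
    name, skip = detect_bom(data)
    if name is None:
        if fallback is None:
            raise ValueError("unknown encoding: no BOM and no fallback given")
        name = fallback
    text = data[skip:].decode(name)
    return text.encode("utf-8")


raw = codecs.BOM_UTF16_LE + "Grüße".encode("utf-16-le")
print(to_utf8(raw))  # prints b'Gr\xc3\xbc\xc3\x9fe'
```

The UTF-8 bytes returned here are what would be handed to the LCL control; the detected codec name is worth keeping, because it is needed again when saving.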

> Now what if the memo content is changed and then saved? How does the
> TMemo know which encoding to use? (I would prefer the same encoding
> as before, not necessarily UTF-8.) So if the file was originally
> UTF-32, I don't want it to be UTF-8 afterwards.

If you want it to be the same, then you have to convert it back. You know what it
was in the first place, because you translated it to UTF8 before giving it to the LCL.
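The "convert it back" step can be sketched the same way, again in Python and with a hypothetical helper name (from_utf8). The caller keeps the encoding it detected or chose when loading and passes it in when saving; whether to write a BOM again is an assumption left to the caller and should mirror what the original file had.

```python
import codecs

# BOM written back for each encoding that conventionally carries one.
# Assumption: the original file had a BOM; pass write_bom=False if it did not.
_BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}


def from_utf8(utf8_data: bytes, encoding: str, write_bom: bool = True) -> bytes:
    """Convert UTF-8 text (as edited in the control) back to `encoding`."""
    text = utf8_data.decode("utf-8")
    # Encodings without a BOM convention (e.g. cp1252) get no prefix.
    bom = _BOM_FOR.get(encoding, b"") if write_bom else b""
    return bom + text.encode(encoding)
```

For example, from_utf8(b"x", "utf-16-le") yields the BOM FF FE followed by the two bytes 78 00, so a file that was UTF-16LE on load is UTF-16LE again on save.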

> If TStringList.LoadFromFile(...) took an encoding parameter, it
> could store that encoding internally, so if you call
> .SaveToFile(somefile.txt) later, it could use the same encoding as
> LoadFromFile() did, and otherwise default to something like UTF-8 if
> no encoding was specified anywhere.
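A sketch of the proposed behaviour, in Python and with a hypothetical class: EncodedStringList is not an existing FPC class, and its methods only mimic the LoadFromFile/SaveToFile names. The encoding passed to the load method is stored and reused on save, an explicit argument on save overrides it, and UTF-8 is the default when nothing was specified.

```python
from typing import List, Optional


class EncodedStringList:
    """Hypothetical sketch: remember the load encoding and reuse it on save."""

    DEFAULT_ENCODING = "utf-8"

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.encoding: Optional[str] = None  # set by load_from_file

    def load_from_file(self, path: str, encoding: Optional[str] = None) -> None:
        enc = encoding or self.DEFAULT_ENCODING
        # newline="" disables newline translation; splitlines() handles
        # LF, CRLF and CR line endings.
        with open(path, "r", encoding=enc, newline="") as f:
            self.lines = f.read().splitlines()
        self.encoding = enc  # remembered for a later save_to_file

    def save_to_file(self, path: str, encoding: Optional[str] = None) -> None:
        # Precedence: explicit argument, then the encoding used by the
        # last load, then the UTF-8 default.
        enc = encoding or self.encoding or self.DEFAULT_ENCODING
        with open(path, "w", encoding=enc, newline="\n") as f:
            f.write("\n".join(self.lines) + ("\n" if self.lines else ""))
```

Python's "utf-16" and "utf-32" codecs consume a BOM on reading and write one (in native byte order) on writing, so in this sketch a UTF-32 file stays UTF-32 after a load/save round trip, although its byte order may change; a real implementation would also have to remember the byte order and handle the BOM explicitly.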

Maybe. I leave that suggestion to the RTL developers. See also Marco's mail.

Vincent
