<div dir="ltr">Hi,<div class="gmail_extra"><br><div class="gmail_quote">2013/7/29 Michael Schnell <span dir="ltr"><<a href="mailto:mschnell@lumino.de" target="_blank">mschnell@lumino.de</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
<div>On 07/29/2013 07:36 AM, Noah Silva
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>Using UTF16 for internal string handling is a sensible
option. <br>
</div>
</div>
</div>
</div>
</blockquote></div>
It depends.<br>
UTF-16 uses more memory<br></div></blockquote><div><br></div><div>No, UTF-16 only needs more memory if most of the text is ASCII. For Japanese, for example, it actually uses less than UTF-8 in the average case, because kana and most kanji take three bytes in UTF-8 but only two in UTF-16. </div>
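<div><br></div><div>As a quick check of the sizes, here is a throwaway sketch (in Python rather than Pascal, only because it is easy to run; the sample strings and the helper name are made up for illustration):</div>
<pre>
```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
# The sample strings and the helper name are made up for illustration.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a BOM."""
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # "-le" avoids the 2-byte BOM
    return utf8, utf16

japanese = "日本語の文字列"   # 7 characters, all in the BMP
ascii_only = "plain text"    # 10 ASCII characters

print(encoded_sizes(japanese))    # (21, 14)
print(encoded_sizes(ascii_only))  # (10, 20)
```
</pre>
<div>The Japanese sample comes out at 21 bytes against 14, while the ASCII sample goes from 10 to 20, so which encoding is smaller really depends on the text.</div>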
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
The Linux OS API is 8-bit in most cases, while the Windows OS API is 16-bit<br></div></blockquote><div><br></div><div>I assume that by 8-bit you mean a variable-length byte encoding like UTF-8.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Conversions are very expensive. <br></div></blockquote><div><br></div><div>This is not as bad as some people make it out to be. You have to be converting a *lot* of data for it to be noticeable.</div><div><br></div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
If you need to import/export much data but don't do much calculating,
of course using the import/export format all over the place is
sensible. <br>
If you do many calculations, the type of calculation might suggest a
certain encoding.<div class="im"><br>
</div></div></blockquote><div>And if you don't do either (which most programs don't with string data), then either format is just fine. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im"><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>To address your specific points:</div>
<div>1. The Lazarus user API already supports UTF-8, as far as I
know.</div>
</div>
</div>
</div>
</blockquote></div>
I suppose this is bound to change once FPC has completed the move to
"new Delphi Strings". <br></div></blockquote><div><br></div><div>I really don't think so; the reasons are laid out in detail in the wiki. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>2. TStringList could easily support both, but as long
as the conversion to/from other code pages (especially
UTF8) is automatic, I wouldn't mind.</div>
</div>
</div>
</div>
</blockquote></div>
I already delved into this in another thread here: I do believe that
it is easily possible to invent a string type that supports any
encoding and that can be used to create such a flexible TStringList,
but this needs additional compiler support in a way that is not
anticipated by Delphi. IMHO this is possible without risking
noticeable performance degradation in any of the conceivable
 application variants. <br><div class="im">
<br></div></div></blockquote><div>From what I understand, the plan is for each string to store its code page internally as an attribute, alongside its length. Since the compiler and runtime library will then know the code page, they can convert as necessary. Either way, it is easy enough to make your own StringList variant for each string type. </div>
<div><br></div><div>For example, I normally use UTF-8 for everything, but I wrote one linguistic analysis program that does heavy-duty analysis of its strings, so it stores everything in memory as UTF-16. I use StringList and similar classes with it without any problems. (I don't mix UTF-8 and UTF-16 in the same structures, though...)</div>
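<div><br></div><div>To make the "code page as an attribute" idea concrete, here is a rough sketch of how I picture it. It is written in Python only so that it is easy to run, and TaggedString and its methods are names I made up for illustration; this is not the FPC or Delphi implementation:</div>
<pre>
```python
# Hypothetical sketch: a string that carries its code page with it, so
# that conversions happen only when they are actually needed.
# TaggedString and its methods are invented names, not FPC or Delphi API.

class TaggedString:
    def __init__(self, data, codepage):
        self.data = data          # raw bytes
        self.codepage = codepage  # e.g. "utf-8", "utf-16-le", "shift_jis"

    @classmethod
    def from_text(cls, text, codepage):
        return cls(text.encode(codepage), codepage)

    def converted(self, codepage):
        """Return this string in the requested code page, converting only if needed."""
        if codepage == self.codepage:
            return self
        text = self.data.decode(self.codepage)
        return TaggedString(text.encode(codepage), codepage)

    def __add__(self, other):
        # The left operand decides the code page of the result; the right
        # operand is converted only if its code page differs.
        right = other.converted(self.codepage)
        return TaggedString(self.data + right.data, self.codepage)

    def text(self):
        return self.data.decode(self.codepage)

a = TaggedString.from_text("日本", "utf-8")
b = TaggedString.from_text("語", "utf-16-le")
c = a + b
print(c.codepage, c.text())
```
</pre>
<div>Because each value knows its own code page, concatenating a UTF-8 string with a UTF-16 one can convert the right-hand side automatically instead of silently mixing bytes.</div>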
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div class="im">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>3. Not sure what class inheritance has to do with
this...</div>
</div>
</div>
</div>
</blockquote></div>
If you do TStringList (in fact TStrings)<font color="#888888"> </font>that
uses this new type in the user-programmer interface it needs to be
possible to derive classes from those that use the fully Delphi
compatible String types with predefined encoding. The compiler
magic needs to be done appropriately to handle these cases,
requesting automatic conversions (only) when necessary. <br><span class="HOEnZb"><font color="#888888">
<br></font></span></div></blockquote><div>In fact, I am fine with manual conversions, so long as 99% of everything "just works" with UTF-8 and/or UTF-16. Then you would only occasionally need to worry about local encodings, for legacy import/export. If you really need to, it's easy enough to write an overload for a Windows API call (ick) that accepts UTF-8, or vice versa for a UTF-8-native call that you want to use with UTF-16. The real issue to me is that, until now, the compiler hasn't actually *known* whether a string is UTF-8 or SJIS. That means it can't give you an error when things aren't right; the text just gets garbled, and the programmer is left with a mystery to sort out.</div>
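<div><br></div><div>Here is a small illustration of that failure mode, again in Python only for convenience and with a made-up sample string. Bytes that carry no encoding information can be read with the wrong code page, and depending on the guess you either get garbage silently or, if you are lucky, an error:</div>
<pre>
```python
# Bytes with no encoding attached: nothing stops them from being read
# with the wrong code page. The sample text is made up for illustration.

text = "日本語"
sjis_bytes = text.encode("shift_jis")

# Wrong guess 1: Latin-1 accepts any byte, so this "succeeds" silently
# and produces garbled text.
garbled = sjis_bytes.decode("latin-1")

# Wrong guess 2: a strict UTF-8 decode at least fails loudly here.
try:
    sjis_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Right guess: the text round-trips cleanly.
restored = sjis_bytes.decode("shift_jis")

print(repr(garbled), utf8_failed, restored == text)
```
</pre>
<div>The first case is the one that hurts: no error is raised anywhere, and the garbled text only shows up much later.</div>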
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class="HOEnZb"><font color="#888888">
-Michael<br>
</font></span></div>
<br></blockquote><div>-- Noah </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">_______________________________________________<br>
fpc-pascal maillist - <a href="mailto:fpc-pascal@lists.freepascal.org">fpc-pascal@lists.freepascal.org</a><br>
<a href="http://lists.freepascal.org/mailman/listinfo/fpc-pascal" target="_blank">http://lists.freepascal.org/mailman/listinfo/fpc-pascal</a><br></blockquote></div><br></div></div>