<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 07/30/2013 04:29 AM, Noah Silva
wrote:<br>
</div>
<br>
<blockquote
cite="mid:CANC_V0Mac5tdBrAnqBjaJoKSwY=1Ts2E8r62r=kKtqiAWrK-jw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>No, UTF16 only needs more memory if most of the text is
ASCII. It actually uses less than UTF8 in the average
case for Japanese, for example. <br>
</div>
</div>
</div>
</div>
</blockquote>
Of course you are right here. <br>
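<br>
The size claim is easy to check by encoding sample text both ways and
counting bytes. The following is only a rough sketch in Python (not
Pascal, chosen for brevity); the helper name <code>encoded_sizes</code>
and both sample strings are made up for illustration, and the Japanese
sample is written with escapes so it survives this mail's charset. <br>
<pre>
```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, both without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# 12 ASCII characters: 1 byte each in UTF-8, 2 bytes each in UTF-16.
ascii_sample = "Hello, world"

# 7 Japanese characters (all in the Basic Multilingual Plane):
# 3 bytes each in UTF-8, 2 bytes each in UTF-16.
japanese_sample = "\u3053\u3093\u306b\u3061\u306f\u4e16\u754c"

print(encoded_sizes(ascii_sample))     # (12, 24)
print(encoded_sizes(japanese_sample))  # (21, 14)
```
</pre>
Real text is usually a mix (markup, digits, spaces), so the actual
ratio depends on the data. <br>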
<blockquote
cite="mid:CANC_V0Mac5tdBrAnqBjaJoKSwY=1Ts2E8r62r=kKtqiAWrK-jw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Linux OS API in
most cases is 8 Bit,<br>
</div>
</blockquote>
<div><br>
</div>
<div>I assume by 8bit, you mean variable byte encoding like
UTF8.</div>
</div>
</div>
</div>
</blockquote>
Yep.<br>
<blockquote
cite="mid:CANC_V0Mac5tdBrAnqBjaJoKSwY=1Ts2E8r62r=kKtqiAWrK-jw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Conversions are
very expensive. <br>
</div>
</blockquote>
<div><br>
</div>
<div>This is not as bad as some people make it out to be.
You have to be converting a *lot* of data for it to be
noticeable.</div>
</div>
</div>
</div>
</blockquote>
That is why I pointed out that the choice of encoding depends on how
many "calculations" are done on the strings. <br>
<br>
But in fact I tend to agree, even though exactly this argument was
the reason why the Lazarus team, when converting to Unicode, chose to
do the LCL API in UTF-8 (while MSE chose UTF-16 for the same
purpose). I never felt comfortable with that, BTW. <br>
<br>
<blockquote
cite="mid:CANC_V0Mac5tdBrAnqBjaJoKSwY=1Ts2E8r62r=kKtqiAWrK-jw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><br>
<div bgcolor="#FFFFFF" text="#000000">> I suppose this is
bound to change once fpc has completed the move to "new
Delphi Strings". <br>
</div>
<div><br>
</div>
<div>I really don't think so, the reasons are even well
detailed in the Wiki. <br>
</div>
</div>
</div>
</div>
</blockquote>
I was always told that Delphi compatibility is the primary driving
force for any modifications. That necessarily implies this move
(which is not possible before fpc provides "new Delphi
Strings"). But there might be multiple opinions. <br>
<br>
In fact my primary intentions with Lazarus / fpc are not to do my
own generic projects, but to help my colleagues to move their huge
Delphi XE program system to Linux. This in fact needs complete
support for "new Delphi Strings". <br>
<br>
<blockquote
cite="mid:CANC_V0Mac5tdBrAnqBjaJoKSwY=1Ts2E8r62r=kKtqiAWrK-jw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>From what I understand, the plan is for strings to
store their codepage as an attribute internally along with
their length, and since the compiler/runtime library will
know their codepage, it can convert as necessary. </div>
</div>
</div>
</div>
</blockquote>
That is already ready to use in the svn and is exactly what I mean by
"new Delphi Strings", and - when activated - completely compatible
with Delphi XE. It's rather nice and fast, but Delphi lacks a
_completely_dynamic_encoding_ type with auto-conversion only when
necessary. (IMHO rather easily doable by compiler magic, but
"forgotten" in Delphi XE) <br>
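<br>
To make the "convert only when necessary" idea concrete, here is a
minimal sketch in Python. It is not how fpc or Delphi implement
code-page-aware strings; the class <code>TaggedString</code> and its
method <code>as_encoding</code> are hypothetical names invented for this
illustration. The only assumption taken from the discussion is that a
string carries its encoding as an attribute next to its data. <br>
<pre>
```python
class TaggedString:
    """Raw bytes plus the name of the encoding they are stored in."""

    def __init__(self, data, encoding):
        self.data = data
        self.encoding = encoding

    def as_encoding(self, target):
        """Return a TaggedString in `target`, converting only if needed."""
        if target == self.encoding:
            return self  # same encoding: no conversion and no copy
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)

    def __add__(self, other):
        # The result keeps the left operand's encoding; the right operand
        # is converted only if its encoding differs.
        right = other.as_encoding(self.encoding)
        return TaggedString(self.data + right.data, self.encoding)


a = TaggedString("Gr\u00fc\u00dfe ".encode("utf-8"), "utf-8")
b = TaggedString("aus Linux".encode("utf-16-le"), "utf-16-le")
c = a + b  # b is converted to UTF-8 here, a is used as-is
print(c.encoding, len(c.data))
```
</pre>
In a compiled language the same decision would have to be made by the
compiler or the RTL at each assignment and concatenation, which is
why I call it compiler magic. <br>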
<blockquote
cite="mid:CANC_V0Mac5tdBrAnqBjaJoKSwY=1Ts2E8r62r=kKtqiAWrK-jw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> Either way, you can make your own StringList variants
for each type easily enough. <br>
</div>
</div>
</div>
</div>
</blockquote>
Not without compiler support (if you want auto-conversion when
necessary). <br>
<blockquote
cite="mid:CANC_V0Mac5tdBrAnqBjaJoKSwY=1Ts2E8r62r=kKtqiAWrK-jw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><br>
<div>In fact, I am fine with manual conversions, so long as
99% of everything "just works" with UTF8 and/or UTF16. <br>
</div>
</div>
</div>
</div>
</blockquote>
I'm not fine with TStringList and friends forcing any predefined
encoding. This in fact does work rather nicely without the
application programmer even noticing it. But IMHO a cross-platform
system like fpc can be expected to do better, doing away with
Windows-ish remnants from Delphi whenever possible. <br>
<br>
-Michael<br>
</body>
</html>