[fpc-pascal] Problem with string conversion

Fri Oct 20 18:42:30 CEST 2006

On 10/20/06, Vincent Snijders <vsnijders at quicknet.nl> wrote:
> This should be
>    WideText := GetMem(Size*2);
> because you get the number of characters, and the number of bytes is
> 2 * the number of characters.

Thanks, it works, but I still have doubts.

1) On Linux I will need the cwstring unit, right? This was a UTF-8
test to be used in fpGUI, and possibly the LCL. So can't we just add
cwstring to another unit instead of listing it first in the program?

2) Does this work if my string contains characters greater than
U+FFFF? I mean, it seems that we assume each character takes
2 bytes, but this may not be true.
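
To make this question concrete, here is a minimal sketch of how I
understand UTF-16 to handle code points above U+FFFF: each one becomes
a surrogate pair, i.e. two 16-bit units (4 bytes). The helper names
are my own and not part of the RTL, and I have not checked whether
Utf8ToUnicode produces or counts surrogate pairs this way.

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: number of 16-bit UTF-16 units for one code point.
function Utf16UnitsFor(CodePoint: Cardinal): Integer;
begin
  if CodePoint > $FFFF then
    Result := 2   // needs a surrogate pair (4 bytes)
  else
    Result := 1;  // fits in a single 16-bit unit (2 bytes)
end;

// Hypothetical helper: leading (high) surrogate; only valid above $FFFF.
function HighSurrogate(CodePoint: Cardinal): Word;
begin
  Result := Word($D800 + ((CodePoint - $10000) shr 10));
end;

// Hypothetical helper: trailing (low) surrogate; only valid above $FFFF.
function LowSurrogate(CodePoint: Cardinal): Word;
begin
  Result := Word($DC00 + ((CodePoint - $10000) and $3FF));
end;

begin
  // U+00E9 fits in one unit; U+1D11E (musical G clef) needs two.
  WriteLn(Utf16UnitsFor($00E9));
  WriteLn(Utf16UnitsFor($1D11E));
  // Prints the surrogate pair for U+1D11E: D834 DD1E
  WriteLn(HexStr(HighSurrogate($1D11E), 4), ' ',
          HexStr(LowSurrogate($1D11E), 4));
end.
```

If the size reported by Utf8ToUnicode is a count of 16-bit units
rather than of characters, then Size * 2 bytes would still be enough;
that is the part I am unsure about.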

3) Shouldn't we allocate Size * 2 + 2? I mean, we did not allocate
space for the null terminator.

4) Here is the function in action:

procedure TGDICanvas.DoTextOut(const APosition: TPoint; const AText: String);
var
  UnicodeEnabledOS: Boolean;
  WideText: PWideChar;
  AnsiText: string;
  Size: Integer;
begin
  UnicodeEnabledOS := True;

  NeedFont(True);

  if UnicodeEnabledOS then
  begin
    Size := Utf8ToUnicode(nil, PChar(AText), 0);
    WideText := GetMem(Size * 2);
    Utf8ToUnicode(WideText, PChar(AText), Size);
    dynWindows.TextOutW(Handle, APosition.x, APosition.y, WideText, Size - 1);
    FreeMem(WideText);
  end
  else
  begin
    AnsiText := Utf8ToAnsi(AText);
    Windows.TextOut(Handle, APosition.x, APosition.y, PChar(AnsiText),
      Length(AnsiText));
  end;
end;

Notice the Size - 1 in the TextOutW call. If I use Size instead of
Size - 1, a wrong character is displayed at the end of my string,
even if I clear the buffer by filling it with zeros before passing it
to the conversion routine. Why is that? Does Size already count the
null terminator?
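
A small check along these lines might settle it. This is only a
sketch: ReportedSize is a hypothetical helper of mine, and it assumes
that calling Utf8ToUnicode with a nil destination just returns the
required size, as the function above already relies on.

```pascal
program CountCheck;
{$mode objfpc}{$H+}

// Hypothetical helper: the size Utf8ToUnicode reports for S when it is
// given no destination buffer (same call as in DoTextOut above).
function ReportedSize(const S: AnsiString): SizeUInt;
begin
  Result := Utf8ToUnicode(nil, PChar(S), 0);
end;

begin
  // Compare the plain length with the reported size. If the reported
  // size is Length + 1, the null terminator is already being counted.
  WriteLn('Length = ', Length('abc'),
          ', reported size = ', ReportedSize('abc'));
end.
```

If the reported size turns out to be one more than the length, that
would explain the Size - 1 and would also mean that Size * 2 bytes
already leaves room for the terminator.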

Hmm, I think Utf8ToUnicode needs to be documented. Even some
comments in the code would help; currently there are none at
all.

5) When I try to convert strings that contain line endings, it doesn't
seem to work. What happens to line-ending characters in UTF-16? I
mean, if we are on Linux, a UTF-16 line-ending marker cannot be just
1 byte, can it?

If I convert a string with a line ending and pass it to ExtTextOutW,
a wrong character appears in place of the line-ending marker.
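
To look at what the line ending becomes after conversion, a sketch
like the following could be used. Utf16UnitAt is a hypothetical helper
of mine; it uses UTF8Decode from the RTL instead of calling
Utf8ToUnicode directly, so it may not exercise exactly the same path
as my code above.

```pascal
program LineEndingCheck;
{$mode objfpc}{$H+}

// Hypothetical helper: value of the 16-bit unit at a 1-based index
// after converting UTF-8 text to UTF-16.
function Utf16UnitAt(const Utf8Text: AnsiString; Index: Integer): Word;
var
  W: WideString;
begin
  W := UTF8Decode(Utf8Text);
  Result := Ord(W[Index]);
end;

begin
  // Size in bytes of one UTF-16 unit.
  WriteLn(SizeOf(WideChar));
  // Value of the unit that the Linux line ending (#10) turns into.
  WriteLn(Utf16UnitAt('a'#10'b', 2));
end.
```

I would expect the line feed to become a single 2-byte unit with the
value 10. If so, the conversion itself is fine, and the wrong
character might simply be how ExtTextOutW draws a control character;
that is a guess on my part.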

thanks,
-- 
Felipe Monteiro de Carvalho