[fpc-pascal] Parse unicode scalar

Mattias Gaertner nc-gaertnma at netcologne.de
Mon Jul 3 06:36:35 CEST 2023


On Mon, 3 Jul 2023 09:34:10 +0700
Hairy Pixels via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:

>[...]
> Ok, today I just tried to ask ChatGPT and got an answer. I must have
> asked the wrong thing yesterday but it got it right today (with one
> syntax error using an inline "var" in the code section for some
> reason).
> 
> How does this look?
> 
> procedure SplitUTF8Bytes(unicodeScalar: Integer; var bytes: array of
> Byte);

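The open array ("array of") is useless here: a UTF-8 sequence is never
longer than 4 bytes, so a fixed-size array would do.
And it does not return the byte count, so the caller cannot tell how
many of the bytes are valid.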

> var
>   i: Integer;
>   byteCount: Integer;
> begin
>   // Number of bytes required to represent the Unicode scalar
>   if unicodeScalar < $80 then
>     byteCount := 1
>   else if unicodeScalar < $800 then
>     byteCount := 2
>   else if unicodeScalar < $10000 then
>     byteCount := 3
>   else if unicodeScalar < $110000 then
>     byteCount := 4
>   else
>     raise Exception.Create('Invalid Unicode scalar');
> 
>   // Extract the individual bytes using bitwise operations
>   for i := byteCount - 1 downto 0 do
>   begin
>     bytes[i] := $80 or (unicodeScalar and $3F);

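Wrong for byteCount=1: a value below $80 must be stored unchanged, but
this sets the high bit ($80) as if it were a continuation byte.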

>     unicodeScalar := unicodeScalar shr 6;
>   end;
> 
>   // Set the leading bits of each byte
>   case byteCount of
>     2:
>       bytes[0] := $C0 or bytes[0];
>     3:
>       bytes[0] := $E0 or bytes[0];
>     4:
>       bytes[0] := $F0 or bytes[0];
>   end;
> end;

Well, it got the basic idea of UTF-8 multibytes right and it compiles,
so maybe half the points?
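
For reference, here is one way the three objections above could be
addressed. This is only a minimal sketch, not code from this thread: the
names EncodeUTF8, TUTF8Bytes and LeadMark are made up for illustration,
the parameter type is changed from Integer to Cardinal, and the surrogate
range $D800..$DFFF is rejected as well, since surrogates are not Unicode
scalar values.

```pascal
program Utf8EncodeDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical helper type: a UTF-8 sequence is at most 4 bytes long
  TUTF8Bytes = array[0..3] of Byte;

// Encodes one Unicode scalar value as UTF-8.
// Returns the number of bytes written to Bytes (1..4).
function EncodeUTF8(Scalar: Cardinal; out Bytes: TUTF8Bytes): Integer;
const
  // Lead-byte marker for 2-, 3- and 4-byte sequences
  LeadMark: array[2..4] of Byte = ($C0, $E0, $F0);
var
  i: Integer;
begin
  // Reject values above U+10FFFF and the surrogate range
  if (Scalar > $10FFFF) or ((Scalar >= $D800) and (Scalar <= $DFFF)) then
    raise Exception.Create('Invalid Unicode scalar');

  // ASCII is stored unchanged, without any marker bits
  if Scalar < $80 then
  begin
    Bytes[0] := Byte(Scalar);
    Exit(1);
  end;

  if Scalar < $800 then
    Result := 2
  else if Scalar < $10000 then
    Result := 3
  else
    Result := 4;

  // Continuation bytes: 10xxxxxx, filled from the last byte backwards
  for i := Result - 1 downto 1 do
  begin
    Bytes[i] := Byte($80 or (Scalar and $3F));
    Scalar := Scalar shr 6;
  end;

  // Lead byte: length marker plus the remaining high bits
  Bytes[0] := Byte(LeadMark[Result] or Scalar);
end;

var
  Buf: TUTF8Bytes;
  n, k: Integer;
begin
  n := EncodeUTF8($20AC, Buf); // U+20AC EURO SIGN
  for k := 0 to n - 1 do
    Write(IntToHex(Buf[k], 2), ' ');
  WriteLn; // prints: E2 82 AC
end.
```

The fixed-size array replaces the open array, the function result carries
the byte count, and the one-byte case is handled before the loop so that
no continuation marker is applied to it.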

Mattias
