[fpc-pascal] Parse unicode scalar
Mattias Gaertner
nc-gaertnma at netcologne.de
Mon Jul 3 06:36:35 CEST 2023
On Mon, 3 Jul 2023 09:34:10 +0700
Hairy Pixels via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
>[...]
> Ok, today I just tried to ask ChatGPT and got an answer. I must have
> asked the wrong thing yesterday but it got it right today (with one
> syntax error using an inline "var" in the code section for some
> reason).
>
> How does this look?
>
> procedure SplitUTF8Bytes(unicodeScalar: Integer; var bytes: array of
> Byte);
The open array ("array of") is useless here.
And it does not return the byte count.
> var
> i: Integer;
> byteCount: Integer;
> begin
> // Number of bytes required to represent the Unicode scalar
> if unicodeScalar < $80 then
> byteCount := 1
> else if unicodeScalar < $800 then
> byteCount := 2
> else if unicodeScalar < $10000 then
> byteCount := 3
> else if unicodeScalar < $110000 then
> byteCount := 4
> else
> raise Exception.Create('Invalid Unicode scalar');
>
> // Extract the individual bytes using bitwise operations
> for i := byteCount - 1 downto 0 do
> begin
> bytes[i] := $80 or (unicodeScalar and $3F);
Wrong for byteCount=1: a single-byte (ASCII) value must be stored as is, without the $80 bit.
> unicodeScalar := unicodeScalar shr 6;
> end;
>
> // Set the leading bits of each byte
> case byteCount of
> 2:
> bytes[0] := $C0 or bytes[0];
> 3:
> bytes[0] := $E0 or bytes[0];
> 4:
> bytes[0] := $F0 or bytes[0];
> end;
> end;
Well, it got the basic idea of UTF-8 multibytes right and it compiles,
so maybe half the points?
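
The remarks above could be addressed along the following lines. This is only a minimal sketch, not code from the thread: the names EncodeUtf8 and TUtf8Bytes are made up for illustration, the fixed 4-byte buffer replaces the open array, the function result carries the byte count, and the rejection of surrogates ($D800..$DFFF) is an extra assumption, since they are not Unicode scalar values.

```pascal
program Utf8EncodeSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // UTF-8 never needs more than 4 bytes per scalar value
  TUtf8Bytes = array[0..3] of Byte;

// Hypothetical helper (not from the original post): encodes one Unicode
// scalar value into Bytes and returns the number of bytes used (1..4).
function EncodeUtf8(Scalar: Cardinal; out Bytes: TUtf8Bytes): Integer;
var
  i: Integer;
begin
  FillChar(Bytes, SizeOf(Bytes), 0);

  if Scalar < $80 then
  begin
    // ASCII is stored unchanged, without the $80 marker bit
    Bytes[0] := Byte(Scalar);
    Exit(1);
  end
  else if Scalar < $800 then
    Result := 2
  else if Scalar < $10000 then
  begin
    // Assumption: surrogates are rejected, they are not scalar values
    if (Scalar >= $D800) and (Scalar <= $DFFF) then
      raise Exception.Create('Surrogate is not a Unicode scalar');
    Result := 3;
  end
  else if Scalar < $110000 then
    Result := 4
  else
    raise Exception.Create('Invalid Unicode scalar');

  // Continuation bytes (10xxxxxx), filled from the last byte backwards
  for i := Result - 1 downto 1 do
  begin
    Bytes[i] := Byte($80 or (Scalar and $3F));
    Scalar := Scalar shr 6;
  end;

  // Lead byte: length marker plus the remaining high bits
  case Result of
    2: Bytes[0] := Byte($C0 or Scalar);
    3: Bytes[0] := Byte($E0 or Scalar);
    4: Bytes[0] := Byte($F0 or Scalar);
  end;
end;

var
  B: TUtf8Bytes;
  N, k: Integer;
begin
  N := EncodeUtf8($20AC, B);  // U+20AC EURO SIGN
  for k := 0 to N - 1 do
    Write(IntToHex(B[k], 2), ' ');
  WriteLn;                    // prints: E2 82 AC
end.
```

Handling the one-byte case before the loop keeps the loop and the case statement limited to real multi-byte sequences.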
Mattias