[fpc-pascal] Parse unicode scalar
Hairy Pixels
genericptr at gmail.com
Mon Jul 3 04:34:10 CEST 2023
> On Jul 3, 2023, at 12:20 AM, Nikolay Nikolov via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
>
> There's no such thing as "unicode scalar" in Unicode terminology:
>
> https://unicode.org/glossary/
I got it from here https://docs.swift.org/swift-book/documentation/the-swift-programming-language/stringsandcharacters/
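For context, the Unicode glossary does define the closely related term "Unicode Scalar Value": any code point except the high- and low-surrogate code points, i.e. 0..$D7FF and $E000..$10FFFF. That is what the Swift page means by "Unicode scalar". Below is a minimal Free Pascal sketch of that check. `IsUnicodeScalarValue` is a hypothetical helper name, not an RTL routine.

```pascal
program ScalarCheck;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

// Hypothetical helper: True when cp is a Unicode scalar value, i.e. a code
// point in 0..$10FFFF that is not a surrogate ($D800..$DFFF).
function IsUnicodeScalarValue(cp: LongInt): Boolean;
begin
  Result := (cp >= 0) and (cp <= $10FFFF) and
            not ((cp >= $D800) and (cp <= $DFFF));
end;

begin
  WriteLn(IsUnicodeScalarValue($41));      // 'A': a scalar value
  WriteLn(IsUnicodeScalarValue($D800));    // high surrogate: not a scalar value
  WriteLn(IsUnicodeScalarValue($1F600));   // emoji outside the BMP: a scalar value
  WriteLn(IsUnicodeScalarValue($110000));  // beyond the code space: not a scalar value
end.
```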
OK, today I just tried asking ChatGPT and got an answer. I must have asked the wrong question yesterday, but it got it right today (apart from one syntax error: for some reason it declared an inline "var" in the code section).
How does this look?
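// Requires SysUtils in the uses clause (for Exception).
// Encodes a Unicode scalar value as UTF-8 into bytes[0..Result-1]
// and returns the number of bytes written (1 to 4).
function SplitUTF8Bytes(unicodeScalar: Integer; var bytes: array of Byte): Integer;
var
  i: Integer;
  byteCount: Integer;
begin
  // Reject negatives, surrogates ($D800..$DFFF) and values above $10FFFF,
  // none of which are Unicode scalar values
  if (unicodeScalar < 0) or (unicodeScalar > $10FFFF) or
     ((unicodeScalar >= $D800) and (unicodeScalar <= $DFFF)) then
    raise Exception.Create('Invalid Unicode scalar value');

  // Number of bytes required to represent the Unicode scalar value
  if unicodeScalar < $80 then
    byteCount := 1
  else if unicodeScalar < $800 then
    byteCount := 2
  else if unicodeScalar < $10000 then
    byteCount := 3
  else
    byteCount := 4;

  if Length(bytes) < byteCount then
    raise Exception.Create('Output buffer too small');

  // ASCII (U+0000..U+007F) is stored unchanged, with no $80 marker bit
  if byteCount = 1 then
  begin
    bytes[0] := unicodeScalar;
    Result := 1;
    Exit;
  end;

  // Fill the continuation bytes (10xxxxxx) from the end, 6 bits at a time
  for i := byteCount - 1 downto 1 do
  begin
    bytes[i] := $80 or (unicodeScalar and $3F);
    unicodeScalar := unicodeScalar shr 6;
  end;

  // The remaining high bits go into the lead byte after its length prefix
  case byteCount of
    2: bytes[0] := $C0 or unicodeScalar;
    3: bytes[0] := $E0 or unicodeScalar;
    4: bytes[0] := $F0 or unicodeScalar;
  end;

  Result := byteCount;
end;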
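As a cross-check of the bit layout (0xxxxxxx; 110xxxxx 10xxxxxx; 1110xxxx 10xxxxxx 10xxxxxx; 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx), here is a minimal sketch that computes each byte directly with shifts and masks instead of a loop, then prints the result in hex. `UTF8Hex` is a hypothetical helper. It assumes its input is already a valid scalar value and does no validation of its own.

```pascal
program UTF8HexDemo;
{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  SysUtils;

// Hypothetical helper: returns the UTF-8 bytes of cp as space-separated hex.
// Assumes cp is already a valid Unicode scalar value (no validation here).
function UTF8Hex(cp: LongInt): string;
var
  s: string;

  // Appends one byte as two uppercase hex digits, space-separated
  procedure Emit(b: LongInt);
  begin
    if s <> '' then
      s := s + ' ';
    s := s + IntToHex(b, 2);
  end;

begin
  s := '';
  if cp < $80 then
    Emit(cp)                               // 0xxxxxxx
  else if cp < $800 then
  begin
    Emit($C0 or (cp shr 6));               // 110xxxxx
    Emit($80 or (cp and $3F));             // 10xxxxxx
  end
  else if cp < $10000 then
  begin
    Emit($E0 or (cp shr 12));              // 1110xxxx
    Emit($80 or ((cp shr 6) and $3F));     // 10xxxxxx
    Emit($80 or (cp and $3F));             // 10xxxxxx
  end
  else
  begin
    Emit($F0 or (cp shr 18));              // 11110xxx
    Emit($80 or ((cp shr 12) and $3F));    // 10xxxxxx
    Emit($80 or ((cp shr 6) and $3F));     // 10xxxxxx
    Emit($80 or (cp and $3F));             // 10xxxxxx
  end;
  Result := s;
end;

begin
  WriteLn(UTF8Hex($41));     // 41
  WriteLn(UTF8Hex($E9));     // C3 A9
  WriteLn(UTF8Hex($20AC));   // E2 82 AC
  WriteLn(UTF8Hex($1F600));  // F0 9F 98 80
end.
```

Writing each length out separately makes the prefix of every byte visible, so a slip such as an ASCII byte picking up the $80 continuation bit is easy to spot. The loop version is more compact but hides that detail.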
Regards,
Ryan Joseph