[fpc-pascal] Parse unicode scalar

Hairy Pixels genericptr at gmail.com
Mon Jul 3 04:34:10 CEST 2023



> On Jul 3, 2023, at 12:20 AM, Nikolay Nikolov via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
> 
> There's no such thing as "unicode scalar" in Unicode terminology:
> 
> https://unicode.org/glossary/

I got it from here: https://docs.swift.org/swift-book/documentation/the-swift-programming-language/stringsandcharacters/

OK, today I tried asking ChatGPT and got an answer. I must have asked the wrong thing yesterday, but it got it right today (with one syntax error, an inline "var" in the code section, for some reason).

How does this look?

procedure SplitUTF8Bytes(unicodeScalar: Integer; var bytes: array of Byte);
var
  i: Integer;
  byteCount: Integer;
begin
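  // Negative values and surrogates ($D800..$DFFF) are not Unicode scalar values
  if (unicodeScalar < 0) or ((unicodeScalar >= $D800) and (unicodeScalar <= $DFFF)) then
    raise Exception.Create('Invalid Unicode scalar');

  // Number of bytes required to represent the Unicode scalar
  if unicodeScalar < $80 then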
    byteCount := 1
  else if unicodeScalar < $800 then
    byteCount := 2
  else if unicodeScalar < $10000 then
    byteCount := 3
  else if unicodeScalar < $110000 then
    byteCount := 4
  else
    raise Exception.Create('Invalid Unicode scalar');

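  // A single-byte (ASCII) scalar is stored unchanged; masking it with $3F
  // and setting $80 would corrupt it (e.g. $41 would become $81)
  if byteCount = 1 then
  begin
    bytes[0] := Byte(unicodeScalar);
    Exit;
  end;

  // Extract the individual bytes, 6 bits at a time, from last to first
  for i := byteCount - 1 downto 0 do
  begin
    bytes[i] := $80 or (unicodeScalar and $3F);
    unicodeScalar := unicodeScalar shr 6;
  end;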

  // Set the leading bits of each byte
  case byteCount of
    2:
      bytes[0] := $C0 or bytes[0];
    3:
      bytes[0] := $E0 or bytes[0];
    4:
      bytes[0] := $F0 or bytes[0];
  end;
end;
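
For checking the result against known encodings, here is a minimal self-contained sketch, assuming Free Pascal in objfpc mode. The names EncodeUTF8, TUTF8Bytes and HexOf are hypothetical and only for illustration. Unlike the procedure above, it returns the byte count so the caller knows how many bytes are valid.

```pascal
program Utf8Demo;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical fixed-size buffer: UTF-8 needs at most 4 bytes per scalar
  TUTF8Bytes = array[0..3] of Byte;

// Hypothetical helper: encodes one scalar value, returns the number of bytes used
function EncodeUTF8(scalar: LongWord; out bytes: TUTF8Bytes): Integer;
var
  i: Integer;
begin
  // Reject values above $10FFFF and the surrogate range
  if (scalar > $10FFFF) or ((scalar >= $D800) and (scalar <= $DFFF)) then
    raise Exception.Create('Invalid Unicode scalar');

  // ASCII is stored as-is
  if scalar < $80 then
  begin
    bytes[0] := Byte(scalar);
    Exit(1);
  end
  else if scalar < $800 then
    Result := 2
  else if scalar < $10000 then
    Result := 3
  else
    Result := 4;

  // Continuation bytes (10xxxxxx), filled from last to first
  for i := Result - 1 downto 1 do
  begin
    bytes[i] := Byte($80 or (scalar and $3F));
    scalar := scalar shr 6;
  end;

  // Lead byte: length marker plus the remaining high bits
  case Result of
    2: bytes[0] := Byte($C0 or scalar);
    3: bytes[0] := Byte($E0 or scalar);
    4: bytes[0] := Byte($F0 or scalar);
  end;
end;

// Formats the encoded bytes of a scalar as a hex string
function HexOf(scalar: LongWord): string;
var
  buf: TUTF8Bytes;
  n, i: Integer;
begin
  Result := '';
  n := EncodeUTF8(scalar, buf);
  for i := 0 to n - 1 do
    Result := Result + IntToHex(Integer(buf[i]), 2);
end;

begin
  WriteLn(HexOf($41));     // 41        (A)
  WriteLn(HexOf($E9));     // C3A9      (e with acute accent)
  WriteLn(HexOf($20AC));   // E282AC    (euro sign)
  WriteLn(HexOf($1F600));  // F09F9880  (grinning face emoji)
end.
```

The four test values cover each encoded length (1 to 4 bytes), which is where a mistake in the lead-byte handling would show up.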

Regards,
Ryan Joseph


