[fpc-pascal] Parse unicode scalar
    Hairy Pixels
    genericptr at gmail.com
    Mon Jul  3 04:34:10 CEST 2023

> On Jul 3, 2023, at 12:20 AM, Nikolay Nikolov via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
> 
> There's no such thing as "unicode scalar" in Unicode terminology:
> 
> https://unicode.org/glossary/
I got the term from here: https://docs.swift.org/swift-book/documentation/the-swift-programming-language/stringsandcharacters/

OK, today I tried asking ChatGPT again and got an answer. I must have asked the wrong thing yesterday, but it got it right today (with one syntax error: it used an inline "var" in the code section for some reason).

How does this look?

procedure SplitUTF8Bytes(unicodeScalar: Integer; var bytes: array of Byte);
var
  i: Integer;
  byteCount: Integer;
begin
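  // Number of bytes required to represent the Unicode scalar.
  // Negative values and the surrogate range $D800..$DFFF are not scalar values.
  if (unicodeScalar < 0) or ((unicodeScalar >= $D800) and (unicodeScalar <= $DFFF)) then
    raise Exception.Create('Invalid Unicode scalar')
  else if unicodeScalar < $80 then
    byteCount := 1
  else if unicodeScalar < $800 then
    byteCount := 2
  else if unicodeScalar < $10000 then
    byteCount := 3
  else if unicodeScalar < $110000 then
    byteCount := 4
  else
    raise Exception.Create('Invalid Unicode scalar');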
  // Continuation bytes (10xxxxxx): fill from the last byte back to index 1,
  // taking 6 bits of the scalar at a time
  for i := byteCount - 1 downto 1 do
  begin
    bytes[i] := $80 or (unicodeScalar and $3F);
    unicodeScalar := unicodeScalar shr 6;
  end;
  // Lead byte: the remaining high bits plus the length prefix
  case byteCount of
    1:
      bytes[0] := unicodeScalar; // ASCII is stored unchanged (0xxxxxxx)
    2:
      bytes[0] := $C0 or unicodeScalar;
    3:
      bytes[0] := $E0 or unicodeScalar;
    4:
      bytes[0] := $F0 or unicodeScalar;
  end;
end;
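
One thing the procedure above does not do is tell the caller how many bytes it wrote. Below is a rough, self-contained sketch of a variant that returns the count, so the result can be checked against a known encoding. The names EncodeUTF8Scalar and TUTF8Bytes are made up for this example and are not part of the RTL; treat it as an illustration under those assumptions, not a finished implementation.

```pascal
program Utf8EncodeSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical fixed-size buffer: UTF-8 needs at most 4 bytes per scalar.
  TUTF8Bytes = array[0..3] of Byte;

// Hypothetical helper: encodes one Unicode scalar value into Bytes and
// returns the number of bytes written (1..4).
function EncodeUTF8Scalar(Scalar: LongWord; out Bytes: TUTF8Bytes): Integer;
var
  i: Integer;
begin
  // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not scalar values.
  if (Scalar > $10FFFF) or ((Scalar >= $D800) and (Scalar <= $DFFF)) then
    raise Exception.Create('Invalid Unicode scalar value');

  if Scalar < $80 then
    Result := 1
  else if Scalar < $800 then
    Result := 2
  else if Scalar < $10000 then
    Result := 3
  else
    Result := 4;

  if Result = 1 then
  begin
    Bytes[0] := Byte(Scalar);  // ASCII is stored unchanged
    Exit;
  end;

  // Continuation bytes (10xxxxxx), filled from the last byte backwards.
  for i := Result - 1 downto 1 do
  begin
    Bytes[i] := $80 or (Scalar and $3F);
    Scalar := Scalar shr 6;
  end;

  // Lead byte: 110xxxxx, 1110xxxx or 11110xxx plus the remaining high bits.
  case Result of
    2: Bytes[0] := $C0 or Scalar;
    3: Bytes[0] := $E0 or Scalar;
    4: Bytes[0] := $F0 or Scalar;
  end;
end;

var
  B: TUTF8Bytes;
  N, i: Integer;
begin
  N := EncodeUTF8Scalar($20AC, B);  // U+20AC EURO SIGN
  for i := 0 to N - 1 do
    Write(IntToHex(B[i], 2), ' ');
  WriteLn;                          // prints: E2 82 AC
end.
```

Returning the count from a function avoids the caller having to repeat the range checks just to know how much of the buffer is valid.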
Regards,
Ryan Joseph