[fpc-pascal] Parse unicode scalar

Hairy Pixels genericptr at gmail.com
Tue Jul 4 03:03:07 CEST 2023



> On Jul 4, 2023, at 1:15 AM, Mattias Gaertner via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
> 
> function ReadUTF8(p: PChar; ByteCount: PtrInt): PtrInt;
> // returns the number of codepoints
> var
>  CodePointLen: longint;
>  CodePoint: longword;
> begin
>  Result:=0;
>  while (ByteCount>0) do begin
>    inc(Result);
>    CodePoint:=UTF8CodepointToUnicode(p,CodePointLen);
>    // ...do something with the CodePoint...
>    inc(p,CodePointLen);
>    dec(ByteCount,CodePointLen);
>  end;
> end;

Thanks, this looks right. I guess this is how we need to iterate over Unicode now.

Btw, why isn't there a for-loop we can use over Unicode strings? It seems like that should be supported out of the box. I had the same problem in Swift, where it's extremely confusing to merely iterate over a string and look at each character. Replacing characters will be tricky too, so we need some good library functions.
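
For what it's worth, here is a rough sketch of what such a for-loop could look like using Free Pascal's for-in enumerator support. Everything named here (TUtf8Enumerator, TUtf8CodePoints, Utf8CodePoints) is hypothetical and not part of the RTL or LazUtils, and the decoder is deliberately minimal: it assumes the input is UTF-8, substitutes U+FFFD for malformed sequences, and does not reject overlong encodings or surrogates.

```pascal
program Utf8ForIn;

{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

type
  // Hypothetical enumerator (not an RTL type): yields one code point per step.
  TUtf8Enumerator = record
    FData: RawByteString;
    FIndex: SizeInt;     // 1-based index of the next unread byte
    FCurrent: LongWord;  // most recently decoded code point
    function MoveNext: Boolean;
    property Current: LongWord read FCurrent;
  end;

  // Hypothetical wrapper that makes a UTF-8 byte string usable in for-in.
  TUtf8CodePoints = record
    Data: RawByteString;
    function GetEnumerator: TUtf8Enumerator;
  end;

function TUtf8Enumerator.MoveNext: Boolean;
var
  B: Byte;
  Extra: Integer;
begin
  Result := FIndex <= Length(FData);
  if not Result then
    Exit;
  B := Ord(FData[FIndex]);
  Inc(FIndex);
  // The lead byte gives the sequence length and the top payload bits.
  if B < $80 then
    begin FCurrent := B; Extra := 0; end
  else if (B and $E0) = $C0 then
    begin FCurrent := B and $1F; Extra := 1; end
  else if (B and $F0) = $E0 then
    begin FCurrent := B and $0F; Extra := 2; end
  else if (B and $F8) = $F0 then
    begin FCurrent := B and $07; Extra := 3; end
  else
    begin FCurrent := $FFFD; Extra := 0; end; // stray continuation or invalid lead byte
  // Each continuation byte (10xxxxxx) contributes six more bits.
  while Extra > 0 do
  begin
    if FIndex > Length(FData) then
      Break;
    B := Ord(FData[FIndex]);
    if (B and $C0) <> $80 then
      Break;
    FCurrent := (FCurrent shl 6) or (B and $3F);
    Inc(FIndex);
    Dec(Extra);
  end;
  if Extra > 0 then
    FCurrent := $FFFD; // truncated or malformed sequence
end;

function TUtf8CodePoints.GetEnumerator: TUtf8Enumerator;
begin
  Result.FData := Data;
  Result.FIndex := 1;
  Result.FCurrent := 0;
end;

// Hypothetical helper so the loop reads "for CP in Utf8CodePoints(S) do".
function Utf8CodePoints(const S: RawByteString): TUtf8CodePoints;
begin
  Result.Data := S;
end;

var
  CP: LongWord;
begin
  // The bytes below are the UTF-8 encodings of U+0061, U+00E9, U+20AC and U+1F600.
  for CP in Utf8CodePoints('a'#$C3#$A9#$E2#$82#$AC#$F0#$9F#$98#$80) do
    WriteLn(CP);
end.
```

This only covers read-only iteration. Replacing characters in place would still need separate helpers, since the byte length of a code point can change.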

Swift is especially terrible because there's NO ANSI string, so even a 1-byte sequence needs all these confusing-as-hell functions to do any work with strings at all. Terrible experience and slooooow.

Regards,
Ryan Joseph


