[fpc-pascal] Parse unicode scalar
Hairy Pixels
genericptr at gmail.com
Tue Jul 4 03:03:07 CEST 2023
> On Jul 4, 2023, at 1:15 AM, Mattias Gaertner via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
>
> function ReadUTF8(p: PChar; ByteCount: PtrInt): PtrInt;
> // returns the number of codepoints
> var
>   CodePointLen: longint;
>   CodePoint: longword;
> begin
>   Result:=0;
>   while (ByteCount>0) do begin
>     inc(Result);
>     CodePoint:=UTF8CodepointToUnicode(p,CodePointLen);
>     ...do something with the CodePoint...
>     inc(p,CodePointLen);
>     dec(ByteCount,CodePointLen);
>   end;
> end;
Thanks, this looks right. I guess this is how we need to iterate over Unicode strings now.
Btw, why isn't there a for-loop we can use over Unicode strings? It seems like that should be supported out of the box. I had the same problem in Swift, where it's extremely confusing to merely iterate over a string and look at each character. Replacing characters will be tricky too, so we need some good library functions.
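For what it's worth, here is a rough sketch of what such a for-loop could look like if we roll it ourselves with a for-in enumerator. Everything in it is made up for illustration: TUtf8CodePoints and Utf8CodePoints are hypothetical names, not something from the RTL or LazUTF8, and the decoder is deliberately simplified (it substitutes U+FFFD for malformed or truncated sequences but does not reject overlong forms or surrogates).

```pascal
program Utf8ForIn;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

uses
  SysUtils;

type
  // Hypothetical helper: serves as both the enumerable and its enumerator.
  TUtf8CodePoints = record
  private
    FText: RawByteString;
    FIndex: SizeInt;       // 1-based index of the next unread byte
    FCurrent: LongWord;    // last decoded codepoint
  public
    function GetEnumerator: TUtf8CodePoints;
    function MoveNext: Boolean;
    property Current: LongWord read FCurrent;
  end;

function TUtf8CodePoints.GetEnumerator: TUtf8CodePoints;
begin
  Result := Self;
end;

function TUtf8CodePoints.MoveNext: Boolean;
var
  Lead, Trail: Byte;
  Len, i: Integer;
begin
  Result := FIndex <= Length(FText);
  if not Result then
    Exit;
  Lead := Ord(FText[FIndex]);
  // The lead byte gives the sequence length and the top payload bits.
  if Lead < $80 then begin Len := 1; FCurrent := Lead; end
  else if (Lead and $E0) = $C0 then begin Len := 2; FCurrent := Lead and $1F; end
  else if (Lead and $F0) = $E0 then begin Len := 3; FCurrent := Lead and $0F; end
  else if (Lead and $F8) = $F0 then begin Len := 4; FCurrent := Lead and $07; end
  else begin Len := 1; FCurrent := $FFFD; end;  // stray continuation or invalid lead
  if FIndex + Len - 1 > Length(FText) then
  begin
    // Truncated sequence: emit U+FFFD and skip one byte.
    FCurrent := $FFFD;
    Len := 1;
  end
  else
    for i := 1 to Len - 1 do
    begin
      Trail := Ord(FText[FIndex + i]);
      if (Trail and $C0) <> $80 then
      begin
        // Not a continuation byte: emit U+FFFD and skip one byte.
        FCurrent := $FFFD;
        Len := 1;
        Break;
      end;
      FCurrent := (FCurrent shl 6) or (Trail and $3F);
    end;
  Inc(FIndex, Len);
end;

// Hypothetical entry point so the loop reads "for CP in Utf8CodePoints(S)".
function Utf8CodePoints(const S: RawByteString): TUtf8CodePoints;
begin
  Result.FText := S;
  Result.FIndex := 1;
  Result.FCurrent := 0;
end;

var
  S: RawByteString;
  CP: LongWord;
  Count: Integer;
begin
  // Bytes spelled out so the source file encoding doesn't matter.
  S := 'a'#$C3#$A9#$E2#$82#$AC;
  Count := 0;
  for CP in Utf8CodePoints(S) do
  begin
    WriteLn('U+', IntToHex(Int64(CP), 4));
    Inc(Count);
  end;
  WriteLn(Count, ' codepoints in ', Length(S), ' bytes');
end.
```

The test string is 6 bytes and decodes to three codepoints: U+0061, U+00E9 and U+20AC. This only covers reading; replacing a codepoint with one of a different byte length would still mean rebuilding the string.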
Swift is especially terrible because there's NO ANSI string type, so even a 1-byte sequence needs all these confusing-as-hell functions to do any work with strings at all. Terrible experience and slooooow.
Regards,
Ryan Joseph