[fpc-pascal] Parse unicode scalar

Tomas Hajny xhajt03 at hajny.biz
Mon Jul 3 09:22:40 CEST 2023


On 3 July 2023 9:12:03 +0200, Hairy Pixels via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
>> On Jul 3, 2023, at 2:04 PM, Tomas Hajny via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
>> 
>> No - in this case, the "header" is the highest bit of that byte being 0.
>
>Oh it's the header BIT. Admittedly I don't understand how this function returns the highest bit using that case, which I think he was suggesting.
>
>function UTF8CodepointSizeFast(p: PChar): integer;
>begin
> case p^ of
>   #0..#191   : Result := 1;
>   #192..#223 : Result := 2;
>   #224..#239 : Result := 3;
>   #240..#247 : Result := 4;
>   else Result := 1; // An optimization + prevents compiler warning about uninitialized Result.
> end;
>end;

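That's why I wrote "in this case". The "header" itself is not a fixed size either: it is a single 0 bit for a one-byte sequence, and the bit patterns 110, 1110 and 11110 for two-, three- and four-byte sequences. The algorithm above shows how you can derive the length from the first byte alone, because each range in the case statement corresponds to one of those leading bit patterns.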
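
To make the relation between the ranges and the header bits more visible, here is a minimal sketch that tests the leading bits with masks instead of character ranges. The name Utf8LengthFromLeadByte is made up for this illustration and is not part of any FPC unit; the sketch is intended to return the same values as the quoted function for every possible first byte.

```pascal
program Utf8LeadByteDemo;

{$mode objfpc}{$H+}

// Hypothetical helper (not from any FPC unit): derives the sequence
// length from the leading "header" bits of the first byte.
function Utf8LengthFromLeadByte(b: Byte): Integer;
begin
  if (b and $80) = $00 then
    Result := 1   // 0xxxxxxx: the header is the single 0 bit (#0..#127)
  else if (b and $E0) = $C0 then
    Result := 2   // 110xxxxx (#192..#223)
  else if (b and $F0) = $E0 then
    Result := 3   // 1110xxxx (#224..#239)
  else if (b and $F8) = $F0 then
    Result := 4   // 11110xxx (#240..#247)
  else
    Result := 1;  // continuation byte (10xxxxxx) or invalid lead byte:
                  // treated as length 1, like the quoted function does
end;

begin
  WriteLn(Utf8LengthFromLeadByte($41)); // 'A'                      -> 1
  WriteLn(Utf8LengthFromLeadByte($C3)); // first byte of U+00E9     -> 2
  WriteLn(Utf8LengthFromLeadByte($E2)); // first byte of U+20AC     -> 3
  WriteLn(Utf8LengthFromLeadByte($F0)); // first byte of U+1F600    -> 4
end.
```

The case statement on character ranges is probably the better choice in real code, since it needs only one comparison chain on the byte value; the masks are only meant to show which bits make up the header in each case.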

Tomas