[fpc-pascal] json parsing: detecting invalid escape sequences

Benito van der Zander benito at benibela.de
Tue Sep 29 19:27:32 CEST 2020


Hi,

I need to detect invalid escape sequences when parsing JSON and replace 
them with a user-defined fallback. Invalid in the sense that the Unicode 
code point is not defined or is an unpaired surrogate, not in the sense 
of being syntactically invalid.
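
To make the two cases concrete, here is a minimal sketch of the checks 
involved. It assumes that "not defined" means the Unicode noncharacters 
(U+FDD0..U+FDEF and every code point ending in FFFE or FFFF); detecting 
unassigned code points would need the Unicode character database. The 
function names are made up for illustration and are not part of fpjson.

```pascal
program ClassifyEscapes;
{$mode objfpc}{$H+}

// True for the first half of a UTF-16 surrogate pair.
function IsHighSurrogate(u: integer): boolean;
begin
  Result := (u >= $D800) and (u <= $DBFF);
end;

// True for the second half of a UTF-16 surrogate pair.
function IsLowSurrogate(u: integer): boolean;
begin
  Result := (u >= $DC00) and (u <= $DFFF);
end;

// True for Unicode noncharacters (assumed meaning of "not defined"):
// U+FDD0..U+FDEF and every code point whose low 16 bits are FFFE or FFFF.
function IsNoncharacter(cp: integer): boolean;
begin
  Result := ((cp >= $FDD0) and (cp <= $FDEF)) or ((cp and $FFFE) = $FFFE);
end;

begin
  WriteLn('DEAD is a low surrogate: ', IsLowSurrogate($DEAD));
  WriteLn('FFFF is a noncharacter:  ', IsNoncharacter($FFFF));
  WriteLn('0041 is a noncharacter:  ', IsNoncharacter($0041));
end.
```

A \uDEAD that does not directly follow a high surrogate would then count 
as an unpaired surrogate, and so would a high surrogate that is not 
followed by a low one.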

For example, any occurrence of \uFFFF and \uDEAD should be replaced by 
\uffff and \udead respectively, or alternatively by ????, depending on 
the settings.

I think I need to change the JSON scanner to be able to do that.

I could add a callback function

   OnInvalidEscape: function (escapeStart: pchar): string of object;

Or perhaps

   OnInvalidEscape: function (unicodePoint,
     previousUnicodePointSurrogate: integer): string of object;

(although that would be troublesome if \uDEAD and \udead are supposed to 
be replaced with a different fallback)

Or

   OnInvalidEscape: function (const escapedString: string[4]): string
     of object;

The function would return the unescaped value. Alternatively, the 
current string could be passed to it as a var parameter, and the 
function would append its unescaped value directly.
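
As a rough sketch of how the string-based variant might be used: the 
type TInvalidEscapeEvent, the class TFallbackSettings and its 
KeepLiteral flag are all hypothetical names, and nothing here exists in 
fpjson. The parameter is declared as a plain string rather than 
string[4] to keep the example short. It assumes the fallback is either 
the escape text exactly as written or a fixed ???? marker.

```pascal
program InvalidEscapeCallback;
{$mode objfpc}{$H+}

type
  // Hypothetical callback type: receives the four hex digits as they
  // appear in the input, so \uDEAD and \udead can be told apart.
  TInvalidEscapeEvent = function (const escapedString: string): string of object;

  // Hypothetical holder for the user's fallback settings.
  TFallbackSettings = class
    KeepLiteral: boolean;
    function HandleInvalid(const escapedString: string): string;
  end;

function TFallbackSettings.HandleInvalid(const escapedString: string): string;
begin
  if KeepLiteral then
    Result := '\u' + escapedString  // keep the escape text as written
  else
    Result := '????';               // fixed replacement marker
end;

var
  settings: TFallbackSettings;
  onInvalidEscape: TInvalidEscapeEvent;
begin
  settings := TFallbackSettings.Create;
  try
    onInvalidEscape := @settings.HandleInvalid;

    settings.KeepLiteral := true;
    WriteLn(onInvalidEscape('dead'));

    settings.KeepLiteral := false;
    WriteLn(onInvalidEscape('DEAD'));
  finally
    settings.Free;
  end;
end.
```

Passing the original digits rather than the decoded integer is what lets 
the handler preserve the case of the input, which the integer-based 
signature cannot do.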

Or all unescaping could be moved to a callback function, which could be 
called OnUnescape or OnDecodeEscape, so that the scanner does not need 
to decide which escapes are invalid. Then

    if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
      S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
    else
      S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?

could be replaced by one function call. And if the user does not set a 
callback function, the scanner would set its own callback function 
depending on the option.
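
A minimal sketch of that idea, not a patch against the real scanner: 
TSketchScanner, TUnescapeEvent and the useUTF8 flag are hypothetical 
stand-ins for the scanner class and the joUTF8 option, and the 
convention that u2 = 0 means "no second code unit" is an assumption of 
this example.

```pascal
program DefaultUnescape;
{$mode objfpc}{$H+}

type
  // Hypothetical signature: u1 is the first UTF-16 code unit,
  // u2 a following low surrogate, or 0 if there is none.
  TUnescapeEvent = function (u1, u2: word): string of object;

  // Hypothetical stand-in for the JSON scanner.
  TSketchScanner = class
  private
    FOnUnescape: TUnescapeEvent;
    FUseUTF8: boolean;
    function UnescapeUTF8(u1, u2: word): string;
    function UnescapeSystem(u1, u2: word): string;
  public
    constructor Create(useUTF8: boolean);
    function Unescape(u1, u2: word): string;
    property OnUnescape: TUnescapeEvent read FOnUnescape write FOnUnescape;
  end;

// Builds a UTF-16 string from one or two code units.
function ToUnicode(u1, u2: word): UnicodeString;
begin
  Result := WideChar(u1);
  if u2 <> 0 then
    Result := Result + WideChar(u2);
end;

constructor TSketchScanner.Create(useUTF8: boolean);
begin
  inherited Create;
  FUseUTF8 := useUTF8;
end;

function TSketchScanner.UnescapeUTF8(u1, u2: word): string;
begin
  Result := Utf8Encode(ToUnicode(u1, u2));
end;

function TSketchScanner.UnescapeSystem(u1, u2: word): string;
begin
  Result := String(ToUnicode(u1, u2)); // converts to the system code page
end;

function TSketchScanner.Unescape(u1, u2: word): string;
begin
  // If the user did not set a callback, install a built-in one
  // chosen by the option, then make a single call.
  if not Assigned(FOnUnescape) then
    if FUseUTF8 or (DefaultSystemCodePage = CP_UTF8) then
      FOnUnescape := @UnescapeUTF8
    else
      FOnUnescape := @UnescapeSystem;
  Result := FOnUnescape(u1, u2);
end;

var
  scanner: TSketchScanner;
  s: string;
begin
  scanner := TSketchScanner.Create(true);
  try
    s := scanner.Unescape($00E9, 0);  // U+00E9, no second code unit
    WriteLn('bytes: ', Length(s));
  finally
    scanner.Free;
  end;
end.
```

A user-supplied OnUnescape would then see every escape, valid or not, 
and could apply its own fallback before returning the decoded string.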

Any interest in a patch that adds such a callback function? Or is there 
another way to do this?

Best,
Benito