Skip to content

better json error message #14672

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

juan-morales
Copy link
Contributor

I will propose this RFC today, but I did a PoC change just to have an idea about how to do it.

In short, the idea is to have better error messages, instead of having an error message saying "Syntax Error", to have something like "Syntax Error - error at character X near content " foo: bar }" .... 😄

This is still Work in progress.

Eventhough is not finished, and I have to write the email to the email list, I want to say ...... Thanks in advance @derickr because you gave me the idea, ... exactly on this moment of history:

What's New in PHP 8.3 - Derick Rethans

I also tag you @bukka so you know since now, that I am doing this, as all this part of php was mainly ... you 😄

still Work In Progress, and will be done soon. Hope this change makes sense, and also makes PHP greater.

@bukka
Copy link
Member

bukka commented Jun 28, 2024

I have been already looking into this in past and implemented some sort of solution in parser and scanner in jso ages ago. It has got simillar parser and scanner so the idea was to first to implement it there and then to PHP. I recently revisited it but the future plan is to actually replace the current scanner and parser with simd based parser so I postponed integration of that to prevent any potential break with in the lines and basically implementing the same thing twice. So the state is that I don't want to spend too much time on the current parser improvements as it might go in the future.

If you really wanted to progress on this I would recommend not to change the current messages which is not that helpful and might break some code potentially but maybe introduce something like json_last_error_info that would return array containing message, code and token start and end position instead (see JSO code how it's done in the parser and scanner for that).

@juan-morales
Copy link
Contributor Author

@bukka thanks for the feedback.

I will move on with the "json_last_error_info" option. Its ok if in the future this and something else gets replaced with a new parser technology, but because we dont know exactly when that is going to happen, lets make this real now and available for our users, including Dereck, which requested this publicly 😄 (thanks for everything Dereck)

So, I will re-do all this towards this new strategy, and when done , I will ping you @bukka .

Also please, if you remember, let me know when working with the new parser in the future, I am a newby, but once in a while I get some edge-corner ideas that might help you.

@juan-morales juan-morales requested a review from kocsismate as a code owner June 28, 2024 23:08
@juan-morales
Copy link
Contributor Author

here appears that I requested a code review form you @kocsismate but I did not do it. FYI 😄

@TimWolla
Copy link
Member

not to change the current messages which is not that helpful and might break some code

I don't think that error messages fall under backwards compatibility guarantees anywhere. They are meant for human consumption, not for the computer. I'm strongly in favor of improving existing error messages instead of adding new functionality to obtain the necessary information, because then everyone would automatically benefit from it.

@juan-morales
Copy link
Contributor Author

@TimWolla @bukka in this PR i have enriched the actual error message and also created a new function to retrieve more info.

This is work in progress.

I do not know how to proceed with this. RFC or not ? you people have more experience en this. please advice.

Thanks in advance.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on the implementation much, but enhancing error messages should not require a new function. We change error messages all the time to make them better.

ext/json/json.c Outdated
Comment on lines 39 to 44
json_error_info_type json_error_info_data = {
NULL,
0,
0,
0
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could really use designated initializer, or at the bare minimum comments to document what each field is.

Also is there a reason this is a global?

@juan-morales juan-morales changed the title WIP - RFC - json error message enhanced json error message enhanced Jul 13, 2024
@juan-morales
Copy link
Contributor Author

@Girgias @TimWolla @bukka - change done. I only enhanced the current error message.

This is my bets try with the implementation, feedback is more than welcome because I am not a good programmer.

Thanks in advance and I hope we can get this merge soon.

@juan-morales juan-morales requested review from Girgias and mfn July 13, 2024 15:19
@juan-morales juan-morales changed the title json error message enhanced better json error message Jul 14, 2024
Comment on lines +56 to +61
snprintf(
token,
sizeof(token),
"%s",
parser->scanner.token
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to print something in the function called to update data? This might very well be justified, I just don't understand it for now, and so I wanted to ask.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure about how I feel regarding including the JSON in the error message instead of just including the position. The JSON is user controlled and can (a) contain characters causing issues (such as the NUL byte, which is not correctly handled with this patch) and can (b) contain PII data which would end up in error logs.

Testing Invalid UTF-8
bool(false)
int(5)
string(56) "Malformed UTF-8 characters, possibly incorrectly encoded"
string(92) "Malformed UTF-8 characters, possibly incorrectly encoded, at character 0 near content: %s"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check for the exact error message (i.e. the %s should not be there).

@juan-morales
Copy link
Contributor Author

I'm not quite sure about how I feel regarding including the JSON in the error message instead of just including the position. The JSON is user controlled and can (a) contain characters causing issues (such as the NUL byte, which is not correctly handled with this patch) and can (b) contain PII data which would end up in error logs.

Regarding this, I dont knwo how to do an experiment with the situations you describe, if you could help me out on that and we could "see" what happens, would be nice. Even though I dont see (maybe because of my lack of experience) the risk, I mean, I think will be handle ok by the code, but I dont know how to do this experiment.

@bukka
Copy link
Member

bukka commented Jul 22, 2024

I would prefer not to add content to the messages. Also the starting position might not be always the most useful place even though it mostly is. Previously I took token start and end position which might be sometimes more useful. It's kind of why I suggested separate function where you could put all that info including part of the impacted text. It might be worth to research json linters and other parsers to see what's being done there.

I should have more time next week so will try to take a proper look to the code then.

@juan-morales
Copy link
Contributor Author

Honestly I dont fully understand what is the problem with showing content from the json. If there is a possibility on an attack exploiting such a thing, we could think of a preventive measure, but I do think is useful to show the content in the error, definetly useful. Remember I am not adding full content, only a fixed part of it.

I know you people have an agenda, and are busy, ... me too honestly 😄 but can we work on this please ?

I know is not a HUGE contribution to the project, but is definetly useful, even @derickr said it in a conference.

I am here, ready to improve this, but I definetly need your guidance.

I laso read your comment @bukka and I do share your thoughts about improving the complete JSON parser, but not now, I would like to have an approach of enriching the exiting message and nothing else than that. For sure that re-thinking the JSON parser would be great, and would open new possibilities, but that is way bigger than just enriching an actual error message.

I look fordward for your replies.

Thanks in advance.

Copy link
Member

@bukka bukka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the scanner only approach is the right solution here as it cannot cover all issues. What I implemented in JSO (which is sort of what I planned for json ext before considering new parser) is to just track locations which you can see here: https://github.com/bukka/jso/blob/46a9b457596380bf087e0e8fadcf99a5e582930f/src/jso_scanner.re (specifically see the JSO_SCANNER_LOC usage). This is then used by parser in https://github.com/bukka/jso/blob/46a9b457596380bf087e0e8fadcf99a5e582930f/src/jso_parser.y . Bison has actually specifically for that %locations which passes YYLTYPE to called functions so the location can be passed around and it covers more situations. This is also line based which is useful for parsing formatted JSON.

In terms of the content in the error message I just find it a bit messy due to formatting and potential leakage of sensitive data from the decoded message. Note that this message can end up eventually in the logs (e.g. if JsonException message is logged for example).

So to sum it up, the implementation needs to completely change and we need to agree on the right message and whether it makes sense to introduce separate function providing more info. I'm fine if we intially change the error message but it needs to work correctly and tests will need to be also extended to cover more possible errors.

@@ -16,5 +16,5 @@ NULL
bool(true)
NULL
bool(true)
string(36) "The decoded property name is invalid"
string(71) "The decoded property name is invalid, at character 23 near content: 1}]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I was talking about when I mentioned that the might not be exactly right. This is not where ther error is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are likely many more examples where this would fail.

Copy link
Contributor

@mfn mfn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see review was requested from me.

I initially left a comment because the code didn't look safe to me.

Re-reading the topic and feedack here, I agree that I don't think it's a good idea to in fact expose content of the JSON this way (i.e. by default).

It's not (only) about exploiting to gain unauthorized access but the concern of data originating from outside ending up in places it wasn't before raises a concern (PII was given and I agree). In todays world LOTS of untrusted data flows in via JSON and errors are logged and I rather have the actual error message not contain this data and be responsible to retrieve it via other means.

So basically I disagree with the premise of the implementation exposing it on the default error message.

Whether it would/could need a dedicated method to retrieve the details: I've feelings regarding this, but it's an explicit approach so (to me) better then changing the default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants