Add support for chunked encoding and larger bodies #9
Conversation
Other checks in the code rely on iContentLength possibly being unset, but that condition currently never occurs. Hence, make this assignment explicit in resetState(), so that in transfers without a Content-Length field, iContentLength remains -1.
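For illustration, a minimal sketch of the idea (the field name mirrors the library's; the sentinel constant and surrounding class skeleton are made up):

```cpp
// Minimal sketch: reset iContentLength explicitly so that a missing
// Content-Length header is detectable as -1 later on.
class HttpClient
{
public:
    static const int kNoContentLengthHeader = -1;  // hypothetical name

    void resetState()
    {
        // Without this explicit assignment, iContentLength keeps the
        // value from the previous transfer, and the "no Content-Length
        // seen" checks elsewhere can never trigger.
        iContentLength = kNoContentLengthHeader;
    }

private:
    long iContentLength;
};
```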
Support for chunked transfer encoding is mandatory for HTTP/1.1 recipients, and many web servers use it. For the protocol details, see https://en.wikipedia.org/wiki/Chunked_transfer_encoding and RFC 7230, section 4.1: https://tools.ietf.org/html/rfc7230#section-4.1

Support for this transfer scheme is now built into HttpClient::responseBody(), and the state machine handles Content-Length transmissions as well. That has the benefit that timing delays are handled fine in both cases.

Note that this replaces #7. Fixes #3
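For reference, a chunked response looks roughly like this on the wire: each chunk is prefixed by its size in hex on its own CRLF-terminated line, and a zero-length chunk ends the body. The decoded body below is "Arduino rocks":

```
HTTP/1.1 200 OK
Transfer-Encoding: chunked

7\r\n
Arduino\r\n
6\r\n
 rocks\r\n
0\r\n
\r\n
```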
There seems to be a limit on how many bytes Client::write() may take at once, and data is not sent if it is exceeded. I'm not entirely sure where that limit comes from, but I suspect it has to do with the MTU. A value of 1024 works fine in my test. If that turns out not to cover all cases, the threshold can be made configurable later on.
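To illustrate the workaround, a sketch of the splitting logic (the helper name and the 1024 threshold are illustrative, not part of the library):

```cpp
// Sketch: write the body in slices no larger than kMaxWriteChunk,
// since Client::write() appears to silently drop data beyond some
// limit (possibly MTU-related).
static const size_t kMaxWriteChunk = 1024;

size_t writeBody(Client& client, const uint8_t* data, size_t len)
{
    size_t written = 0;
    while (written < len)
    {
        size_t slice = len - written;
        if (slice > kMaxWriteChunk)
        {
            slice = kMaxWriteChunk;
        }
        size_t n = client.write(data + written, slice);
        if (n == 0)
        {
            break;  // connection problem; give up rather than spin
        }
        written += n;
    }
    return written;
}
```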
Is this fork actively maintained? As most modern web server implementations send content in chunks, I would very much like to see these patches merged. Which of the many forks is considered the canonical upstream?
Hmm. I played around with that idea and I'm not sure whether I agree. The chunk reader must know about higher-level body vs. header details etc., and calling up into those layers from the low-level stream code seems backwards to me.
Fair point. @cmaglie @facchinm @damellis @tigoe what do you think about this? The two approaches we are discussing for supporting chunked HTTP responses are as follows:

1. Handle the chunked encoding transparently inside read() and available(), so the user only ever sees the decoded body.
2. Keep read() raw and expose the chunks through a separate, lower-level API.
I'd prefer that chunks were handled internally, without the user needing to be involved. But if we wanted to compromise, maybe there could be an optional parameter for read() to get chunks? Or maybe a function for reading in chunks?
I'm confused. That would mean that even for non-chunked encodings, read() carries an extra parameter. Most of my uncertainty here comes from an unfinished attempt to implement the logic in read() itself.
Conceptually what I'm imagining is that you can read() and see available() for the content without chunk marks, as I think that will be the most common use case. But I can see a need for certain advanced users to see the chunks. I'm agnostic as to how that's done.
And for the headers, presumably. And FWIW, chunk decoding only applies to the body, not the headers.
IMO, providing both the high-level and the low-level interface through the same method, or an overloaded version thereof, is a hack. I really think users can't have it both ways: either all the complexity is abstracted away through responseBody(), or they deal with the raw stream themselves. But that's just my opinion. We can as well agree to disagree on this :)
No need to agree to disagree. I'm fine with putting chunk management in different functions than read() and available(). What would you propose for an API that allows the user to handle the response chunk by chunk?
:)
If that should be needed, a dedicated method that hands out one chunk per call could expose them.
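As a sketch only, assuming a hypothetical readChunk() method (no such method exists in the library today):

```cpp
// Purely hypothetical API, for discussion; nothing below exists yet.
uint8_t buf[64];
int chunkLen;

// Each call would parse one chunk header and copy up to sizeof(buf)
// bytes of that chunk's payload into buf, returning the byte count.
// A return value of 0 would mean the terminating zero-length chunk
// was seen, i.e. the body is complete.
while ((chunkLen = http.readChunk(buf, sizeof(buf))) > 0)
{
    handleChunk(buf, chunkLen);   // user-supplied, hypothetical
}
```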
Sounds good; presumably then you'd need to call for the chunks repeatedly, right? Can you make up a snippet that shows how it might be used in a typical use case? I confess I've never had need to get the individual chunks in any HTTP app, so I'm having trouble imagining a good use case. Though I have a good example of node.js reading chunks which we could borrow ideas from if that helps.
That's not what I needed it for, but you could take streaming as one example. I actually never thought about providing an interface for that use case, but yes, we definitely should. Maybe it even makes sense to add such an interface right away.
I'm just catching up on the activity :) For chunk-by-chunk reads, isn't that close to what the Stream API would do? Another idea is a dedicated API for iterating over the chunks one at a time.
But again, not all transfers are chunked; there is the good old Content-Length based transfer as well. Also, currently, read() exposes the entire raw response, headers included.
Same thing here. Yes, if you started fresh, that would be an option. I just don't see how we can get there without breaking existing users.
As things currently stand, transparent gzip support (which again only applies to the body, not the headers) needs to live in the higher-level APIs. But there, it is totally possible.
Seems reasonable to me. Would you still use available() to know when the body's exhausted, or use another method?
Just to throw in a random opinion, I'd like to see read() transparently decode the chunks. What's the use case for wanting to read the raw stream (before the chunks are decoded)?
But there are existing users. I'm not the maintainer of this library, but if I were, I would strongly object against breaking backwards compatibility. Also, again, chunk decoding only applies to the body, not the headers.
You could use the library to just build the request, and then parse the response yourself.
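A rough sketch of that fallback, assuming the current get(), connected(), available() and read() calls behave as they do today (the parser is left to the user):

```cpp
// Sketch: let the library send the request, then consume the raw,
// still chunk-encoded response in the sketch itself.
http.get(server, "/path");           // request built by the library

while (http.connected() || http.available())
{
    if (http.available())
    {
        int c = http.read();         // raw byte, headers and chunk
                                     // markers included
        myOwnParser(c);              // user-supplied, hypothetical
    }
}
```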
A similar statement could be said about the changes proposed in this PR. I agree with David's view that "the current behavior is a bug". Having a different set of APIs for non-chunked and chunked responses would push more responsibility onto user sketches.

The user that makes the HTTP request should not need to care about how the body is encoded (normal, chunked, gzipped); in the end they want access to the unencoded body. In an ideal world, if the server changed the response type from non-chunked to chunked, an existing sketch should continue to work transparently.

If we do decide against changing read(), we could at least add a way for a sketch to detect a chunked response and opt in to decoding it.
Okay, that's true. Strictly speaking, that's also a break.
Then what would the API look like for reading a specific header?
No, I don't think that's a good idea. Whether or not a web server uses chunks to transmit the body should really not matter to the user. A web server may even decide to switch the behavior from one request to the other, and that's perfectly fine according to the RFC. So, if you are willing to break the effect of existing library calls (I won't object), the question is really how users would access the raw bits and header information if it's no longer possible to do so through read() and available().
This is OK with me; we can update the library's major version to indicate a breaking change, as well as update the change log. FWIW, the library is also marked as "experimental" in library.properties.

I'm thinking everything will act the same way as it does now, except that if the library detects the response body is chunked and all the headers have been read in, it will enable some extra logic in read() and available().

The above idea does introduce extra state logic in the library, which increases complexity. However, it leads to a better Arduino UX.
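As a sketch of what that extra state could be (all names invented for illustration):

```cpp
// One possible shape for the extra parser state:
enum ChunkState
{
    eChunkHeader,   // about to parse a hex chunk-size line
    eChunkBody,     // handing decoded payload bytes out of read()
    eChunkDone      // zero-length terminating chunk was consumed
};

ChunkState iChunkState = eChunkHeader;
long iChunkBytesLeft = 0;   // payload bytes left in the current chunk
```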
What would that extra logic look like? To the end user, or in the library itself?
It would be in the library itself. For example, if the first chunk size line is 10 (hex for 16 bytes), read() would consume that line internally, hand out the next 16 payload bytes one at a time while counting them down, silently skip the CRLF that terminates the chunk, and then start parsing the next chunk-size line. A size of 0 would mark the end of the body.
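Putting that example into code, a simplified chunk-aware read() might look like this (building on the state sketched above; readChunkSizeLine(), rawRead() and skipCRLF() are hypothetical helpers, and error handling is omitted):

```cpp
// Simplified sketch of what read() could do once the body is known
// to be chunked. readChunkSizeLine() parses a hex size line such as
// "10\r\n" into 16; rawRead() fetches one byte from the underlying
// Client; skipCRLF() eats the CRLF that terminates a chunk.
int HttpClient::read()
{
    if (iChunkState == eChunkHeader)
    {
        iChunkBytesLeft = readChunkSizeLine();
        iChunkState = (iChunkBytesLeft == 0) ? eChunkDone : eChunkBody;
    }

    if (iChunkState == eChunkDone)
    {
        return -1;   // terminating zero-length chunk seen; body done
    }

    int c = rawRead();
    if (c >= 0 && --iChunkBytesLeft == 0)
    {
        skipCRLF();                   // consume the trailing CRLF
        iChunkState = eChunkHeader;   // next chunk header follows
    }
    return c;
}
```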
Sounds like a good approach to me.
This adds support for chunked transfer encodings and larger body writes. It also makes HttpClient::responseBody() more robust, similar to what #7 does.

Replaces / conflicts with #7
Fixes #3