Add support for chunked encoding and larger bodies #9

zonque · 2016-08-22T09:22:13Z

This adds support for chunked transfer encodings and larger body writes. It also makes HttpClient::responseBody() more robust, similar to what #7 does.

Replaces / conflicts with #7
Fixes #3

Other checks in the code rely on the possibility that there is no iContentLength set, but that condition is currently never given. Hence, in resetState(), do this assignment explicitly, so that in transfers without a Content-Length field, iContentLength remains -1.

Support for chunked transfer encoding is mandated by many web servers. For the protocol details, see https://en.wikipedia.org/wiki/Chunked_transfer_encoding and RFC7203, section 4.1: https://tools.ietf.org/html/rfc7230#section-4.1 Support for this transfer scheme is now built in to HttpClient::responseBody(), and the state machine handles Content-Length transmissions as well. That has the benefit that timing delays are handled fine for both cases. Note that this replaces #7. Fixes #3

There seems to be a limit in how many bytes Client::write() may take at once, and data is not sent if it is exceeded. I'm not entriely sure where that limit comes from, but I have the suspicion is has to do with the MTU. A value of 1024 works fine in my test. If that turns out to not cover all cases, the threshold can be made configurable later on.

zonque · 2016-10-02T17:55:34Z

Is this fork actively maintained? As most modern web server implementations send content in chunks, I would much like to see these patches merged. Which of the many forks is considered the canonical upstream?

sandeepmistry · 2016-10-14T19:32:10Z

Hi @zonque,

Thanks for submitting this pull request!

I think it would be better to integrate the chunked encoding support in HttpClient::read() and HttpClient::available(). HttpClient::responseBody() is a higher level connivence API. Then the behaviour in #7 can be left alone.

What do you think?

zonque · 2016-10-22T12:34:53Z

I think it would be better to integrate the chunked encoding support in HttpClient::read() and HttpClient::available(). HttpClient::responseBody() is a higher level connivence API. Then the behaviour in #7 can be left alone.

Hmm. I played around with that idea and I'm not sure whether I agree. The chunk reader must know about higher-lever body vs. header details etc, and calling up into those layers from read() and available() seems like a violation. Also, from a user's perspective, I'd rather expect HttpClient::read() to return the actual bytes on the wire, including the chunk marks, and only the higher-level API calls to abstract them away and return the pure payload.

sandeepmistry · 2016-10-24T14:57:32Z

Fair point.

@cmaglie @facchinm @damellis @tigoe what to you think about this?

The two approaches we are discussing for support chunk HTTP responses are as follows:

The library handles the chunked response internally in HttpClient::read() and HttpClient::available(). So, the users of the library don't have to deal with any chunk handling in their code.
HttpClient::read() and HttpClient::available() return whatever sent on the wire, including the chunk marks. HttpClient::responseBody() is the only API that handles chunked responses.

tigoe · 2016-10-24T15:18:54Z

I'd prefer that chunks were handled internally, without the user needing to be involved. But if we wanted to compromise, maybe there could be an optional parameter for read() to get chunks? Or maybe a function for reading in chunks?

zonque · 2016-10-24T16:32:31Z

I'm confused. That would mean even for non-chunked encodings, ::read() needs to understand whether the client is currently reading the header or the body. How would a user use ::read() to access the headers then?

Most of my uncertainty here comes from an unfinished attempt to implement the logic in ::read() and ::available(), and it feels really wrong, at least when it's based on the current implementation. ::skipResponseHeaders() and friends also call into ::read(), while ::read() itself must know whether the headers have already been processed etc.

tigoe · 2016-10-24T16:44:07Z

Conceptually what I'm imagining is that you can read() and see available() for the content without chunk marks, as I think that will be the most common use case. But I can see a need for certain advanced users to see the chunks. I'm agnostic as to how that's done.

zonque · 2016-10-24T16:52:15Z

Conceptually what I'm imagining is that you can read() and see available() for the content without chunk marks, as I think that will be the most common use case. But I can see a need for certain advanced users to see the chunks.

And for the headers, presumably. And FWIW, chunk decoding only applies to the body, not the headers.

I'm agnostic as to how that's done.

IMO, providing both the high-level and the low-level interface through the same method, or an overloaded version thereof, is a hack. I really think users can't have it both ways: Either all the complexity is abstracted away through ::read(), then all they'll get is the processed actual HTTP body data. Or ::read() returns the raw data, and higher-level functions such as ::responseBody() take care of the rest. The latter is in-line with how the API looks right now.

But that's just my opinion. We can as well agree to disagree on this :)

tigoe · 2016-10-25T14:07:47Z

No need to agree to disagree. I'm fine with putting chunk management in different functions than read() and available(). What would you propose for an API that allows the user to handle the response chunk by chunk?

zonque · 2016-10-25T14:10:59Z

No need to agree to disagree.

:)

I'm fine with putting chunk management in different functions than read() and available(). What would you propose for an API that allows the user to handle the response chunk by chunk?

If that should be needed, HttpClient::responseBody() could gain a boolean parameter with a default value. Ie, HttpClient::responseBody(bool oneChunkOnly = false) or s.th.?

tigoe · 2016-10-25T14:14:28Z

Sounds good; presumably then you'd need to call for the chunks repeatedly, right? Can you make up a snippet that shows how it might be used in a typical use case? I confess I've never had need to get the individual chunks in any HTTP app, so I'm having trouble imagining a good use case. Though I have a good example of node.js reading chunks which we could borrow ideas from if that helps.

zonque · 2016-10-25T14:19:39Z

Sounds good; presumably then you'd need to call for the chunks repeatedly, right? Can you make up a snippet that shows how it might be used in a typical use case? I confess I've never had need to get the individual chunks in any HTTP app, so I'm having trouble imagining a good use case. Though I have a good example of node.js reading chunks which we could borrow ideas from if that helps.

That's not what I needed it for, but you could take streaming as one example. I actually never thought about providing an interface for that use case, but yes, we definitely should.

Maybe it even makes sense to add HttpClient::readChunk(), and then make ::responseBody() call into that repeatedly until the body is exhausted. I'll resurrect the hardware the next days and play around with the options.

sandeepmistry · 2016-10-25T14:21:15Z

I'm just catching up on the activity :)

For chunk by chunk reads, isn't that close to what the Stream API would do? available() would return chunk size, read() reads into it?

Another idea is a readRaw() API, for those wanting to read the raw network bytes. I'm not sure how useful this will be. This concept could be used for GZIP response if we ever support them.

zonque · 2016-10-25T14:26:40Z

For chunk by chunk reads, isn't that close to what the Stream API would do? available() would return chunk size, read() reads into it?

But again, not all transfers are chunked, there is the good old 'Content-Size` based as well.

Also, currently, ::read() would return the raw bytes, and this library has active users. Changing the behavior under the hood will likely break existing use-cases.

Another idea is a readRaw() API, for those wanting to read the raw network bytes. I'm not sure how useful this will be.

Same thing here. Yes, if you started fresh, that would be an option. I just don't see how ::read() would possibly remain backwards-compatible and support chunked reads, without turning the whole code-base into a stateful mess.

This concept could be used for GZIP response if we ever support them.

As things currently stand, transparent gzip support (which again only applies to the body, not the headers) needs to live in the higher-level APIs. But there, it is totally possible.

tigoe · 2016-10-25T14:39:20Z

Maybe it even makes sense to add HttpClient::readChunk(), and then make ::responseBody() call into that repeatedly until the body is exhausted. I'll resurrect the hardware the next days and play around with the options.

Seems reasonable to me. Would you still use available() to know when the body's exhausted, or use another method?

zonque · 2016-10-25T14:41:40Z

Seems reasonable to me. Would you still use available() to know when the body's exhausted, or use another method?

::available() would tell you if it's worth calling into ::readChunk() or ::read() again. ::readChunk() would return -1 once the game is over, I presume. But let's see.

damellis · 2016-10-25T18:15:12Z

Just to throw in a random opinion, I'd like to see read() transparently
merge the chunks and return the unified content. Yes, this is different
from the current behavior, but I'd argue that the current behavior is a
bug.

What's the use case for wanting to read the raw stream (before the chunks
are recombined)?

On Tue, Oct 25, 2016 at 7:26 AM, Daniel Mack [email protected]
wrote:

For chunk by chunk reads, isn't that close to what the Stream API would
do? available() would return chunk size, read() reads into it?

But again, not all transfers are chunked, there is the good old
'Content-Size` based as well.

Also, currently, ::read() would return the raw bytes, and this library
has active users. Changing the behavior under the hood will likely break
existing use-cases.

Another idea is a readRaw() API, for those wanting to read the raw network
bytes. I'm not sure how useful this will be.

Same thing here. Yes, if you started fresh, that would be an option. I
just don't see how ::read() would possibly remain backwards-compatible
and support chunked reads, without turning the whole code-base into a
stateful mess.

This concept could be used for GZIP response if we ever support them.

As things currently stand, transparent gzip support (which again only
applies to the body, not the headers) needs to live in the higher-level
APIs. But there, it is totally possible.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#9 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAYuQmbMnxH_P0ao3UtX1xXui8LEsHUVks5q3hGhgaJpZM4JpqgU
.

zonque · 2016-10-25T18:20:13Z

Just to throw in a random opinion, I'd like to see read() transparently merge the chunks and return the unified content. Yes, this is different from the current behavior, but I'd argue that the current behavior is a bug.

But there are existing users. I'm not the maintainer of this library, but if I were, I would strongly object against breaking backwards compatibility.

Also, again, ::read() is used to read the headers as well, which are never encoded in chunks and never gzip'ed. Such techniques only apply to the body, and hence I think it is completely okay to have function that is specific for reading the body.

What's the use case for wanting to read the raw stream (before the chunks are recombined)?

You could use the library to just build the request, and then parse the response yourself.

sandeepmistry · 2016-10-27T17:33:21Z

But there are existing users. I'm not the maintainer of this library, but if I were, I would strongly object against breaking backwards compatibility.

A similar statement could be said about the changes proposed in this PR. HttpClient::responseBody() currently returns a String response that includes the chunk markers.

I agree with David's view "that the current behavior is a bug.".

Having a different set of API's for non-chunked and chunked responses would push more responsibility to user sketches. The user that makes the HTTP request should not need to care about how the body is encoded (normal, chunks, gzipped) - in the end they want access to the unencoded body. In an ideal world, if the server changed the response type from non-chunked to chunked, the an existing sketch should continue to work transparently.

If we do decide against changing read() and available() to transparently handle chunked responses, another option would be to create a ChunkedHttpClient class that specifically deals with only chunked responses.

zonque · 2016-10-27T20:03:58Z

A similar statement could be said about the changes proposed in this PR. HttpClient::responseBody() currently returns a String response that includes the chunk markers.

Okay, that's true. Strictly speaking, that's also a break.

I agree with David's view "that the current behavior is a bug.".

Having a different set of API's for non-chunked and chunked responses would push more responsibility to user sketches. The user that makes the HTTP request should not need to care about how the body is encoded (normal, chunks, gzipped) - in the end they want access to the unencoded body. In an ideal world, if the server changed the response type from non-chunked to chunked, the an existing sketch should continue to work transparently.

Then how would the API look like to read a specific header?

If we do decide against changing read() and available() to transparently handle chunked responses, another option would be to create a ChunkedHttpClient class that specifically deals with only chunked responses.

No, I don't think that's a good idea. Whether or not a web server uses chunks to transmit the body should really not matter to the user. A web server may even decide to switch the behavior from one request to the other, and that's perfectly fine according to the RFC.

So - if you are willing to break the effect of existing library calls (I won't object), the question is really how users would access the raw bits and header information if it's no longer possible to do so through ::read().

sandeepmistry · 2016-10-27T20:16:29Z

So - if you are willing to break the effect of existing library calls (I won't object), the question is really how users would access the raw bits and header information if it's no longer possible to do so through ::read().

This is ok with me, we can update the library major version to indicate a breaking change as well as update the change log. FWIW, the library is also marked as "experimental" in the library.properties.

I'm thinking everything will act the same way as it is now. Except if the library detects the response body is chunked and all the headers have been read in, it will enable some extra logic in available() and read() to transparently de-chunk the request body. Let me know if this is clear, otherwise I would be happy to provide more details for different example scenarios.

The above idea does introduce extra state logic in the library, which increases complexity. However, it leads to a better Arduino UX.

tigoe · 2016-10-30T19:25:14Z

I'm thinking everything will act the same way as it is now. Except if the library detects the response body is chunked and all the headers have been read in, it will enable some extra logic in available() and read() to transparently de-chunk the request body. Let me know if this is clear, otherwise I would be happy to provide more details for different example scenarios.

What would that extra logic look like? To the end user, or in the library itself?

sandeepmistry · 2016-10-31T12:54:15Z

What would that extra logic look like? To the end user, or in the library itself?

It would be in the library itself.

For example, if the first chunk size is 4 and all the headers have been processed available() will return 4 (or less if wifiClient.available() has less bytes buffered). Once all the bytes in the current chunk have been read in, available() will try to read in the next chunk size including the \r\n after the chunk size sent on the wire.

tigoe · 2016-11-02T18:00:57Z

Sounds like a good approach to me.

sandeepmistry · 2016-11-03T13:00:50Z

Great!

@zonque are you ok with the proposal above? If so, can we close this PR and merge #7 for now.

We have a few other projects on the go at the moment, so won't be able to work on this in short term. However, any pull requests for the changes above are welcome.

sandeepmistry · 2017-01-12T14:27:01Z

I'm closing this for now, as @agdl has merged #7.

zonque added 4 commits August 22, 2016 11:03

cosemetic trailing white space cleanups

e4bc11f

zonque changed the title ~~Add support for chunked encoding and larger bodies and~~ Add support for chunked encoding and larger bodies Aug 22, 2016

sandeepmistry closed this Jan 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for chunked encoding and larger bodies #9

Add support for chunked encoding and larger bodies #9

zonque commented Aug 22, 2016

zonque commented Oct 2, 2016

sandeepmistry commented Oct 14, 2016

zonque commented Oct 22, 2016 •

edited

Loading

sandeepmistry commented Oct 24, 2016

tigoe commented Oct 24, 2016

zonque commented Oct 24, 2016

tigoe commented Oct 24, 2016

zonque commented Oct 24, 2016 •

edited

Loading

tigoe commented Oct 25, 2016

zonque commented Oct 25, 2016

tigoe commented Oct 25, 2016

zonque commented Oct 25, 2016 •

edited

Loading

sandeepmistry commented Oct 25, 2016

zonque commented Oct 25, 2016

tigoe commented Oct 25, 2016

zonque commented Oct 25, 2016 •

edited

Loading

damellis commented Oct 25, 2016

zonque commented Oct 25, 2016 •

edited

Loading

sandeepmistry commented Oct 27, 2016

zonque commented Oct 27, 2016

sandeepmistry commented Oct 27, 2016

tigoe commented Oct 30, 2016

sandeepmistry commented Oct 31, 2016

tigoe commented Nov 2, 2016

sandeepmistry commented Nov 3, 2016

sandeepmistry commented Jan 12, 2017

Add support for chunked encoding and larger bodies #9

Add support for chunked encoding and larger bodies #9

Conversation

zonque commented Aug 22, 2016

zonque commented Oct 2, 2016

sandeepmistry commented Oct 14, 2016

zonque commented Oct 22, 2016 • edited Loading

sandeepmistry commented Oct 24, 2016

tigoe commented Oct 24, 2016

zonque commented Oct 24, 2016

tigoe commented Oct 24, 2016

zonque commented Oct 24, 2016 • edited Loading

tigoe commented Oct 25, 2016

zonque commented Oct 25, 2016

tigoe commented Oct 25, 2016

zonque commented Oct 25, 2016 • edited Loading

sandeepmistry commented Oct 25, 2016

zonque commented Oct 25, 2016

tigoe commented Oct 25, 2016

zonque commented Oct 25, 2016 • edited Loading

damellis commented Oct 25, 2016

zonque commented Oct 25, 2016 • edited Loading

sandeepmistry commented Oct 27, 2016

zonque commented Oct 27, 2016

sandeepmistry commented Oct 27, 2016

tigoe commented Oct 30, 2016

sandeepmistry commented Oct 31, 2016

tigoe commented Nov 2, 2016

sandeepmistry commented Nov 3, 2016

sandeepmistry commented Jan 12, 2017

zonque commented Oct 22, 2016 •

edited

Loading

zonque commented Oct 24, 2016 •

edited

Loading

zonque commented Oct 25, 2016 •

edited

Loading

zonque commented Oct 25, 2016 •

edited

Loading

zonque commented Oct 25, 2016 •

edited

Loading