v3.3: describe behaviour when deserializing a multipart with a Content-Type header#5391
v3.3: describe behaviour when deserializing a multipart with a Content-Type header#5391karenetheridge wants to merge 1 commit into
Conversation
bf6e746 to
7fef1ad
Compare
| When deserializing a multipart message, if a `Content-Type` header is present in the specific part, its value shall be used instead of these values. Behavior is undefined when both this header and `contentType` are defined but with different values. | ||
|
|
There was a problem hiding this comment.
@karenetheridge if I am reading this right, the only potential change here is in the case where contentType is absent and therefore assumed to have a default value, but the per-part Content-Type type contradicts that default value?
I think we can make this fit the compatibility guidelines more cleanly by switching the order of the sentences and changing "shall" to "SHOULD" (or "RECOMMENDED" which means the same thing but fit better in the sentence I'm proposing):
Something like this? But less awkward with the last part?
| When deserializing a multipart message, if a `Content-Type` header is present in the specific part, its value shall be used instead of these values. Behavior is undefined when both this header and `contentType` are defined but with different values. | |
| When deserializing a multipart message, behavior is undefined when the `Content-Type` header and the `contentType` field have different values. If the `contentType` field is absent but `Content-Type` is present, it is RECOMMENDED to use the value of `Content-Type`. However, this behavior is still technically undefined due to the possibility that the `schema` is not structured for the received `Content-Type`. |
There was a problem hiding this comment.
The only potential change here is in the case where contentType is absent and therefore assumed to have a default value, but the per-part Content-Type type contradicts that default value?
No, I am not presuming that the default value would be considered here at all -- I think the Content-Type header in the message part should be used in preference to that: a value being underspecified means that we do the best with whatever information we do have, and in this case we have a header telling us what type the message sender intends it to be.
Your edit sounds fine, except I would remove the last sentence: the schema may be just fine with the Content-Type (consider the case where encoding/contentType says "text/plain; charset=UTF-8" but the Content-Type header says "text/plain; charset=Shift_JIS"... or application/yaml instead of application/json).
The logic I'm using is:
- if there is style/explode/allowReserved present, use those;
- otherwise, check for encoding/contentType AND a Content-Type header:
- if we have just one or the other, use it;
- else if we have both, they must match, else generate an error, but still decode with the Content-Type header (because we can presume the message sender knows what they're doing);
- else (we have neither): do we have a schema for this data location?
- yes, examine the schema to see what type this should be, and select the media-type on that basis; if it's "text/plain", also apply the charset from any adjacent
_charset_message part. - no: do nothing; leave the payload as a string (and now we start unwinding our recursion as there is nothing more we can do with this part of the messge).
- yes, examine the schema to see what type this should be, and select the media-type on that basis; if it's "text/plain", also apply the charset from any adjacent
However, while looking at my test cases a little more closely, I'm wondering if it might be more sensible to simply prefer Content-Type over encoding/contentType without erroring at all when the values conflict. contentType is useful as a recipe for serializing a message, and a hint for how to deserialize when insufficient information is present in the message itself, but when Content-Type is right there in the message next to the data, surely we should just go ahead and use it? That's what an application would do in the absence of an OAD telling them what the message should look like. If the data decodes to the desired result, why should this be an error at all?
[...and after experimenting with tests, I am convinced this is correct.. it would not be useful to generate an error when we're still able to successfully deserialize the message. The schema is still there as a verification step to catch issues with the deserialized content, and there's a good chance that it came out correctly anyway even if the media-type didn't match what the OAD said the serialization process ought to be using. If a media-type was used that we don't know how to decode, then we can error on that instead.]
There was a problem hiding this comment.
(edited to fix some errors; read in web rather than in email)
| ##### Handling Multiple `contentType` Values | ||
|
|
||
| When multiple values are provided for `contentType`, parsing remains straightforward as the part's actual `Content-Type` is included in the document. | ||
| When multiple values are provided for `contentType`, parsing remains straightforward as the part's actual `Content-Type` is included in the document; it SHOULD match one of the media types in `contentType`, if it is present. |
There was a problem hiding this comment.
I think that with the re-ordered sentences above, this is not necessary as it is simply a consequence of the other statement. It's not entirely clear to me that we can put a SHOULD on the API behavior, though. I think we can only put SHOULDs on our side of things. We should maybe look and see if we have other SHOULDs that impact API behavior rather than OAS tool behavior — If we already do, then I'm perfectly happy for us to add another.
There was a problem hiding this comment.
It's not entirely clear to me that we can put a SHOULD on the API behavior, though. I think we can only put SHOULDs on our side of things
I'm not sure what you mean? What is "our side"?
There was a problem hiding this comment.
but anyway, yes, I agree with removing this addition, as it is unneeded; of course we want the Content-Type header to match encoding/contentType, but if they don't there's nothing useful the deserializer can do about it now (as above, generating an error about this is not helpful assuming we were able to deserialize anyway).
(Also, I feel good that the original statement here is consistent with my assumption that the Content-Type header is the value we should prefer over anything else) :)
This is not really a breaking change, since I think what's described here is intuitively the right thing to do anyway, but as it's a new directive I'd still only add it in 3.3 and not backport it to 3.2.1.