MIME type parsing: backslash

Issue #14212408 • Assigned to Divya G.

Details

Author
Anne v.
Created
Oct 13, 2017
Privacy
This issue is public.
Reports
Reported by 1 person

Steps to reproduce

A document labeled as

text/html;charset="\g\b\k"

ends up with the “default” encoding, rather than GBK. This is wrong per HTTP and per the specification I’m working on, which is a bit more tolerant than HTTP so that it accepts cases such as

text/html;

More context: https://github.com/whatwg/mimesniff/issues/38
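
To illustrate the expected result, here is a minimal sketch in Python of how a quoted `charset` value with backslash escapes could be unescaped. It assumes HTTP quoted-string rules (a backslash escapes the next character). The helper names `unescape_quoted_string` and `charset_of` are hypothetical, and the parameter split is deliberately naive. It is not the algorithm from the specification under discussion.

```python
from typing import Optional


def unescape_quoted_string(value: str) -> str:
    """Return the content of a quoted-string with backslash escapes removed.

    Hypothetical helper; assumes `value` begins with a double quote.
    """
    out = []
    i = 1  # skip the opening quote
    while i < len(value):
        ch = value[i]
        if ch == '"':
            break  # an unescaped quote closes the value
        if ch == "\\" and i + 1 < len(value):
            i += 1  # drop the backslash, keep the escaped character
            ch = value[i]
        out.append(ch)
        i += 1
    return "".join(out)


def charset_of(content_type: str) -> Optional[str]:
    """Naive charset lookup (assumption: no ';' inside quoted values)."""
    for param in content_type.split(";")[1:]:
        name, sep, value = param.strip().partition("=")
        if sep and name.lower() == "charset":
            if value.startswith('"'):
                value = unescape_quoted_string(value)
            return value.lower()
    return None


print(charset_of(r'text/html;charset="\g\b\k"'))  # gbk
print(charset_of("text/html;"))                   # None
```

Under these assumptions the label in this report resolves to `gbk`, and the trailing-semicolon case simply carries no charset parameter.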

Attachments

0 attachments

    Comments and activity

    • Microsoft Edge Team

      Changed Assigned To to “Steven K.”

    • Hi Anne,

      I see how the U+005C REVERSE SOLIDUS character ("\") should be serialized:

      https://mimesniff.spec.whatwg.org/branch-snapshots/annevk/mime-type/#serializing-a-mime-type

      For each character char in parameters[name], execute the following steps:
        If char is equal to the U+0022 QUOTATION MARK character (""") or to the U+005C REVERSE SOLIDUS character ("\"), append the U+005C REVERSE SOLIDUS character ("\") to serialization.
        Append char to serialization.

      However, I do not follow the parsing of the U+005C REVERSE SOLIDUS character, i.e. how it is removed in the parsing section:

      https://mimesniff.spec.whatwg.org/branch-snapshots/annevk/mime-type/#parsing-a-mime-type

      I realize this information is on a branch of the standard (I liked the alert by the way.)

      I think I am missing something obvious and would appreciate some guidance. For example, is this a basic HTTP parsing rule for escaped sequences/characters?

      Thank you,

      Steve
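
The serialization steps quoted in the comment above can be sketched in Python as follows. The function name `serialize_parameter_value` is hypothetical, and wrapping the result in double quotes is an assumption, since the quoted steps only cover the per-character loop.

```python
def serialize_parameter_value(value: str) -> str:
    """Escape '"' and '\\' with a backslash, per the quoted steps.

    Assumption: the serialized value is wrapped in double quotes.
    """
    serialization = ['"']
    for char in value:
        if char in ('"', "\\"):
            serialization.append("\\")  # prefix with U+005C REVERSE SOLIDUS
        serialization.append(char)
    serialization.append('"')
    return "".join(serialization)


print(serialize_parameter_value("gbk"))  # "gbk"
```

Only the quotation mark and the backslash are escaped on output, so a parser that strips one backslash before any character reads the serialized form back unchanged.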

    • The generated snapshot is out-of-date due to some error in the build process I’ve yet to look into. Been occupied with some other things.

      However,
      https://tools.ietf.org/html/rfc7231#section-3.1.1.1
      describes the quoted-string behavior, including backslash escapes as well. Browsers don’t exactly implement HTTP (when it comes to erroneous input) which is why I’m writing this new parser, but the problem here is that Edge doesn’t handle error-free input correctly.

    • Appreciate the help locating that information.

      I see the usage being tested here is the 'quoted-pair'. Learn something new every day. :-)

      quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )

      https://tools.ietf.org/html/rfc7230#section-3.2.6

      Thanks again,

      Steve
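
As a rough check of the grammar referenced above, the `quoted-string` and `quoted-pair` productions from RFC 7230 section 3.2.6 can be written as a Python regular expression. This is a sketch: the names `QDTEXT`, `QUOTED_PAIR`, and `is_quoted_string` are hypothetical, and treating each code point up to U+00FF as one octet is an assumption.

```python
import re

# qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
QDTEXT = r"[\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]"
# quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
QUOTED_PAIR = r"\\[\t \x21-\x7E\x80-\xFF]"
# quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
QUOTED_STRING = re.compile(rf'"(?:{QDTEXT}|{QUOTED_PAIR})*"')


def is_quoted_string(s: str) -> bool:
    """Return True if `s` matches the quoted-string production."""
    return QUOTED_STRING.fullmatch(s) is not None


print(is_quoted_string(r'"\g\b\k"'))  # True
print(is_quoted_string('"gbk'))       # False
```

By this check the value from the report is error-free input, which matches the point made earlier in the thread.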

    • Microsoft Edge Team

      Changed Assigned To to “Venkat K.”

      Changed Assigned To from “Venkat K.” to “Divya G.”
