MIME type parsing: backslash

Anne v.
Oct 13, 2017
Steps to reproduce

A document labeled as


ends up with the “default” encoding rather than GBK. This is wrong per HTTP, and also per the specification I’m working on, which is a bit more tolerant than HTTP so that it accepts cases such as


More context: https://github.com/whatwg/mimesniff/issues/38


    • Hi Anne,

      I see how the U+005C REVERSE SOLIDUS character ("\") should be serialized:


      For each character char in name, execute the following steps:
      If char is equal to the U+0022 QUOTATION MARK character (""") or to the U+005C REVERSE SOLIDUS character ("\"), append the U+005C REVERSE SOLIDUS character ("\") to serialization.
      Append char to serialization.
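
The serialization steps quoted above can be sketched in a few lines of Python. This is only an illustration of those two steps: the function name `serialize_quoted` is made up for this example, and anything the surrounding algorithm may do (such as adding the enclosing quotation marks) is left out because it is not part of the quoted text.

```python
def serialize_quoted(name: str) -> str:
    """Escape each '"' and '\\' in name by prefixing a backslash.

    Hypothetical helper: mirrors only the two quoted steps, nothing more.
    """
    serialization = ""
    for char in name:
        # Step 1: quotation marks and backslashes get a leading U+005C.
        if char == '"' or char == "\\":
            serialization += "\\"
        # Step 2: the character itself is always appended.
        serialization += char
    return serialization
```

Characters other than the quotation mark and the backslash pass through unchanged, so a plain value such as `GBK` comes back as-is.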

      However, I do not follow how the U+005C REVERSE SOLIDUS character is parsed, i.e. how it is removed in the parsing section:


      I realize this information is on a branch of the standard. (I liked the alert, by the way.)

      I think I am missing something obvious and would appreciate some guidance. For example, is this a basic HTTP parsing rule for escaped sequences/characters?

      Thank you,


    • The generated snapshot is out-of-date due to some error in the build process I’ve yet to look into. Been occupied with some other things.

      describes the quoted-string behavior, including backslash escapes. Browsers don’t exactly implement HTTP (when it comes to erroneous input), which is why I’m writing this new parser, but the problem here is that Edge doesn’t handle error-free input correctly.

    • Appreciate the help locating that information.

      I see the usage being tested here is the “quoted-pair”. Learn something new every day. :-)

      quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
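
For anyone else reading along, here is a rough Python sketch of how a quoted-pair is undone when a quoted-string is parsed: the backslash is dropped and the character after it is kept literally. The function name `unquote` is hypothetical, the sketch is strict (it rejects malformed input rather than tolerating it the way the parser being drafted is meant to), and it does not check that the escaped character is within the HTAB / SP / VCHAR / obs-text range.

```python
def unquote(value: str) -> str:
    """Decode a quoted-string, removing the backslash of each quoted-pair.

    Hypothetical, strict sketch: raises ValueError on malformed input.
    """
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("not a quoted-string")
    result = []
    i = 1
    end = len(value) - 1  # index of the closing quotation mark
    while i < end:
        char = value[i]
        if char == "\\":
            # quoted-pair: skip the backslash, keep the next character.
            i += 1
            if i >= end:
                raise ValueError("backslash escapes the closing quote")
            char = value[i]
        elif char == '"':
            raise ValueError("unescaped quotation mark inside quoted-string")
        result.append(char)
        i += 1
    return "".join(result)
```

With a made-up input, the six characters `"G\BK"` decode to `GBK`, because the backslash in front of `B` is simply removed.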


      Thanks again,


