Referer header incorrectly encoded for XMLHttpRequest

Issue #15561198 • Assigned to Steven K.

Details

Author
Karl-Johan S.
Created
Jan 22, 2018
Privacy
This issue is public.
Found in
  • Microsoft Edge
Found in build #
41.16299
Reports
Reported by 1 person


Steps to reproduce

We are seeing a problem where the Referer header is encoded incorrectly for XHR requests when the current URL contains non-ASCII characters.

The following url illustrates the problem:

https://www.ostersundsbibliotek.se/jämtland-härjedalen?refId=FeAaCH&culture=sv

We load several resources on page load, and when inspecting the Referer header we can see that “ä” is encoded differently for XHR requests and for regular requests.

XHR
https://www.ostersundsbibliotek.se/api/search/media-types?no-cache-lang=sv

Referer: https://www.ostersundsbibliotek.se/j%C3%83%C2%A4mtland-h%C3%83%C2%A4rjedalen?refId=FeAaCH&culture=sv

Image
https://www.ostersundsbibliotek.se/svg/snow_hart.svg

Referer: https://www.ostersundsbibliotek.se/j%C3%A4mtland-h%C3%A4rjedalen?refId=FeAaCH&culture=sv

The complete requests from the developer tools are attached as images.
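To see what each Referer value actually stands for, the percent-escapes can be decoded back to text. A minimal sketch using Python's standard `urllib.parse.unquote`; the two Referer strings are copied from the report, and `decoded` is just an illustrative helper name:

```python
from urllib.parse import unquote

# Referer values copied from the report above.
XHR_REFERER = "https://www.ostersundsbibliotek.se/j%C3%83%C2%A4mtland-h%C3%83%C2%A4rjedalen?refId=FeAaCH&culture=sv"
IMAGE_REFERER = "https://www.ostersundsbibliotek.se/j%C3%A4mtland-h%C3%A4rjedalen?refId=FeAaCH&culture=sv"


def decoded(referer: str) -> str:
    # unquote() turns each %XX escape back into a byte and decodes the
    # resulting byte sequence as UTF-8.
    return unquote(referer)


print(decoded(IMAGE_REFERER))
# https://www.ostersundsbibliotek.se/jämtland-härjedalen?refId=FeAaCH&culture=sv
print(decoded(XHR_REFERER))
# https://www.ostersundsbibliotek.se/jÃ¤mtland-hÃ¤rjedalen?refId=FeAaCH&culture=sv
```

The XHR value decodes to `Ã¤` where the image value has `ä`, which is consistent with the characters being mangled before they were percent-encoded rather than just escaped differently.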

However, if we browse to the page using the percent-encoded URL, it works as expected and both cases are encoded properly.

https://www.ostersundsbibliotek.se/j%C3%A4mtland-h%C3%A4rjedalen?refId=FeAaCH&culture=sv
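The percent-encoded URL above can be derived from the Unicode one by UTF-8 encoding the path and escaping each non-ASCII byte. A minimal sketch using Python's `urllib.parse`, assuming, as in this URL, that the host and query string are already plain ASCII (internationalized host names would need IDNA handling instead); `percent_encode_path` is a hypothetical helper:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(url: str) -> str:
    # Split the URL so that only the path is touched; the scheme, host and
    # query are assumed to be ASCII already.
    parts = urlsplit(url)
    # quote() encodes the path as UTF-8 and escapes every byte outside the
    # unreserved set; "/" is kept so path segments stay intact.
    encoded_path = quote(parts.path, safe="/")
    return urlunsplit(parts._replace(path=encoded_path))


print(percent_encode_path(
    "https://www.ostersundsbibliotek.se/jämtland-härjedalen?refId=FeAaCH&culture=sv"
))
# https://www.ostersundsbibliotek.se/j%C3%A4mtland-h%C3%A4rjedalen?refId=FeAaCH&culture=sv
```

Note that `quote()` also escapes `%`, so running this on an already-encoded URL would double-escape it; it is only meant for the raw Unicode form.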

The main problem this causes is that the web server rejects the whole request with a 400 Bad Request error (we run ASP.NET Core with Kestrel behind IIS), since the headers are invalid.


Comments and activity

  • Microsoft Edge Team

    Changed Assigned To to “Steven K.”

  • Hi,

    Could you provide a simplified repro for this issue? I have looked at it, and it appears that the UTF-8 byte sequences for non-ASCII characters, ‘ä’ in this case, are not being converted properly. For example:

    0xC3A4 - UTF-8 (hex) for U+00E4 ä

    is being converted into two two-byte UTF-8 sequences, "0xC383 0xC2A4", which is of course not correct.

    I ask for a simplified repro because I see some HTML syntax issues on the site you gave me, which could cause the parser to misinterpret the source URLs. I have attached a screenshot of the console; below is one example where a quotation mark (‘"’) is missing:

       <div class="content-image rs_skip">
          <img src="https://cdn1.ostersundsbibliotek.se/images/5a2fc789193f640af8068179?crop=370x155" role=" presentation"/="">
       </div>

    The mark after the crop=370x155 is missing.

    It would be good to verify that the server is serving the source files properly encoded as UTF-8, and that this is not, for example, a cp1252-encoded file being treated as Latin-1, or some other similar issue.

    I hope you can provide another repro, as this is an interesting issue. I appreciate the submission and the help,

    Steve

  • Hi Steven,

    A repro can be found at https://rawgit.com/karl-sjogren/edge-15561198-repro/master/file-with-ä-in-filename.html.

    The source for the repro is available at https://github.com/karl-sjogren/edge-15561198-repro/
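The byte-level conversion Steve describes (0xC3A4 becoming 0xC383 0xC2A4) is what you get when the UTF-8 bytes of ‘ä’ are read back as one character per byte and then encoded as UTF-8 again. The Python sketch below models that; the function names are hypothetical, and Latin-1 is an assumption chosen because it maps every byte to a character. For the two bytes involved here (0xC3 and 0xA4), cp1252 yields the same characters, so this alone does not tell which single-byte encoding was actually used.

```python
def utf8_hex(text: str) -> str:
    # Space-separated upper-case hex of the text's UTF-8 bytes.
    return text.encode("utf-8").hex(" ").upper()


def misread_as_latin1(text: str) -> str:
    # Hypothetical model of the fault: take the UTF-8 bytes and decode them
    # as Latin-1, producing one (wrong) character per byte.
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    # Undo the model above; only valid if the text really was mangled this way.
    return garbled.encode("latin-1").decode("utf-8")


original = "\u00e4"  # ä
garbled = misread_as_latin1(original)

print(utf8_hex(original))                    # C3 A4
print([f"U+{ord(c):04X}" for c in garbled])  # ['U+00C3', 'U+00A4']
print(utf8_hex(garbled))                     # C3 83 C2 A4
print(repair(garbled) == original)           # True
```

Percent-encoding the garbled string gives `%C3%83%C2%A4`, which matches the sequence seen in the XHR Referer in the original report.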
