Encoding issue with XMLHttpRequest and Firefox 3

Firefox 3.0.0 has a “strange” regression in how it encodes XMLHttpRequest requests. It's not a bug per se, just different behavior that no other browser shows, and we ran straight into it.

What we basically do on the client side in JavaScript:

this.data = new XMLHttpRequest();
this.data.open('POST', dataURI);
this.data.send(xml);

where “xml” is a DOMDocument Object.

In Firefox 2.0 this request came with a

Content-Type: application/xml

and the xml in the POST body was encoded in UTF-8 (no encoding information in the XML declaration)

IE7 sends:

Content-Type: text/xml; charset=UTF-8

But Firefox 3.0.0 sends this as

Content-Type: application/xml; charset=ISO-8859-1

and the XML in the body really is ISO-8859-1 encoded, but there is no encoding information in the XML declaration (e.g. no <?xml version="1.0" encoding="ISO-8859-1"?>), and of course our XML loader fell flat on its nose as soon as the body contained non-ASCII characters…
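To make the failure concrete, here is a minimal Node.js sketch of what the two encodings look like on the wire. The <name> payload is made up for illustration, and Node's Buffer stands in for the browser and the XML loader. Node substitutes the replacement character U+FFFD for bytes it cannot decode, where a strict XML parser would reject the input instead.

```javascript
// Made-up payload; \u00E9 is "é", a character outside ASCII.
const body = '<name>\u00E9</name>';

// Firefox 2 style: "é" becomes the two bytes 0xC3 0xA9.
const utf8Bytes = Buffer.from(body, 'utf8');
// Firefox 3.0.0 style: "é" becomes the single byte 0xE9.
const latin1Bytes = Buffer.from(body, 'latin1');

// A reader that assumes UTF-8 cannot decode the lone 0xE9 byte.
const misread = latin1Bytes.toString('utf8');

console.log(utf8Bytes.length, latin1Bytes.length); // 15 14
console.log(misread.includes('\uFFFD'));           // true
```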

While having the encoding information only in the HTTP header and not also in the XML declaration is correct from a technical point of view (as far as I can remember; I didn't look up any specs), it was pretty annoying to track down this “bug”. Now I have to check on the backend how each request is encoded and can't just rely on “it's UTF-8 nowadays anyway, or at least it's written in the XML declaration, so the XML parser can take care of it” (which was maybe naive from the beginning :)).
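An alternative on the client side, untested against Firefox 3.0.0 and therefore only an assumption, would be to serialize the document ourselves and name the charset explicitly, so the browser never has to pick one. The helper name postXmlAsUtf8 is hypothetical. The XMLHttpRequest and XMLSerializer instances are passed in as parameters so the sketch also runs outside a browser.

```javascript
// Hypothetical helper, not part of our actual code.
// xhr: an XMLHttpRequest, serializer: an XMLSerializer.
function postXmlAsUtf8(xhr, serializer, dataURI, xmlDocument) {
  // Turn the DOM document into a string ourselves instead of
  // handing the document object to send().
  const body = serializer.serializeToString(xmlDocument);

  xhr.open('POST', dataURI);
  // setRequestHeader() must be called after open().
  xhr.setRequestHeader('Content-Type', 'application/xml; charset=UTF-8');
  xhr.send(body);
  return body;
}
```

In the browser this would be called as postXmlAsUtf8(new XMLHttpRequest(), new XMLSerializer(), dataURI, xml). Whether Firefox 3.0.0 leaves the charset untouched in that case is exactly the part that would need checking, so the server-side fix is still the safer bet.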

Here's the code-snippet for the PHP server side:

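function transformFromContentTypeToUTF8($str) {
    // Look for a charset parameter in the request's Content-Type header
    if (isset($_SERVER['CONTENT_TYPE']) && preg_match('#charset="?([^\s;"]+)#i', $_SERVER['CONTENT_TYPE'], $matches)) {
        // Charset names are case-insensitive, so normalize before comparing
        $charset = strtoupper($matches[1]);
        if ($charset == 'UTF-8') {
            return $str;
        }
        if ($charset == 'ISO-8859-1') {
            // utf8_encode() is deprecated as of PHP 8.2; the iconv() call below also handles ISO-8859-1
            return utf8_encode($str);
        }
        return iconv($charset, 'UTF-8', $str);
    }
    // if no charset, then return as it came
    return $str;
}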

function fixXMLEncodingFromHTTP($xml) {
    // Only convert when the XML declaration does not name an encoding itself
    if (!preg_match('#<\?xml[^>]+encoding=#', $xml)) {
        return transformFromContentTypeToUTF8($xml);
    }
    return $xml;
}

$rawpost = fixXMLEncodingFromHTTP(file_get_contents('php://input'));

// create a new DOM document out of the posted string
$xmlData = new DOMDocument();
$xmlData->loadXML($rawpost);

BTW, FF 3 transforms characters outside ISO-8859-1 into numeric entities. Welcome, web 1.0 :)
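Roughly, the effect looks like the sketch below. This is a hypothetical illustration and not Firefox's actual serializer code: the function name is made up, and the decimal entity form is an assumption.

```javascript
// Hypothetical illustration of the observed effect.
function toLatin1WithEntities(text) {
  let out = '';
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    // ISO-8859-1 covers code points 0x00 to 0xFF; everything else
    // is written as a decimal numeric character reference.
    out += codePoint <= 0xFF ? ch : `&#${codePoint};`;
  }
  return out;
}

console.log(toLatin1WithEntities('é costs 5 €')); // é costs 5 &#8364;
```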

And of course there's already a report of that issue on Bugzilla, but I have no idea whether they'll change it back soon.
