In designing a little on-line chat program accessible through a web browser, I came across what was one of the most frustrating problems in my software development career.

Here's what I wanted to do:

1) User types in some text on a form and submits the form
2) JavaScript mangles the text so that it is not easily readable and sends the disguised text to a server-side script
3) Server-side script writes any text received to the end of a "conversation" file as XML
4) Server-side script mangles the XML document and sends back to all attached clients
5) Client receives the mangled text, un-mangles and generates an XML document, extracts the text, and displays on the screen

The overall process is pretty simple and is represented by the following diagram:

[Figure: flow of text throughout the chat application]

The frustrating part came when I started encountering the quirks of the various languages/syntaxes that I was using coupled with the requirement to be able to pass any text to the server. Things got confusing fast, and as they always say, the devil is in the details.

Sending Text from JavaScript to PHP

The transition from user text to mangled text is a simple function (written in both JavaScript and PHP) that takes an input string of N characters and returns an output string of N characters. For convenience, the function is its own inverse: mangling plain text disguises it, and mangling the disguised text gives back the original. This is the really simple part of the process:

var outStr = mangle("Our sample text");
// outStr is now "ubW(&2v A;(z;+z"
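
The real mangle() is not shown here, so the following is only a minimal sketch of a function with the same two properties (N characters in, N characters out, and its own inverse). It uses ROT47 as a stand-in; the name mangleSketch is hypothetical, and its output will not match the sample string above.

```javascript
// Hypothetical stand-in for mangle(): ROT47 over printable ASCII.
// This is NOT the function used in the chat program.
function mangleSketch(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 33 && code <= 126) {
      // Rotate within the 94 characters '!'..'~'; rotating by 47 twice is a full turn.
      out += String.fromCharCode(33 + ((code - 33 + 47) % 94));
    } else {
      // Spaces and anything outside that range pass through unchanged.
      out += text.charAt(i);
    }
  }
  return out;
}

const mangledSketch = mangleSketch("Our sample text");
const restoredSketch = mangleSketch(mangledSketch);
console.log(mangledSketch.length === "Our sample text".length); // length is preserved
console.log(restoredSketch === "Our sample text");              // second pass undoes the first
```

Because the same function runs in both directions, only one routine has to be kept in sync between the JavaScript and PHP sides.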

However, it is when we want to actually send this output string as a URL argument to a server-side PHP script that we encounter our first problem. For example:

http://www.mydomain.com/chat.php?USER=Jeff&TEXT=....

Just as with text entered from the form, the mangled string could contain any printable ASCII character. But the ampersand character ("&") delimits the variables in a URL query string (which is how PHP splits them apart), and valid URLs cannot have spaces (" ") in them. There are other characters that have special meaning in URLs as well. Thus, if we tried to put our outStr into the URL it would become:

http://www.mydomain.com/chat.php?USER=Jeff&TEXT=ubW(&2v A;(z;+z

The portion after the space (i.e. "A;(z;+z") is removed from the URL. Also, the $TEXT variable from the PHP side would contain only "ubW(" and the remaining string is interpreted as another PHP variable called "2v".
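
To see the split for yourself, here is a small sketch that parses the unescaped query string. It uses the standard URLSearchParams class as an approximation of what the server does; PHP's own parser differs in some details (for example, how it names variables), so treat this as an illustration rather than a reproduction.

```javascript
// URLSearchParams splits on "&" and "=" and turns "+" into a space,
// which is close to (but not identical to) how PHP reads a query string.
const rawQuery = "USER=Jeff&TEXT=ubW(&2v A;(z;+z";
const parsedQuery = new URLSearchParams(rawQuery);

console.log(parsedQuery.get("USER")); // Jeff
console.log(parsedQuery.get("TEXT")); // ubW(  (everything after the "&" is gone)
console.log([...parsedQuery.keys()]); // a third, unintended key shows up
```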

What we want is for the PHP script to see one variable called $TEXT with the complete mangled string. Thus, it is important that we escape the mangled string such that spaces become "%20", ampersands become "%26", "=" becomes "%3D", and "%" itself becomes "%25", i.e. any character that might be interpreted in a special way inside a URL gets escaped. Thankfully we can just use the JavaScript escape() function for this. Handy-dandy:

var outStr = mangle("Our sample text");
// outStr is now "ubW(&2v A;(z;+z"
outStr = escape(outStr);
// outStr is now "ubW%28%262v%20A%3B%28z%3B+z"
// (escape only once: escaping again would turn every "%" into "%25")
var urlStr = "http://www.mydomain.com/chat.php?USER=Jeff&TEXT=" + outStr;
// urlStr is now "http://www.mydomain.com/chat.php?USER=Jeff&TEXT=ubW%28%262v%20A%3B%28z%3B+z"

Now we can send this string to the PHP script, right? Almost, but not quite! It seems that the "+" character in a URL is automatically translated into a space by PHP. Thus, if we really want to send the + character in our URL we'll also need to escape it as "%2B" (and only AFTER we escape the rest of our string, otherwise its "%2B" would become "%252B"!!):

var outStr = mangle("Our sample text");
// outStr is now "ubW(&2v A;(z;+z"
outStr = escape(outStr);
// outStr is now "ubW%28%262v%20A%3B%28z%3B+z"
outStr = outStr.replace(/\+/g, "%2B");
// outStr is now "ubW%28%262v%20A%3B%28z%3B%2Bz"
// (do not call escape() again here, or "%2B" becomes "%252B")
var urlStr = "http://www.mydomain.com/chat.php?USER=Jeff&TEXT=" + outStr;
// urlStr is now "http://www.mydomain.com/chat.php?USER=Jeff&TEXT=ubW%28%262v%20A%3B%28z%3B%2Bz"

Is the string safe for PHP now? Yes, it would seem so.
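
The ordering matters enough that it is worth a quick side-by-side. The sketch below runs both orders on a made-up sample string (the string is just an assumption for illustration):

```javascript
const plusSample = "1+1 = 2";

// Order described above: escape() first, then patch the "+" that escape() leaves alone.
const rightOrder = escape(plusSample).replace(/\+/g, "%2B");

// Reversed order: the "%" of "%2B" is itself escaped to "%25".
const wrongOrder = escape(plusSample.replace(/\+/g, "%2B"));

console.log(rightOrder); // 1%2B1%20%3D%202
console.log(wrongOrder); // 1%252B1%20%3D%202
```

With the reversed order the server would decode "%252B" to the literal text "%2B" rather than to a plus sign.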

[NOTE: When discussing this topic with a friend he told me about the encodeURIComponent() function. However, I had already gone through the above learning experience and I also noted that encodeURIComponent() is not supported in MacIE or Safari...poor Mac users.]
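
For browsers that do support it, here is a minimal sketch of the encodeURIComponent() route. It escapes "&", the space and "+" in one call, so no separate replace step is needed:

```javascript
const mangledText = "ubW(&2v A;(z;+z";
const encodedText = encodeURIComponent(mangledText);
console.log(encodedText); // ubW(%262v%20A%3B(z%3B%2Bz

const chatUrl = "http://www.mydomain.com/chat.php?USER=Jeff&TEXT=" + encodedText;
console.log(chatUrl);

// Decoding gives back exactly what went in.
console.log(decodeURIComponent(encodedText) === mangledText);
```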

The nice part is that once passed to PHP, the PHP interpreter automatically un-escapes any %xx sequences such that they magically return to their equivalent single-character representations again.

But! Apart from unescaping the %xx characters, PHP can also do some mangling of its own. This depends on whether the "magic quotes" feature is enabled. If it is, then PHP automatically adds backslashes before specific characters (single quotes, double quotes and backslashes), presupposing that you're going to try and shove the text into a database. This is NOT what we want: the extra characters change the content that was originally typed, so the un-mangled string no longer matches what the user entered. You can avoid this on the PHP side by either configuring PHP to not use magic quotes (via the PHP configuration files or .htaccess) or by using some simple PHP code:

if(get_magic_quotes_gpc()) {
  $thechatmsg = stripslashes($TEXT);
} else {
  $thechatmsg = $TEXT;
}

Now $thechatmsg should contain "ubW(&2v A;(z;+z" again. We're finally back to our mangled version of the text from the JavaScript.
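
To make the damage concrete, here is a rough JavaScript imitation of what magic quotes does to incoming text and how stripping the slashes undoes it. Both helper names are hypothetical, and the sketch ignores NUL bytes, which the PHP functions also handle.

```javascript
// Simplified imitations of PHP's addslashes()/stripslashes(); NUL bytes are ignored.
function addSlashesSketch(s) {
  // Put a backslash in front of every backslash, single quote and double quote.
  return s.replace(/[\\'"]/g, "\\$&");
}

function stripSlashesSketch(s) {
  // Drop each escaping backslash and keep the character that follows it.
  return s.replace(/\\([\s\S])/g, "$1");
}

const typedText = 'He said "it\'s fine"';
const receivedText = addSlashesSketch(typedText);

console.log(receivedText);                                    // backslashes added before each quote
console.log(receivedText.length - typedText.length);          // 3 extra characters
console.log(stripSlashesSketch(receivedText) === typedText);  // stripping restores the input
```

Since mangle() works character by character, those extra characters would not disappear when the text is un-mangled, which is why the slashes have to be stripped first.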

However, the fun doesn't stop there. In fact, we've only made it from (1) to (2) in our diagram.

Creating XML in PHP

Now we need to un-mangle the text and shove it into an XML document:

if(get_magic_quotes_gpc()) {
  $thechatmsg = stripslashes($TEXT);
} else {
  $thechatmsg = $TEXT;
}
$thechatmsg = mangle($thechatmsg);

There, now we have un-mangled text, but before we can blindly shove it into an XML document, we have to make sure that the original text typed by the user doesn't contain characters that have special meaning in XML (namely the <, > and & characters). As you may know, XML documents rely heavily on markup tags, and thus these characters are verboten within the XML document contents itself unless they are written as entities. Since the receiving clients expect a valid XML document, we have to take care that none of the data we're shoving into the XML document invalidates the syntax required. PHP provides a simple function for this called htmlspecialchars():

if(get_magic_quotes_gpc()) {
  $thechatmsg = stripslashes($TEXT);
} else {
  $thechatmsg = $TEXT;
}
$thechatmsg = mangle($thechatmsg);
$thechatmsg = htmlspecialchars($thechatmsg, ENT_QUOTES);
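
For comparison, this is roughly what that call does to the text, written as a JavaScript sketch. The helper name escapeXmlSketch and the sample message are assumptions for illustration; the replacements mirror what htmlspecialchars() produces with ENT_QUOTES.

```javascript
// Entity table matching htmlspecialchars($s, ENT_QUOTES).
const XML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#039;"
};

function escapeXmlSketch(text) {
  // A single pass avoids re-escaping the "&" of an entity that was just inserted.
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch]);
}

const chatLine = 'if (a < b && b > c) say "hi"';
console.log(escapeXmlSketch(chatLine));
// if (a &lt; b &amp;&amp; b &gt; c) say &quot;hi&quot;
```

Text escaped this way can sit between <text> and </text> without being mistaken for markup.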

We are finally ready to format $thechatmsg into an XML document and write it to a file:

if(get_magic_quotes_gpc()) {
  $thechatmsg = stripslashes($TEXT);
} else {
  $thechatmsg = $TEXT;
}
$thechatmsg = mangle($thechatmsg);
$thechatmsg = htmlspecialchars($thechatmsg, ENT_QUOTES);
// single-quoted format string so the XML attribute quotes need no escaping
$xmlStr = sprintf('<chat xml:space="preserve"><msg id="%d"><user>%s</user><text>%s</text></msg></chat>', $n, $USER, $thechatmsg);
fwrite($f, $xmlStr);

We use xml:space="preserve" to indicate that whitespace within the XML document should be preserved.

Now we've made it safely from (2) to (3). We've sent the mangled text across the network to a PHP script, and the PHP script has un-mangled it and written it to the filesystem as part of an XML document. Next, we have to send all new messages back to the JavaScript caller in the form of an XML document (as a response to an XMLHttpRequest).

Sending Text from PHP to JavaScript

This is actually pretty straightforward PHP code:

$xmldoc = "";
while(!feof($f)) {
  $line = trim(fgets($f));
  if(strlen($line) > 0) {
    $xmldoc .= $line;
  }
} // end of while loop
// now mangle the entire XML document
$xmldoc = mangle($xmldoc);
printf("%s\n", $xmldoc);

Now we've made it from (3) to (4). The only thing left for the client to do is to un-mangle it, reformat it into an XML document, walk the document nodes and extract the information.

Creating an XML Document From Text in JavaScript

Since browsers have different means of doing this, I decided to use Sarissa which abstracts these differences and lets me just deal with the XML document itself.

The mangled text comes back as the responseText field of the XMLHttpRequest object (not as valid XML). I take this string, un-mangle it, and create an XML document out of it:

var xmlString = mangle(req.responseText);
var oDomDoc = Sarissa.getDomDocument();
oDomDoc.loadXML(xmlString);
//get the root node
var xmldoc = oDomDoc.documentElement;

The one caveat I had is that regular HTML does not (by default) preserve whitespace. To get around this, I used the CSS rule "white-space: pre".

§83 · April 19, 2005 · Ajax, JavaScript, PHP, Software, Technology, Web, XML

Leave a Comment to “Passing Text Between Web Components”

  1. Mauriat says:

    Maybe I’m missing the point but why do you need to pass data by URL argument? Wouldn’t it just be better to use a form POST?

  2. Jeff Schiller says:

    I’m not sure how that could be done exactly using XMLHttpRequest. I mean I see how I could set the method to “POST”, but not how I could set arbitrary POST data that is passed to the server…

  3. Daniel says:

    Hello;

    You can set the POST mode when you open the connection. It’s quite easy. Please check the XMLHttpRequest docs for more details.

    Regards
    Daniel Zelisko

  4. You’re right. I started looking into this last night and the send method of XMLHttpRequest is where you send the POST data. Thanks!

  5. While working on http://treehouse.ofb.net/chat/?lang=en, I noticed that using POST with XMLHttpRequest seemed to fail in Konqueror. I don’t have access to a Mac, but I wouldn’t be surprised if GET were better supported