21 Jun 2012

User Data and other stories

Rule 1. Never trust any data coming from the user; always sanitize it before use. While mysql_real_escape_string protects against SQL injection, it won't save you from JavaScript XSS. The path many PHP developers take is therefore to:

  1. Strip tags with strip_tags
  2. mysql_real_escape_string

Bad news though: strip_tags won't save you, and it is worse when you use the allowable_tags option. (Yeah, we are always tempted to allow some tags for minor formatting, like b, i, u, em and a.) As clearly stated in the manual, strip_tags does not modify the attributes of allowed tags or validate attribute values. So with the a tag allowed, this will pass:

<a href="javascript:alert(document.cookie)">Warido</a>
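To see this for yourself, here is a minimal sketch. The sample input string is made up for illustration; strip_tags is called with '<a>' as its allowed tags.

```php
<?php
// Hypothetical user input: a script tag plus a link with a javascript: URL.
$input = '<script>alert(1)</script><a href="javascript:alert(document.cookie)">Warido</a>';

// Allow only the a tag. The script tags are removed (their text content stays),
// but the attributes of the allowed a tag are passed through untouched.
$clean = strip_tags($input, '<a>');

echo $clean;
```

The script tags are gone, but the javascript: URL in the href survives, which is exactly the problem.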

Escape route

Rule 2. Only accept plain text whenever you can. Kill HTML, scripts and friends. OK, just joking. We are in the real world after all. But really, unless you are doing heavy DOM manipulation on the fly, htmlentities() is good enough. The function simply converts HTML special characters into entities.

<?php
echo htmlentities('<a href="javascript:alert(document.cookie)">Warido</a>', ENT_QUOTES, 'UTF-8');
// Outputs:
// &lt;a href=&quot;javascript:alert(document.cookie)&quot;&gt;Warido&lt;/a&gt;
?>

Some HTML please

When you run htmlentities on a string, the "HTMLness" is killed and it gets displayed as normal text (the HTML will not be interpreted). So really, your browser will simply display the markup as literal text, as against a clickable link that shows a popup when clicked. To allow some HTML to pass,

  1. Run htmlentities on the string
  2. Use regular expressions to match the exact tags, allowed attributes and values, and rewrite them appropriately

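<?php
// Escape everything first: < becomes &lt;, > becomes &gt;, " becomes &quot;
$input = htmlentities($input, ENT_QUOTES, 'UTF-8');
// b: restore plain <b>...</b> pairs (matched in their escaped form)
$input = preg_replace('!&lt;b&gt;(.*?)&lt;/b&gt;!im', '<b>$1</b>', $input);
// a: restore links whose href is http(s) or ftp(s); title and rel are matched but dropped
$input = preg_replace("!&lt;a +href=&quot;((?:ht|f)tps?://.*?)&quot;(?: +title=&quot;(.*?)&quot;)?(?: +rel=&quot;(.*?)&quot;)? *&gt;(.*?)&lt;/a&gt;!im", '<a href="$1">$4</a>', $input);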
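As a rough sketch, the two steps can be wrapped in one function and tried on a good and a bad link. The function name allow_basic_html and the sample inputs are made up for illustration, and the link rule is simplified to handle href only. Note that after htmlentities has run, the patterns have to match the escaped form of the tags (&lt;b&gt; rather than <b>).

```php
<?php
// Hypothetical helper: escape everything, then restore a small whitelist.
function allow_basic_html($input)
{
    // Step 1: neutralise all markup.
    $input = htmlentities($input, ENT_QUOTES, 'UTF-8');

    // Step 2: restore only the tags we trust, matched in their escaped form.
    // b tags with no attributes.
    $input = preg_replace('!&lt;b&gt;(.*?)&lt;/b&gt;!is', '<b>$1</b>', $input);
    // a tags whose href starts with http://, https://, ftp:// or ftps://.
    $input = preg_replace(
        '!&lt;a +href=&quot;((?:ht|f)tps?://.*?)&quot; *&gt;(.*?)&lt;/a&gt;!is',
        '<a href="$1">$2</a>',
        $input
    );

    return $input;
}

// A harmless link and bold text come back as real HTML.
$safe = allow_basic_html('<b>Hi</b> <a href="http://example.com">site</a>');

// A javascript: link does not match the whitelist, so it stays escaped.
$evil = allow_basic_html('<a href="javascript:alert(document.cookie)">Warido</a>');
```

Anything the patterns do not explicitly recognise stays as entities, so the default is safe and each new tag has to be opted in with its own rule.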

Client side considerations - JavaScript

So far we've been considering the server side. What about the client side of things? DOM manipulation in JavaScript?

One of the techniques we use in Prowork to make the app feel fast is to immediately perform the user's action while the actual processing runs in the background. Take adding notes to a task, for instance. Once the user clicks 'Add note', the note is immediately added to the task (appended to the DOM) while the real submission to the server goes on in the background. The submitted data will of course be filtered on the server side, but what about the copy already injected into the DOM?

Of course, htmlentities for the client side

There is a nice JavaScript port of htmlentities at http://phpjs.org/functions/htmlentities to take care of this. So even when the data has not yet been through the server side filter, you can escape it on the client side before it goes into the DOM. However, don't rely on client side filtering for data going to the server. Always filter on the server side as well.

My name is Opeyemi Obembe. I build things for web and mobile. You should follow me on Twitter (@kehers).