User Data and other stories
Rule 1. Never trust any data coming from the user. Always sanitize it before use. While mysql_real_escape_string protects against SQL injection, it won't save you from JavaScript XSS (cross-site scripting). The path for many [PHP] developers is therefore to:
- Strip tags with strip_tags
- Escape with mysql_real_escape_string
Bad news, though: strip_tags won't save you, and it gets worse when you use its second parameter to allow some tags. (Yeah, we are always tempted to allow a few tags for minor formatting like b, i, u, em and, of course, a.) As the manual clearly states, strip_tags does not modify the attributes of allowed tags or validate attribute values. So, with a allowed, this passes straight through:
<a href="javascript:alert(document.cookie)">Warido</a>
Escape route
Rule 2. Only accept plain text whenever you can. Kill HTML, scripts and friends. OK, just joking. We live in the real world, after all. But really, unless you are doing heavy DOM manipulation on the fly, htmlentities() is good enough. The function converts every character that has an HTML entity equivalent into that entity, including <, >, & and double quotes, plus single quotes when ENT_QUOTES is set.
<?php
echo htmlentities('<a href="javascript:alert(document.cookie)">Warido</a>', ENT_QUOTES, 'UTF-8');
// Outputs:
// &lt;a href=&quot;javascript:alert(document.cookie)&quot;&gt;Warido&lt;/a&gt;
?>
Some HTML please
When you run htmlentities on a string, its "HTMLness" is killed and it is displayed as normal text (the HTML is not interpreted). So your browser simply displays the markup literally, as the text <a href="javascript:alert(document.cookie)">Warido</a>, instead of rendering a clickable link that pops up an alert with your cookies when clicked. To let some HTML through:
- Run htmlentities on the string
- Use regular expressions to match the exact (now escaped) tags, allowed attributes and allowed values, and rewrite them back into real HTML
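<?php
// $input holds the raw, untrusted user text
$input = htmlentities($input, ENT_QUOTES, 'UTF-8');
// b: the tags are now escaped, so match their entity form and restore real tags
$input = preg_replace('!&lt;b&gt;(.*?)&lt;/b&gt;!i', '<b>$1</b>', $input);
// a: only http(s)/ftp(s) links survive; title and rel are matched but dropped
$input = preg_replace('!&lt;a +href=&quot;((?:ht|f)tps?://.*?)&quot;(?: +title=&quot;(.*?)&quot;)?(?: +rel=&quot;(.*?)&quot;)? *&gt;(.*?)&lt;/a&gt;!i', '<a href="$1">$4</a>', $input);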
Client-side considerations: JavaScript
So far we've been considering the server side. What about the client side of things? DOM manipulation in JavaScript?
One technique we use in Prowork to make the app feel fast is to perform the user's action immediately while the actual processing runs in the background. Take adding notes to a task, for instance. Once the user clicks 'Add note', the note is added to the task straight away (appended to the DOM) while the real submission to the server happens in the background. The submitted data will of course be filtered on the server side, but what about the copy already injected into the DOM?
Of course, htmlentities for the client side
There is a nice JavaScript port of htmlentities at http://phpjs.org/functions/htmlentities that takes care of this. So even where you can't filter the data on the server side first, you can do it on the client side. However, don't rely on client-side filtering for data going to the server: anyone can bypass it by sending requests directly, so always filter on the server side.
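If you would rather not pull in a library, a minimal escaping helper takes only a few lines. The sketch below is our own assumption, not the phpjs code: escapeHtml is a hypothetical name, and it escapes only the five characters that matter for markup, so it behaves more like PHP's htmlspecialchars with ENT_QUOTES than a full htmlentities port.

```javascript
// Hypothetical helper (not part of phpjs or any browser API): escapes the five
// characters that can break out of HTML text or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so the entities added below aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Example: the note a user typed, escaped before it is put into markup
const note = '<a href="javascript:alert(document.cookie)">Warido</a>';
const safeHtml = "<li>" + escapeHtml(note) + "</li>";
console.log(safeHtml);
// <li>&lt;a href=&quot;javascript:alert(document.cookie)&quot;&gt;Warido&lt;/a&gt;</li>
```

In an 'Add note' flow like the one above, you might then append it with something like notesList.insertAdjacentHTML('beforeend', safeHtml), where notesList is a hypothetical element reference. Alternatively, creating the element and assigning the note to its textContent sidesteps escaping altogether, since the browser never parses that text as HTML.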
Further reading