unescaped, escaped, double-escaped

Tim Bray explores the mess around escaping data in HTML and XML:

The policy ideally should be, I think, that all data in the Your Code block has to be known to be escaped or known to be unescaped. That is to say, you always do escaping on the data at the pointy end of the input arrows, or you never do it.

I think always-unescaped is a little better, since some of those output arrows might not be XML or HTML, but probably they all are; so always-escaped is certainly viable.
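Here is a minimal Python sketch of the always-unescaped policy. It assumes a hypothetical program that keeps raw text internally and escapes only at the output arrow, using whatever escaping that output format needs. The names `raw_title`, `to_html` and `to_json` are illustrative and do not come from Bray's post. The last lines show what happens when the policy is mixed: data that was already escaped gets escaped a second time.

```python
import html
import json

# Hypothetical internal value: under "always unescaped", the program holds
# raw text exactly as it arrived, with no entities baked in.
raw_title = "Fish & Chips <cheap>"

def to_html(text: str) -> str:
    """Escape raw text once, at the moment it is written into HTML."""
    return html.escape(text, quote=True)

def to_json(text: str) -> str:
    """A non-HTML output arrow: JSON needs its own escaping, not HTML's."""
    return json.dumps({"title": text})

print(to_html(raw_title))  # Fish &amp; Chips &lt;cheap&gt;
print(to_json(raw_title))  # {"title": "Fish & Chips <cheap>"}

# Mixed policy: a value escaped on input is escaped again on output,
# and the reader sees literal "&amp;" and "&lt;" on the page.
already_escaped = to_html(raw_title)
print(to_html(already_escaped))  # Fish &amp;amp; Chips &amp;lt;cheap&amp;gt;
```

Keeping the internal data unescaped means the JSON output needs no "un-HTML-ing" first. That is the advantage Bray points to when some output arrows are not XML or HTML.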

And then it gets worse, because RSS aggregators differ in how they treat HTML embedded in a feed.

The same problem presents itself in cross-site scripting and code injection attacks.
It’s the bane of macro language beginners too, whether it’s shell or troff.
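The shell case follows the same rule as HTML: escape at the output arrow, for that particular output. Below is a small Python sketch using the standard `shlex.quote`. The `filename` value is a hypothetical hostile input. HTML escaping leaves it untouched, which is why escaping for the wrong context protects nothing.

```python
import html
import shlex

# Hypothetical user-supplied value containing shell metacharacters.
filename = "notes; rm -rf ~"

# Naive interpolation: a shell would run "cat notes" and then "rm -rf ~".
unsafe = "cat " + filename

# Escaping for the shell, at the point the command string is built.
safe = "cat " + shlex.quote(filename)

print(unsafe)                 # cat notes; rm -rf ~
print(safe)                   # cat 'notes; rm -rf ~'
print(html.escape(filename))  # notes; rm -rf ~  (HTML escaping does not help here)
```

Passing an argument list to `subprocess.run` without `shell=True` avoids shell quoting altogether, because no shell parses the string.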
