Opened 13 years ago

Last modified 13 years ago

#92 closed defect (FIXED)

Need a util.html_cleaner() method with good tests

Reported by: Christopher Allan Webber Owned by:
Priority: minor Milestone: 0.0.3
Component: programming Keywords:
Cc: Parent Tickets:


There's no way around it, we're going to need a good sanitizer for
comment/description/etc html **regardless** of
`whether or not we use markdown <>`_
... so I think the html cleaner should:

-  use
   `lxml.html.clean <>`_
-  Use **only whitelisted tags**... this is possible through lxml,
   we need to do it right. Tags I think we'll need to allow: b, i, em,
   strong, p, ul, ol, li, a, br. (any others?)
-  Only whitelisted attributes
-  XSS attribute attack prevention, other XSS prevention stuff...
   see the lxml.html.clean docs.
-  have tests that try to attack each one of these components.

In the future it might be a good idea to also prevent certain other
annoying things... deeply nested

.. raw:: html

's, etc. But for now I think this will be good enough.

Change History (5)

comment:1 by Christopher Allan Webber, 13 years ago

Milestone: 0.0.3

comment:1 by Christopher Allan Webber, 13 years ago

I've added a cleaner method to util, util.clean\_html(). I even
added tests for the two biggest concerns, blocking images and any
sort of javascript. Could use a decent Elrond review... and better
tests ;)

comment:2 by Christopher Allan Webber, 13 years ago

Status: NewClosed
We have a couple of pretty good tests in here; I'm satisfied
enough. It looks like lxml has itself some good tests too for more
graunular stuff.

Maybe in the future we could explicitly test all of:

-  ` <>`_
-  ` <>`_

but I don't think we have time, and it really looks like we've
covered all of those cases (even with the few tests we have now)
but just not in a granular way.

Marking this as closed.

comment:3 by Elrond, 13 years ago

Component: Programming

comment:3 by Will Kahn-Greene, 12 years ago

The original url for this bug was .

Note: See TracTickets for help on using tickets.