Legacy issue tracker

You are currently viewing the legacy bug tracker for MediaGoblin. We have now switched to code hosting and issue tracking at SourceHut.

This legacy issue tracker remains available to allow us to reference old issues. If you find a ticket here which is still relevant, please feel free to continue the discussion. For new issues, please use SourceHut.


Opened 14 years ago

Last modified 14 years ago

#92 closed defect (FIXED)

Need a util.html_cleaner() method with good tests

Reported by: Christopher Allan Webber
Owned by:
Priority: minor
Milestone: 0.0.3
Component: programming
Keywords:
Cc:
Parent Tickets:

Description

There's no way around it, we're going to need a good sanitizer for
comment/description/etc html **regardless** of
`whether or not we use markdown <http://bugs.foocorp.net/issues/363#note-5>`_
... so I think the html cleaner should:


-  use
   `lxml.html.clean <http://lxml.de/lxmlhtml.html#cleaning-up-html>`_
-  Use **only whitelisted tags**... this is possible through lxml,
   we need to do it right. Tags I think we'll need to allow: b, i, em,
   strong, p, ul, ol, li, a, br. (any others?)
-  Only whitelisted attributes
-  XSS attribute attack prevention, other XSS prevention stuff...
   see the lxml.html.clean docs.
-  have tests that try to attack each one of these components.
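
To make the whitelist rules above concrete, here is a minimal sketch that uses only the Python standard library. It is not the ``lxml.html.clean`` approach proposed in this ticket and not MediaGoblin's actual cleaner. The function name ``clean_html_sketch``, the attribute whitelist, and the allowed URL schemes are assumptions made for illustration; only the tag list comes from the ticket. A hand-rolled filter like this is no substitute for a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Tags proposed in the ticket; every other tag is dropped.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "ul", "ol", "li", "a", "br"}
# Assumption: the ticket names no attributes, so this map is illustrative.
ALLOWED_ATTRS = {"a": {"href", "title"}}
# Assumption: only these URL schemes (or a relative URL) are accepted in href.
ALLOWED_SCHEMES = {"http", "https", "mailto"}
# Elements whose text content is discarded together with the tag.
DROP_CONTENT = {"script", "style"}
# Allowed tags that never get a closing tag.
VOID_TAGS = {"br"}


def is_safe_url(url):
    """Return True for relative URLs and URLs with a whitelisted scheme."""
    # Remove whitespace and control characters that could hide a scheme.
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False
    return scheme == "" or scheme in ALLOWED_SCHEMES


class WhitelistCleaner(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops onclick, style, and anything unlisted
            value = value or ""
            if name == "href" and not is_safe_url(value):
                continue  # drops javascript: and similar URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def clean_html_sketch(text):
    """Hypothetical helper: return text with non-whitelisted markup removed."""
    parser = WhitelistCleaner()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Each rule in the list maps to one attack-style check: a ``<script>`` element, an ``<img>`` with an event handler, and a ``javascript:`` link should all come out stripped, while a plain ``<a href="http://...">`` link survives.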

In the future it might be a good idea to also prevent certain other
annoying things... deeply nested ``<p>``'s, etc. But for now I think
this will be good enough.

Change History (5)

comment:1 by Christopher Allan Webber, 14 years ago

Milestone:	→ 0.0.3

comment:1 by Christopher Allan Webber, 14 years ago

I've added a cleaner method to util, util.clean\_html(). I even
added tests for the two biggest concerns, blocking images and any
sort of javascript. Could use a decent Elrond review... and better
tests ;)

comment:2 by Christopher Allan Webber, 14 years ago

Status:	New → Closed

We have a couple of pretty good tests in here; I'm satisfied
enough. It looks like lxml itself has some good tests too for more
granular stuff.

Maybe in the future we could explicitly test all of:


-  `http://htmlpurifier.org/live/smoketests/xssAttacks.php <http://htmlpurifier.org/live/smoketests/xssAttacks.php>`_
-  `http://ha.ckers.org/xss.html <http://ha.ckers.org/xss.html>`_

but I don't think we have time, and it really looks like we've
covered all of those cases (even with the few tests we have now)
but just not in a granular way.

Marking this as closed.

comment:3 by Elrond, 14 years ago

Component:	→ Programming

comment:3 by Will Kahn-Greene, 13 years ago

The original url for this bug was http://bugs.foocorp.net/issues/379 .
