Opened 9 years ago

Closed 8 years ago

#1047 closed defect (invalid)

Server error when URL contains weird characters

Reported by: ayleph
Owned by:
Priority: major
Milestone: 0.9.0
Component: programming
Keywords: url, ascii, encode, error
Cc:
Parent Tickets:

Description

There's an entry in my logfiles for a bot trying to access a specific URL which causes a server error:
'ascii' codec can't encode character u'\ufeff' in position 14: ordinal not in range(128)

Error - <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\ufeff' in position 14: ordinal not in range(128)
URL: https://goblinrefuge.com/mediagoblin/u/stwaldman/%7C%EF%BB%BFSEO
File '/path/to/mediagoblin/lib/python2.7/site-packages/Paste-1.7.5.1-py2.7.egg/paste/exceptions/errormiddleware.py', line 144 in __call__
  app_iter = self.application(environ, sr_checker)
File '/path/to/mediagoblin/lib/python2.7/site-packages/Paste-1.7.5.1-py2.7.egg/paste/urlmap.py', line 203 in __call__
  return app(environ, start_response)
File '/path/to/mediagoblin/mediagoblin/app.py', line 268 in __call__
  return self.call_backend(environ, start_response)
File '/path/to/mediagoblin/lib/python2.7/site-packages/Werkzeug-0.9.6-py2.7.egg/werkzeug/wsgi.py', line 567 in __call__
  cleaned_path = cleaned_path.encode(sys.getfilesystemencoding())
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 14: ordinal not in range(128)


CGI Variables
-------------
  DOCUMENT_ROOT: '/path/to/mediagoblin/mediagoblin'
  GATEWAY_INTERFACE: 'CGI/1.1'
  HTTPS: 'on'
  HTTP_ACCEPT: 'text/html,text/plain,text/xml,text/*,application/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/rdf+xml'
  HTTP_ACCEPT_LANGUAGE: 'en'
  HTTP_CONNECTION: 'keep-alive'
  HTTP_HOST: 'goblinrefuge.com'
  HTTP_USER_AGENT: 'Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)'
  PATH_INFO: '/u/stwaldman/|\xef\xbb\xbfSEO'
  PATH_TRANSLATED: '/path/to/mediagoblin/mediagoblin/u/stwaldman/|\xef\xbb\xbfSEO'
  REDIRECT_STATUS: '200'
  REMOTE_ADDR: '127.0.0.1'
  REMOTE_PORT: '58734'
  REQUEST_METHOD: 'GET'
  REQUEST_URI: '/mediagoblin/u/stwaldman/%7C%EF%BB%BFSEO'
  SCRIPT_FILENAME: '/path/to/mediagoblin/mediagoblin/mediagoblin'
  SCRIPT_NAME: '/mediagoblin'
  SERVER_PROTOCOL: 'HTTP/1.1'
  SERVER_SOFTWARE: 'lighttpd/1.4.35'
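
The failure in the traceback can be reproduced outside the server. The following is a minimal sketch, assuming Python 3 and using only the PATH_INFO bytes shown above; the variable names are illustrative and are not taken from Werkzeug or MediaGoblin.

```python
# PATH_INFO bytes exactly as they appear in the CGI variables above.
path_info = b"/u/stwaldman/|\xef\xbb\xbfSEO"

# Decoding as UTF-8 turns the bytes EF BB BF into U+FEFF (a byte order mark).
decoded_path = path_info.decode("utf-8")

failure = None
try:
    # Stand-in for the encode step in the traceback when the
    # filesystem encoding is reported as ASCII.
    decoded_path.encode("ascii")
except UnicodeEncodeError as exc:
    failure = exc

# Encoding the same path as UTF-8 succeeds and round-trips the bytes.
utf8_bytes = decoded_path.encode("utf-8")
```

The offending character sits at index 14 of the decoded path, which matches the position reported in the error message.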

Change History (5)

comment:1 by Christopher Allan Webber, 9 years ago

ayleph, thanks for submitting. Can you submit steps to reproduce?

comment:2 by ayleph, 9 years ago

Sorry for the delay; I don't remember getting an email with your comment. Here's a recent bug report on IRC.

Besnik_b_ | I think I found a bug: if you put a "ë" character in the identifier of a collection, Mediagoblin will save the change, but you cannot access the URL any more. The faulty character is accepted without any complaint or notification.
Besnik_b_ | Here is the link to see that in action: https://goblinrefuge.com/mediagoblin/u/besnik/collection/firefox-os-n%C3%AB-shqip-versioni-tablet/
Besnik_b_ | The identifier in question, "firefox-os-ne-shqip-versioni-tablet", was changed to "firefox-os-në-shqip-versioni-tablet", which renders as "firefox-os-n%AB-shqip-versioni-tablet" in the browser
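
For reference, the link in the IRC log contains `%C3%AB`, which is the percent-encoded UTF-8 form of "ë". A short sketch, assuming Python 3 and the standard library's `urllib.parse`, shows the round trip; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# Collection identifier from the IRC log; "\u00eb" is the character "ë".
slug = "firefox-os-n\u00eb-shqip-versioni-tablet"

# quote() percent-encodes non-ASCII characters as UTF-8 bytes by default.
encoded = quote(slug)

# unquote() reverses the step, again assuming UTF-8.
decoded = unquote(encoded)
```

Any component that later re-encodes the decoded identifier with an ASCII codec would fail on "ë" in the same way as the U+FEFF case in the ticket description.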

comment:3 by Boris Bobrov, 9 years ago

Owner: set to Boris Bobrov
Status: new → in_progress

comment:4 by Boris Bobrov, 9 years ago

Milestone: 0.9.0

comment:5 by ayleph, 8 years ago

Owner: Boris Bobrov removed
Resolution: invalid
Status: in_progress → closed

This isn't a problem with MediaGoblin code. As the traceback shows, Werkzeug encodes the request path with sys.getfilesystemencoding(), and on this system the locale reports ASCII, so any non-ASCII character in the path fails to encode. Recent Werkzeug (> 0.11) includes a work-around for this and assumes a UTF-8 locale if the system locale reports ASCII. Closing ticket as 'invalid'.
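
The work-around described here can be sketched roughly as follows. This is only an illustration of the idea, assuming Python 3; `filesystem_encoding_or_utf8` is a hypothetical helper name and this is not Werkzeug's actual code.

```python
import codecs
import sys


def filesystem_encoding_or_utf8(reported=None):
    """Return the filesystem encoding, falling back to UTF-8 when it is ASCII.

    Hypothetical helper: an ASCII result is treated as a sign of a
    misconfigured locale rather than a real filesystem constraint.
    """
    if reported is None:
        reported = sys.getfilesystemencoding()
    try:
        # codecs.lookup() normalises aliases such as "ANSI_X3.4-1968".
        is_ascii = codecs.lookup(reported).name == "ascii"
    except LookupError:
        is_ascii = False
    return "utf-8" if is_ascii else reported


# With the fallback in place, the path from the report encodes cleanly.
cleaned_path = "/u/stwaldman/|\ufeffSEO"
encoded_path = cleaned_path.encode(filesystem_encoding_or_utf8("ascii"))
```

On a deployment that cannot upgrade, setting a UTF-8 locale for the server process is the usual alternative, since the reported filesystem encoding is derived from the locale.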
