Opened 10 years ago
Closed 9 years ago
#1047 closed defect (invalid)
Server error when URL contains weird characters
Reported by: | ayleph | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 0.9.0 |
Component: | programming | Keywords: | url, ascii, encode, error |
Cc: | Parent Tickets: |
Description
There's an entry in my logfiles for a bot trying to access a specific URL which causes a server error:
'ascii' codec can't encode character u'\ufeff' in position 14: ordinal not in range(128)
Error - <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\ufeff' in position 14: ordinal not in range(128)
URL: https://goblinrefuge.com/mediagoblin/u/stwaldman/%7C%EF%BB%BFSEO
File '/path/to/mediagoblin/lib/python2.7/site-packages/Paste-1.7.5.1-py2.7.egg/paste/exceptions/errormiddleware.py', line 144 in __call__
  app_iter = self.application(environ, sr_checker)
File '/path/to/mediagoblin/lib/python2.7/site-packages/Paste-1.7.5.1-py2.7.egg/paste/urlmap.py', line 203 in __call__
  return app(environ, start_response)
File '/path/to/mediagoblin/mediagoblin/app.py', line 268 in __call__
  return self.call_backend(environ, start_response)
File '/path/to/mediagoblin/lib/python2.7/site-packages/Werkzeug-0.9.6-py2.7.egg/werkzeug/wsgi.py', line 567 in __call__
  cleaned_path = cleaned_path.encode(sys.getfilesystemencoding())
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 14: ordinal not in range(128)

CGI Variables
-------------
  DOCUMENT_ROOT: '/path/to/mediagoblin/mediagoblin'
  GATEWAY_INTERFACE: 'CGI/1.1'
  HTTPS: 'on'
  HTTP_ACCEPT: 'text/html,text/plain,text/xml,text/*,application/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/rdf+xml'
  HTTP_ACCEPT_LANGUAGE: 'en'
  HTTP_CONNECTION: 'keep-alive'
  HTTP_HOST: 'goblinrefuge.com'
  HTTP_USER_AGENT: 'Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)'
  PATH_INFO: '/u/stwaldman/|\xef\xbb\xbfSEO'
  PATH_TRANSLATED: '/path/to/mediagoblin/mediagoblin/u/stwaldman/|\xef\xbb\xbfSEO'
  REDIRECT_STATUS: '200'
  REMOTE_ADDR: '127.0.0.1'
  REMOTE_PORT: '58734'
  REQUEST_METHOD: 'GET'
  REQUEST_URI: '/mediagoblin/u/stwaldman/%7C%EF%BB%BFSEO'
  SCRIPT_FILENAME: '/path/to/mediagoblin/mediagoblin/mediagoblin'
  SCRIPT_NAME: '/mediagoblin'
  SERVER_PROTOCOL: 'HTTP/1.1'
  SERVER_SOFTWARE: 'lighttpd/1.4.35'
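For anyone trying to reproduce this outside MediaGoblin, the failing step in the traceback (encoding the decoded path with an ASCII filesystem encoding) can be sketched in isolation. This is a minimal sketch in Python 3, not the Python 2.7 code from the report; `try_encode` is a hypothetical helper, and the ASCII encoding is passed explicitly rather than read from the system locale.

```python
from urllib.parse import unquote

# Percent-encoded path from the REQUEST_URI above, without the /mediagoblin prefix.
request_path = "/u/stwaldman/%7C%EF%BB%BFSEO"

# Percent-decoding as UTF-8 turns %EF%BB%BF into the single character U+FEFF.
path = unquote(request_path)


def try_encode(text, encoding):
    """Hypothetical helper: return (encoded bytes, None) or (None, the error)."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc


# Assumption: the server's filesystem encoding was reported as ASCII.
ascii_bytes, ascii_error = try_encode(path, "ascii")
utf8_bytes, utf8_error = try_encode(path, "utf-8")

if ascii_error is not None:
    print("ascii failed at position", ascii_error.start)
print("utf-8 bytes:", utf8_bytes)
```

The ASCII attempt fails on the U+FEFF character, while the UTF-8 attempt reproduces the byte sequence shown in PATH_INFO.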
Change History (5)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Sorry for the delay; I don't remember getting an email with your comment. Here's a recent bug report on IRC.
Besnik_b_ | I think I found a bug: if you put a "ë" character in the identifier of a collection, Mediagoblin will save the change, but you cannot access the URL any more. The faulty character is accepted without any complain or notification.
Besnik_b_ | Here you are the link to see that in action: https://goblinrefuge.com/mediagoblin/u/besnik/collection/firefox-os-n%C3%AB-shqip-versioni-tablet/
Besnik_b_ | The identifier in question, "firefox-os-ne-shqip-versioni-tablet" was changed to "firefox-os-në-shqip-versioni-tablet", which render as "firefox-os-n%AB-shqip-versioni-tablet" in the browser
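As a point of reference for the identifier in this IRC report, the percent-encoding of "ë" can be checked with the standard library. This is a small Python 3 sketch, assuming the browser encodes the path as UTF-8; it shows the form that appears in the link above (`%C3%AB`).

```python
from urllib.parse import quote, unquote

# Collection identifier from the IRC report.
slug = "firefox-os-në-shqip-versioni-tablet"

# quote() percent-encodes non-ASCII characters as their UTF-8 bytes by default.
encoded = quote(slug)

# unquote() reverses the step, again assuming UTF-8.
decoded = unquote(encoded)

print(encoded)
print(decoded == slug)
```

"ë" is U+00EB, which is the two bytes C3 AB in UTF-8, so the server receives a path that decodes to a non-ASCII character, the same situation as in the original report.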
comment:3 by , 10 years ago
Owner: | set to |
---|---|
Status: | new → in_progress |
comment:4 by , 9 years ago
Milestone: | → 0.9.0 |
---|
comment:5 by , 9 years ago
Owner: | removed |
---|---|
Resolution: | → invalid |
Status: | in_progress → closed |
This isn't a problem with MediaGoblin code. It's an issue of an incorrect locale being used by Werkzeug. Recent Werkzeug (> 0.11) includes a work-around for this and assumes UTF-8 locale if the system locale reports ASCII. Closing ticket as 'invalid' since this has nothing to do with MediaGoblin code.
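The kind of work-around described here can be sketched roughly as follows. This is an illustration of the idea only, not Werkzeug's actual code; `effective_fs_encoding` is a hypothetical name, and the sketch is written for Python 3.

```python
import codecs
import sys


def effective_fs_encoding(reported=None):
    """Hypothetical helper: fall back to UTF-8 when the reported encoding is ASCII."""
    if reported is None:
        reported = sys.getfilesystemencoding()
    # codecs.lookup() normalizes aliases such as "US-ASCII" to the name "ascii".
    if codecs.lookup(reported).name == "ascii":
        return "utf-8"
    return reported


# With the fallback in place, the path from this ticket encodes without error.
path = "/u/stwaldman/|\ufeffSEO"
encoded = path.encode(effective_fs_encoding("ascii"))
print(encoded)
```

On a deployment where upgrading is not an option, setting a UTF-8 locale for the server process is the usual alternative, since the reported filesystem encoding is derived from the locale.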
ayleph, thanks for submitting. Can you submit steps to reproduce?