id	summary	reporter	owner	description	type	status	priority	milestone	component	resolution	keywords	cc	parents
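5348	PDF processing fails on non-ASCII characters in extract_pdf_info	ayleph		"A user tried to upload a PDF file, but processing failed. The PDF's Creator field contains a non-ASCII character (`Microsoft® Word 2010`, stored as the UTF-8 bytes `\xc2\xae`), which the bare `l.decode()` call in `pdf_info` cannot decode with Python 2's default ASCII codec. I found the following traceback in my Celery log file.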

{{{
[2015-08-24 21:01:58,235: ERROR/MainProcess] Task mediagoblin.processing.task.ProcessMedia[05fe5efc-fe9f-47b8-8c49-6133ee66e1ee] raised unexpected: UnicodeDecodeError('ascii', 'Creator:        Microsoft\xc2\xae Word 2010\n', 25, 26, 'ordinal not in range(128)')
Traceback (most recent call last):
  File ""/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/trace.py"", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File ""/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/trace.py"", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File ""/path/to/mediagoblin/mediagoblin/processing/task.py"", line 101, in run
    processor.process(**reprocess_info)
  File ""/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py"", line 412, in process
    self.extract_pdf_info()
  File ""/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py"", line 338, in extract_pdf_info
    pdf_info_dict = pdf_info(self.pdf_filename)
  File ""/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py"", line 210, in pdf_info
    lines = [l.decode() for l in lines]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 25: ordinal not in range(128)
}}}"	defect	closed	major		programming	duplicate	pdf,ascii,unicode,decode		
