Opened 9 years ago
Closed 9 years ago
#5348 closed defect (duplicate)
PDF processing fails on non-ASCII characters in extract_pdf_info
Reported by: | ayleph | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | programming | Keywords: | pdf, ascii, unicode, decode |
Cc: | | Parent Tickets: | |
Description
A user tried to upload a PDF file, but processing failed. I found the following in my Celery log file:
[2015-08-24 21:01:58,235: ERROR/MainProcess] Task mediagoblin.processing.task.ProcessMedia[05fe5efc-fe9f-47b8-8c49-6133ee66e1ee] raised unexpected: UnicodeDecodeError('ascii', 'Creator: Microsoft\xc2\xae Word 2010\n', 25, 26, 'ordinal not in range(128)')
Traceback (most recent call last):
  File "/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/path/to/mediagoblin/mediagoblin/processing/task.py", line 101, in run
    processor.process(**reprocess_info)
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py", line 412, in process
    self.extract_pdf_info()
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py", line 338, in extract_pdf_info
    pdf_info_dict = pdf_info(self.pdf_filename)
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py", line 210, in pdf_info
    lines = [l.decode() for l in lines]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 25: ordinal not in range(128)
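The traceback points at a bare `l.decode()` call, which under Python 2.7 falls back to the ASCII codec and fails on the `\xc2\xae` bytes (the UTF-8 encoding of the ® sign) in the Creator field. Below is a minimal sketch of the failure and one possible workaround. It assumes the pdfinfo output is UTF-8 encoded. `decode_pdfinfo_lines` is a hypothetical helper for illustration, not the fix adopted in MediaGoblin. The sketch is written for Python 3, where `bytes.decode()` defaults to UTF-8, so the ASCII codec is named explicitly to reproduce the error.

```python
def decode_pdfinfo_lines(raw_lines, encoding="utf-8"):
    """Decode byte lines with an explicit codec (hypothetical helper).

    errors="replace" substitutes U+FFFD for undecodable bytes, so one
    malformed metadata field cannot abort the whole processing task.
    The UTF-8 default is an assumption about the pdfinfo output.
    """
    return [line.decode(encoding, errors="replace") for line in raw_lines]


# The line from the log above, as it appears in the ticket.
raw = [b"Creator: Microsoft\xc2\xae Word 2010\n"]

# Reproduce the failure: the ASCII codec rejects byte 0xc2.
ascii_error = None
try:
    raw[0].decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with an explicit codec succeeds.
decoded = decode_pdfinfo_lines(raw)
```

Passing the codec explicitly keeps the behaviour independent of the interpreter's default encoding. Whether to replace, ignore, or re-raise on bad bytes is a design choice left open here.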
Change History (2)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Looks like this is a duplicate of #983. Closing this one.
This may be related to #5335. When I get this error, my mediagoblin log file simply displays the message below, same as #5335.