Opened 9 years ago

Closed 9 years ago

#5348 closed defect (duplicate)

PDF processing fails on non-ASCII characters in extract_pdf_info

Reported by: ayleph
Owned by:
Priority: major
Milestone:
Component: programming
Keywords: pdf, ascii, unicode, decode
Cc:
Parent Tickets:


A user tried to upload a PDF file, but it failed to process. I found the following in my Celery log file.

[2015-08-24 21:01:58,235: ERROR/MainProcess] Task mediagoblin.processing.task.ProcessMedia[05fe5efc-fe9f-47b8-8c49-6133ee66e1ee] raised unexpected: UnicodeDecodeError('ascii', 'Creator:        Microsoft\xc2\xae Word 2010\n', 25, 26, 'ordinal not in range(128)')
Traceback (most recent call last):
  File "/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/", line 438, in __protected_call__
    return*args, **kwargs)
  File "/path/to/mediagoblin/mediagoblin/processing/", line 101, in run
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/", line 412, in process
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/", line 338, in extract_pdf_info
    pdf_info_dict = pdf_info(self.pdf_filename)
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/", line 210, in pdf_info
    lines = [l.decode() for l in lines]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 25: ordinal not in range(128)
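
The traceback points at a bare `l.decode()` call, which on Python 2 falls back to the ASCII codec and so fails on the UTF-8 bytes `\xc2\xae` (the ® sign). Below is a minimal sketch of decoding the same log line with an explicit encoding. The helper name `decode_line` and the choice of UTF-8 with `errors='replace'` are assumptions for illustration, not necessarily how the issue was resolved upstream.

```python
# The pdfinfo output line from the log above, as raw UTF-8 bytes.
raw_line = b'Creator:        Microsoft\xc2\xae Word 2010\n'


def decode_line(line, encoding='utf-8'):
    """Hypothetical helper: decode one line of pdfinfo output.

    Assumes the output is UTF-8. Undecodable bytes become U+FFFD
    instead of raising, so one odd metadata field cannot abort
    processing of the whole file.
    """
    return line.decode(encoding, errors='replace')


# Python 2's bare .decode() used the ASCII codec. It is spelled out
# here so the failure also reproduces on Python 3, where the default
# codec for bytes.decode() is UTF-8.
try:
    raw_line.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

decoded = decode_line(raw_line)
```

With this approach the Creator field decodes to text containing the ® character, while the explicit ASCII decode still raises the error seen in the log.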

Change History (2)

comment:1 by ayleph, 9 years ago

This may be related to #5335. When I get this error, my MediaGoblin log file shows only the messages below, the same as in #5335.

2015-08-24 21:01:57,717 INFO    [mediagoblin.media_types] No plugins handled extension .pdf
2015-08-24 21:01:57,718 INFO    [mediagoblin.media_types] No plugins using two-step checking found

comment:2 by ayleph, 9 years ago

Resolution: duplicate
Status: new → closed

Looks like this is a duplicate of #983. Closing this one.
