Legacy issue tracker

You are currently viewing the legacy bug tracker for MediaGoblin. We have now switched to code hosting and issue tracking at SourceHut.

This legacy issue tracker remains available to allow us to reference old issues. If you find a ticket here which is still relevant, please feel free to continue the discussion. For new issues, please use SourceHut.

Opened 10 years ago

Closed 10 years ago

#5348 closed defect (duplicate)

PDF processing fails on non-ASCII characters in extract_pdf_info

Reported by:	ayleph	Owned by:
Priority:	major	Milestone:
Component:	programming	Keywords:	pdf, ascii, unicode, decode
Cc:		Parent Tickets:

Description

A user tried to upload a PDF file, but processing failed. I found the following in my Celery log file:

[2015-08-24 21:01:58,235: ERROR/MainProcess] Task mediagoblin.processing.task.ProcessMedia[05fe5efc-fe9f-47b8-8c49-6133ee66e1ee] raised unexpected: UnicodeDecodeError('ascii', 'Creator:        Microsoft\xc2\xae Word 2010\n', 25, 26, 'ordinal not in range(128)')
Traceback (most recent call last):
  File "/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/path/to/mediagoblin/lib/python2.7/site-packages/celery-3.1.17-py2.7.egg/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/path/to/mediagoblin/mediagoblin/processing/task.py", line 101, in run
    processor.process(**reprocess_info)
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py", line 412, in process
    self.extract_pdf_info()
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py", line 338, in extract_pdf_info
    pdf_info_dict = pdf_info(self.pdf_filename)
  File "/path/to/mediagoblin/mediagoblin/media_types/pdf/processing.py", line 210, in pdf_info
    lines = [l.decode() for l in lines]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 25: ordinal not in range(128)
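
The failure can be reproduced outside MediaGoblin with the exact bytes from the error message. The sketch below is only an illustration, not the fix that was applied upstream: `decode_info_line` is a hypothetical helper, and it assumes the metadata lines are UTF-8 encoded (the bytes `\xc2\xae` are the UTF-8 encoding of the registered sign "®"). Under Python 2.7, a bare `l.decode()` uses the ASCII codec, which is why the call on line 210 raises; here the ASCII codec is named explicitly so the snippet behaves the same way on Python 3.

```python
# Bytes copied from the UnicodeDecodeError in the Celery log above.
raw_line = b'Creator:        Microsoft\xc2\xae Word 2010\n'


def decode_info_line(line, encoding='utf-8'):
    """Hypothetical helper: decode one metadata line.

    Assumes UTF-8 input. Bytes that are not valid in the chosen
    encoding are replaced with U+FFFD instead of raising.
    """
    return line.decode(encoding, 'replace')


# Decoding with the ASCII codec fails on the first non-ASCII byte.
try:
    raw_line.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with an explicit encoding succeeds.
decoded = decode_info_line(raw_line)
```

Passing `'replace'` trades strictness for robustness: a PDF whose metadata is in some other encoding would still be processed, at the cost of replacement characters in the extracted text.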

Change History (2)

comment:1 by ayleph, 10 years ago

This may be related to #5335. When this error occurs, my MediaGoblin log file shows only the messages below, the same as in #5335.

2015-08-24 21:01:57,717 INFO    [mediagoblin.media_types] No plugins handled extension .pdf
2015-08-24 21:01:57,718 INFO    [mediagoblin.media_types] No plugins using two-step checking found

comment:2 by ayleph, 10 years ago

Resolution:	→ duplicate
Status:	new → closed

Looks like this is a duplicate of #983. Closing this one.
