Opened 5 years ago

Last modified 5 years ago

#1061 new defect

Media gets stuck in processing

Reported by: ayleph Owned by:
Priority: major Milestone:
Component: programming Keywords: celery, processing, stuck
Cc: Parent Tickets:

Description

I occasionally see media get "stuck" in processing in the media processing panel. The web UI displays the transcoding progress, but the progress is stuck at x% and does not update no matter how long I let it sit.

When this happens, there is nothing useful in my DEBUG log. The media is not available for reprocessing. The only thing I can do is call deletemedia on the media ID and ask the uploader to re-submit the file.

$ bin/gmg reprocess available 1549
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
Available processors:
=====================

The attached image shows an example of this issue. Media ID 1549 had been stuck at 23% for nearly a week when the screenshot was taken; media ID 1658 had been stuck at 95% for less than a day. Over 100 items were processed successfully in between them.

Subtickets

Attachments (1)

stuck-in-processing.png (23.3 KB) - added by ayleph 5 years ago.
media stuck in processing


Change History (7)

Changed 5 years ago by ayleph

Attachment: stuck-in-processing.png added

media stuck in processing

comment:1 Changed 5 years ago by Boris Bobrov

Are these videos? Do the files stop processing every time you upload them or just once in a while? Could you please either attach one of them or share with me privately?

comment:2 in reply to:  1 Changed 5 years ago by ayleph

Replying to breton:

Are these videos? Do the files stop processing every time you upload them or just once in a while? Could you please either attach one of them or share with me privately?

I was just about to add the following, but you beat me to it :)

I should have added that in instances where the uploader re-submits the file, the file is generally transcoded properly. So it doesn't appear to be an issue with the files themselves.

These files are videos. I'm happy to share a file with you to test, but the issue appears to be with celery and not with the file.

Last edited 5 years ago by ayleph

comment:3 Changed 5 years ago by Boris Bobrov

but the issue appears to be with celery and not with the file

Or with the video transcoder ;) Please send me a link to the file via PM on IRC.

there is nothing useful in my DEBUG log

Could you please upload the log anyway?

comment:4 in reply to:  3 Changed 5 years ago by ayleph

Replying to breton:

Please, send me a link to the file via PM in irc

/msg sent. Let me know if you don't get it.

Could you please upload the log anyway?

Seriously, there's nothing useful in the log around that timestamp. It's just a whole bunch of routing endpoints and failed captcha attempts. Nothing pertinent to this issue.

Oh...but maybe the timestamp is in my local time instead of server time. I'll look through the log with the timezone delta in mind. If I find anything at all related, I'll upload it.

comment:5 Changed 5 years ago by ayleph

I finally managed to catch this in the logs. Media entry 2060 is "stuck" in a processing state on my instance. The Media in-processing view shows ID 2060 at 58%.

Throughout the log below, media entry 2060 is assigned to Worker-1.

[2015-01-03 19:30:31,481: INFO/MainProcess] Task mediagoblin.processing.task.ProcessMedia[4e0d5d0d-20fd-4f42-b8b8-7e9616aaffda] succeeded in 345.10788407s: None
[2015-01-03 19:30:31,512: DEBUG/Worker-1] Processing <MediaEntry 2060: Islam, vrai associationisme !>
[2015-01-03 19:30:31,513: INFO/Worker-1] Initializing VideoTranscoder...
[2015-01-03 19:30:31,515: INFO/MainProcess] Received task: mediagoblin.processing.task.ProcessMedia[6fcfb72d-99a6-4ce8-8d7c-240625fc3d48]
[2015-01-03 19:30:31,518: INFO/Worker-1] Discovering /srv/mediagoblin/mediagoblin/user_dev/media/queue/media_entries/3be9336e-9e20-4de6-b11e-f80acf9a6164/Islam_vrai_associationisme__360p.mp4
[2015-01-03 19:30:31,648: INFO/Worker-1] Terminating MainLoop
[2015-01-03 19:30:31,657: INFO/Worker-1] Discovering...
[2015-01-03 19:30:31,659: INFO/Worker-1] Done
[2015-01-03 19:30:31,734: INFO/Worker-1] Transcoding...
[2015-01-03 19:30:32,018: INFO/Worker-1] 0% done...
[2015-01-03 19:30:33,034: INFO/Worker-2] 55% done...
[2015-01-03 19:30:35,009: INFO/Worker-2] 56% done...
[2015-01-03 19:30:38,031: INFO/Worker-2] 57% done...
[2015-01-03 19:30:39,012: INFO/Worker-1] 1% done...

...

[2015-01-03 19:32:30,021: INFO/Worker-2] 99% done...
[2015-01-03 19:32:30,042: INFO/Worker-1] 17% done...
[2015-01-03 19:32:31,889: INFO/Worker-2] 100% done...
[2015-01-03 19:32:31,935: INFO/Worker-2] Done
[2015-01-03 19:32:32,053: INFO/Worker-2] Terminating MainLoop
[2015-01-03 19:32:32,063: DEBUG/Worker-2] Saving medium...
[2015-01-03 19:32:32,918: DEBUG/Worker-2] Saved medium
[2015-01-03 19:32:34,018: INFO/Worker-2] playbin ready

...

[2015-01-03 19:32:34,098: INFO/Worker-2] Seek successful, attaching buffer probe
[2015-01-03 19:32:34,099: INFO/Worker-2] Attached buffer probes: {'fakesink31': 5568L}
[2015-01-03 19:32:34,116: INFO/Worker-2] Taking snapshot! ((<GstPad (fakesink31:sink) at 322c630>, <gst.Buffer 0x7ff390193490 of size 54540 and data 0x7a887a7c>, 'fakesink31'))
[2015-01-03 19:32:34,154: INFO/Worker-2] Saved snapshot!
[2015-01-03 19:32:34,180: DEBUG/Worker-2] Saving thumbnail...
[2015-01-03 19:32:35,025: INFO/Worker-1] 18% done...
[2015-01-03 19:32:35,121: INFO/MainProcess] Task mediagoblin.processing.task.ProcessMedia[afa94210-e127-4828-9b43-11bac2b15d65] succeeded in 237.901043467s: None
[2015-01-03 19:32:35,134: INFO/MainProcess] Received task: mediagoblin.processing.task.ProcessMedia[6249304d-8988-45d8-9144-cc36b32a9677]
[2015-01-03 19:32:35,139: DEBUG/Worker-2] Processing <MediaEntry 2061: Coran et mythologie.>
[2015-01-03 19:32:35,139: INFO/Worker-2] Initializing VideoTranscoder...
[2015-01-03 19:32:35,145: INFO/Worker-2] Discovering /srv/mediagoblin/mediagoblin/user_dev/media/queue/media_entries/6fcfb72d-99a6-4ce8-8d7c-240625fc3d48/Coran_et_mythologie._360p.mp4
[2015-01-03 19:32:35,236: INFO/Worker-2] Terminating MainLoop
[2015-01-03 19:32:35,244: INFO/Worker-2] Discovering...
[2015-01-03 19:32:35,247: INFO/Worker-2] Done
[2015-01-03 19:32:35,308: INFO/Worker-2] Transcoding...
[2015-01-03 19:32:36,006: INFO/Worker-2] 0% done...
[2015-01-03 19:32:37,078: CRITICAL/Worker-2] Reached gobject timeout
[2015-01-03 19:32:40,042: INFO/Worker-2] 1% done...
[2015-01-03 19:32:42,037: INFO/Worker-1] 19% done...

...

[2015-01-03 19:37:14,031: INFO/Worker-1] 57% done...
[2015-01-03 19:37:18,032: INFO/Worker-2] 63% done...
[2015-01-03 19:37:21,050: INFO/Worker-1] 58% done...
[2015-01-03 19:37:22,061: INFO/Worker-2] 64% done...
[2015-01-03 19:37:27,015: INFO/Worker-2] 65% done...
[2015-01-03 19:37:30,403: ERROR/MainProcess] Process 'Worker-1' pid:28046 exited with 'signal 9 (SIGKILL)'
[2015-01-03 19:37:30,545: INFO/MainProcess] Received task: mediagoblin.processing.task.ProcessMedia[0f05bbe8-45e9-447f-b51e-7fa318bb638e]
[2015-01-03 19:37:30,592: DEBUG/Worker-3] Processing <MediaEntry 2062: Coran, révélation et vérité 1/2.>
[2015-01-03 19:37:30,647: INFO/Worker-3] Initializing VideoTranscoder...
[2015-01-03 19:37:30,661: INFO/Worker-3] Discovering /srv/mediagoblin/mediagoblin/user_dev/media/queue/media_entries/6249304d-8988-45d8-9144-cc36b32a9677/186c3a08-6282-46b3-8bc1-720c8efeec4d.mp4
[2015-01-03 19:37:30,794: INFO/Worker-3] Terminating MainLoop
[2015-01-03 19:37:30,831: INFO/Worker-3] Discovering...
[2015-01-03 19:37:30,845: INFO/Worker-3] Done
[2015-01-03 19:37:30,918: INFO/Worker-3] Transcoding...
[2015-01-03 19:37:31,001: INFO/Worker-3] 0% done...
[2015-01-03 19:37:34,022: INFO/Worker-2] 66% done...
[2015-01-03 19:37:34,063: INFO/Worker-3] 1% done...
[2015-01-03 19:37:35,036: INFO/Worker-3] 2% done...
[2015-01-03 19:37:37,025: INFO/Worker-3] 3% done...
[2015-01-03 19:37:38,030: INFO/Worker-2] 67% done... 
[2015-01-03 19:37:39,013: INFO/Worker-3] 5% done...
[2015-01-03 19:37:40,938: ERROR/MainProcess] Task mediagoblin.processing.task.ProcessMedia[3be9336e-9e20-4de6-b11e-f80acf9a6164] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Traceback (most recent call last):
  File "/srv/mediagoblin/mediagoblin-0.7.2-dev/lib/python2.7/site-packages/billiard-3.3.0.18-py2.7-linux-x86_64.egg/billiard/pool.py", line 1170, in mark_as_worker_lost
    human_status(exitcode)),
WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
[2015-01-03 19:37:41,070: INFO/Worker-3] 6% done...

As the logs above show, Worker-1 reaches 58% done; then, around the time Worker-3 starts up, Worker-1 is killed with SIGKILL and its task is reported lost.

comment:6 Changed 5 years ago by ayleph

Based on some searching around the billiard error, I found that other people have encountered this SIGKILL behaviour when their system runs out of memory. Sure enough, I found the following in my system message log.

Jan  3 19:37:29 goblinrefuge kernel: paster invoked oom-killer: gfp_mask=0x10200da, order=0, oom_score_adj=0
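The "invoked oom-killer" string in the kernel line above is the telltale. A quick check like the following (a sketch; the exact log prefix varies by distro) picks those events out of a saved kernel log and names the process that triggered them:

```python
def oom_killed_processes(kernel_log_lines):
    """Return the names of processes that invoked the oom-killer."""
    procs = []
    for line in kernel_log_lines:
        if 'invoked oom-killer' in line:
            # Typical format: "... kernel: <process> invoked oom-killer: ..."
            before = line.split('invoked oom-killer')[0]
            procs.append(before.rstrip().split()[-1])
    return procs

log = ["Jan  3 19:37:29 goblinrefuge kernel: paster invoked oom-killer: "
       "gfp_mask=0x10200da, order=0, oom_score_adj=0"]
print(oom_killed_processes(log))  # ['paster']
```

Here the culprit is paster (the web process), which explains why the celery worker was the SIGKILL victim rather than the one logging an error.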

I suppose I could upgrade to a VPS with more RAM ($$$). But it would be nice if GMG could provide some kind of error handling for this issue, maybe moving the media out of processing and into a failed state, instead of leaving it stuck in purgatory indefinitely.
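The failed-state handling suggested above could hook Celery's per-task failure callback. A minimal sketch, assuming an `on_failure` override on the task class; the `failed_entries` store and the state string are illustrative assumptions, not MediaGoblin's actual schema:

```python
# Sketch: when a ProcessMedia task fails (including a WorkerLostError from an
# OOM-killed worker), record the entry as failed instead of leaving it stuck
# in the processing panel forever.

failed_entries = {}  # stands in for MediaEntry rows in the database

class ProcessMedia(object):
    # In a real deployment this would subclass celery.Task; Celery invokes
    # on_failure() with the exception when the task fails.
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        entry_id = args[0] if args else kwargs.get('media_id')
        # Move the entry out of 'processing' so the panel can show the error
        failed_entries[entry_id] = 'failed: %s' % exc

task = ProcessMedia()
task.on_failure(Exception('Worker exited prematurely: signal 9 (SIGKILL).'),
                '4e0d5d0d', (2060,), {}, None)
print(failed_entries[2060])
```

Whether Celery actually calls `on_failure` when the worker process itself is killed depends on the pool and acknowledgement settings, so this is a sketch of the desired behaviour rather than a guaranteed fix.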
