Opened 9 years ago

Closed 5 years ago

#254 closed defect (fixed)

Directories in queue are not deleted automatically

Reported by: Aleksej
Owned by:
Priority: trivial
Milestone:
Component: programming
Keywords: cloudfiles, bitesized
Cc: joar
Parent Tickets:

Description (last modified by Christopher Allan Webber)

There are empty directories left in
user_dev/media/queue/media_entries/ after files are uploaded.



Subtickets

Change History (26)

comment:1 Changed 9 years ago by Christopher Allan Webber

I've had some thoughts on how to deal with this.

We want something along the lines of "rmdir directory", not "rm -rf
directory". We don't want to be at risk of deleting all
subdirectories if they have content in them. All we care about here
is wiping empty stuff.

So maybe along those lines we should have a method like
StorageInterface.clear_empty_dirs() that wipes all empty
subdirectories under a particular path. If a subdirectory has
things in it, it's simply skipped. StorageInterface implementations
that don't need this can just pass / return immediately (i.e., this
applies to BasicFileStorage but not CloudFileStorage).

There's a small possibility of a race condition if you shoot too
low on the tree with clear_empty_dirs, so we should use this
sparingly just on particular things, like specifically the
subdirectory we know we allocated for this media entry's queued
media.

Does that make sense?
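To make the idea above concrete, here is a minimal sketch of what a clear_empty_dirs() could look like for a local-filesystem backend. The function name comes from the comment; the standalone signature is a simplification (the real method would live on a StorageInterface implementation).

```python
import os

def clear_empty_dirs(base_path):
    """Remove empty subdirectories under base_path, bottom-up.

    Directories that still contain files are simply skipped --
    this is "rmdir", never "rm -rf".
    """
    for dirpath, dirnames, filenames in os.walk(base_path, topdown=False):
        if dirpath == base_path:
            continue  # never remove the root we were asked to sweep
        try:
            os.rmdir(dirpath)  # fails (and is skipped) if non-empty
        except OSError:
            pass
```

Walking bottom-up means a directory whose only contents were empty subdirectories also gets removed, while anything holding real files survives untouched.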



comment:2 Changed 9 years ago by Will Kahn-Greene

The original url for this bug was http://bugs.foocorp.net/issues/589 .

comment:3 Changed 8 years ago by spaetz

I now have a branch that fixes this issue: '565+254_delete_queue_directory' (on top of my branch for #565, but it can be applied independently of that one).

It does 2 things:

1) Implement deletion of empty directories as well as regular files in the FileStorage backend

2) Delete the {task_id} directory that is created for each submission and never cleaned up (hence this bug report). It explicitly deletes the uploaded file and its containing directory if that directory is empty, and will refuse to recursively delete anything else (e.g. if by bad luck we happen to store our queued files in "/").

Last edited 8 years ago by spaetz

comment:4 Changed 8 years ago by spaetz

Keywords: review added
Milestone: 0.3.3

I just updated this branch to latest master. It is still on top of the branch for #565, but would be pretty easy to disentangle. Please review and merge :-)

comment:5 Changed 8 years ago by Christopher Allan Webber

Description: modified (diff)
Resolution: fixed
Status: accepted → closed

Ah yeah that's fixed now!

Thanks! :)

comment:6 Changed 8 years ago by Christopher Allan Webber

Resolution: fixed
Status: closed → reopened

Oh wait. I thought this was fixed; looks like it very well might not be.

There is another branch with commits I did not review:

9 unmerged commits in 565+254_delete_queue_directory (spaetz)
00e4aac * Convert media processing backends to delete the queue directory (#254)
0e40294 * Implement delete_dir in the FileStorage

The things I was thinking about being done were in the workbench. So this is not merged yet.

comment:7 Changed 8 years ago by Christopher Allan Webber

I've made a public branch of those two commits cherry-picked:

https://gitorious.org/~cwebber/mediagoblin/cwebbers-mediagoblin/commits/254_delete_queue_directories

Currently making sure it works. If so, I'll merge those also.

comment:8 Changed 8 years ago by Christopher Allan Webber

Okay, this looks really good. I think this is the right route; we could just run this after a successful processing.

I think we need to add support to this for cloudfiles.py before merging. The most complicated part of it would be the recursive delete? I'm not sure if there's a straightforward way to implement that in cloudfiles. I am guessing it probably would not be hard?

My main concern with doing a recursive delete with something like cloudfiles would be: how would it work? IIRC our implementation of cloudfiles' "paths" for something like:

['dir1', 'dir2', 'file.png']

Is to wrap them all up into something like:

'dir1/dir2/file.png'

... but there are no real "directories", that just accounts for the actual filename. So would implementing recursive deletion support require pulling down *all* filenames and doing a .startswith() check or something? That sounds like it would be awful...
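The worry in the paragraph above can be sketched out. On a flat key store like Cloud Files, a "recursive delete" really would mean listing every key and matching on the prefix; the list_keys/delete_key names below are hypothetical stand-ins, not the real cloudfiles client API.

```python
def delete_prefix(store, prefix):
    """Delete every object whose key starts with prefix + '/'.

    On a flat key store there are no directories to walk, so the
    only option is to scan all keys -- the expensive part the
    discussion above is worried about.
    """
    prefix = prefix.rstrip("/") + "/"
    for key in list(store.list_keys()):  # copy: we mutate during iteration
        if key.startswith(prefix):
            store.delete_key(key)
```

Even as a sketch this makes the cost visible: the work is proportional to the total number of objects in the container, not to the size of the "directory" being deleted.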

I added a commit to my branch that removed the previous docstring to .delete_file() which was like:

     def delete_file(self, filepath):
         """
         Delete or dereference the file (not directory) at filepath.
 
         TODO: is the below comment correct? AFAIK, we won't clean up empty directories...
         This might need to delete directories, buckets, whatever, for
         cleanliness.  (Be sure to avoid race conditions on that though)
         """

The original reason for the above idea in the second part of the docstring (never implemented though) was that maybe you'd do something like:

queue_storage.delete_file(['media_to_process', 'queued_mediaentry', 'last_file_in_directory.png'], wipe_dir_if_empty=True)

And so since that was the last file, on a local file storage, it would delete this also (for storage systems like cloudfiles that don't even really have directories, it would be ignored). In retrospect this wasn't a good solution, and would be the same as running the new .delete_dir(dirpath, recursive=False) anyway.

So maybe we should keep .delete_dir() but maybe move off the recursive deletion support to another branch in case we find out another good way to handle this in the future? In the meanwhile, we could delete all files we "know" are there in all our processing methods, and then try calling .delete_dir() after all that is done.

How does this sound?
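The flow proposed above can be sketched as follows. The queue_store object and the delete_file/delete_dir method names follow the discussion in this thread; cleanup_queue itself is a hypothetical helper, not code from the branch.

```python
def cleanup_queue(queue_store, queued_filepaths, queue_dirpath):
    """Delete the files we know we wrote, then the (now empty) directory.

    delete_dir is called non-recursively, so it only succeeds if the
    directory really is empty -- unexpected leftovers are never
    destroyed silently.
    """
    for filepath in queued_filepaths:
        queue_store.delete_file(filepath)  # files the processor knows about
    return queue_store.delete_dir(queue_dirpath, recursive=False)
```

This matches the suggestion to keep delete_dir() but defer recursive deletion: processing code deletes what it knows, and the final non-recursive delete_dir() acts as a safety check.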

comment:9 Changed 8 years ago by Christopher Allan Webber

Milestone: 0.3.3 → 0.4.0

comment:10 Changed 8 years ago by Elrond

Owner: set to Elrond
Status: reopened → assigned

Okay, I'll relook into this in the coming days.

comment:11 Changed 8 years ago by Elrond

Cc: joar added

comment:12 Changed 8 years ago by Elrond

Keywords: review removed

Okay, I have merged cwebber's branch (you can remove your branch if you like).

I have fixed some conflicts and also ported this over for the new helper functions.

Remaining issues

General

I don't know if it is very Pythonic for .delete_dir to return False on errors instead of raising an exception.

Also joar noted that we really should have *all* paths in our db, so a recursive delete_dir is not needed.

Cloudfiles

.delete_dir is currently missing for cloudfiles. This is okay for now, because we don't expect queue storage to be on cloudfiles.
But in the long run, we should probably also call .delete_dir(media_entry_dir) after deleting a media_entry.

After talking to joar, it seems that cloudfiles has no directories, just keys and files; the keys just happen to look like paths in our case (e.g. foo/dir/file.txt). So there is no need to remove an empty directory. But deleting recursively would mean pulling down all keys and matching the relevant ones. My proposed code:

  def delete_dir(self, path, recursive=False):
      assert not recursive, "recursive delete not supported on cloudfiles"
      return True

comment:13 Changed 8 years ago by Christopher Allan Webber

It makes me a bit sad that people might call delete_dir(recursive=True) and expect files to go away, and cloudfiles users will have cruft forever! However, I don't know if there's a solution to this...

comment:14 Changed 8 years ago by Elrond

Keywords: cloudfiles added
Owner: Elrond deleted

One solution is to not have .delete_dir() do recursive at all. We really should have all our files in our db. So .delete_dir() is there to remove empty directories on storage that has directories.

At least my 2 bits.

Removing me from assigned, as I can't bring this ticket forward any more.

comment:15 Changed 7 years ago by Christopher Allan Webber

Status: assigned → accepted

comment:16 Changed 7 years ago by Christopher Allan Webber

Status: accepted → review

I think this ticket needs review/decision making at this point.

comment:17 Changed 7 years ago by Christopher Allan Webber

Milestone: 0.4.0 → 0.4.1

Moving to 0.4.1

comment:18 Changed 7 years ago by Christopher Allan Webber

Milestone: 0.5.0 → 0.6.0

comment:19 Changed 7 years ago by Christopher Allan Webber

Milestone: 0.6.0 → 0.7.0

comment:20 Changed 7 years ago by Christopher Allan Webber

Keywords: bitesized added

Having re-looked at this, I think Elrond is right: delete_dir() is just to delete empty directories, *if* they are empty... and if the storage system has such a thing at all. Maybe a better name would thus be delete_directory_if_empty(). That would be fairly simple to code if someone was interested in taking it on.

I'm marking this bitesized, as such.
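For anyone picking this up, a minimal sketch of the delete_directory_if_empty() idea for a local-filesystem backend might look like the following (the function name comes from the comment above; the standalone form is a simplification of the storage-interface method). Backends without real directories, like cloudfiles, would just return True.

```python
import os

def delete_directory_if_empty(dirpath):
    """Remove dirpath only if it is empty; report whether it is gone."""
    try:
        os.rmdir(dirpath)  # os.rmdir refuses non-empty directories
        return True
    except OSError:
        # Either the directory was non-empty (still exists) or it
        # was already gone -- report which.
        return not os.path.exists(dirpath)
```

Letting os.rmdir do the emptiness check avoids the race between "list the directory" and "delete it" that a separate os.listdir() test would introduce.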

comment:21 Changed 6 years ago by Aleksej

Recently I did approximately this:

  • uploaded 2 files from different accounts;
  • deleted 1 of them;
  • uploaded another file, which failed (Ogg, the spectrogram couldn't be generated because of a missing library);
  • uploaded another Ogg file.

Now I see only 1 directory (with an Ogg file) in user_dev/media/queue/media_entries/, whose timestamp matches the minute of the failure shown on the panel.

So it at least does NOT happen all the time / under all circumstances.

comment:22 Changed 6 years ago by Christopher Allan Webber

Status: review → accepted

comment:23 Changed 6 years ago by Christopher Allan Webber

Milestone: 0.7.0 → 0.8.0

comment:24 Changed 6 years ago by Elrond

Milestone: 0.8.0

I am removing the target milestone.

1) This one is not going to kill anyone immediately. It's not nice, but there's no reason to get this done specifically for 0.8
2) This one gets postponed all the time, no need to postpone from one version to the next.

Also:

I think this ticket needs re-evaluation to find out what the current situation is, what the problem is, and what we can do about it.

comment:25 Changed 5 years ago by Loic Dachary

It makes sense to me to just close this issue as resolved (in the sense that it made significant progress on multiple occasions). New issues can be created to address the problems discovered afterward, such as the naming of the functions, etc. What do you think?

comment:26 Changed 5 years ago by Christopher Allan Webber

Resolution: fixed
Status: accepted → closed

I think that makes sense, loic. If someone is still having troubles, they should open a new bug with a clean description.
