Opened 13 years ago
Closed 9 years ago
#254 closed defect (fixed)
Directories in queue are not deleted automatically
Reported by: | Aleksej | Owned by: | |
---|---|---|---|
Priority: | trivial | Milestone: | |
Component: | programming | Keywords: | cloudfiles, bitesized |
Cc: | joar | Parent Tickets: |
Description
There are empty directories left in user_dev/media/queue/media_entries/ after files are uploaded.
Change History (26)
comment:3 by , 12 years ago
I have now a branch that fixes this issue. It is in my branch '565+254_delete_queue_directory' (on top of my branch for #565 but this can be applied independent of that one too).
It does 2 things:
1) Implement deletion of empty directories as well as regular files in the FileStorage backend
2) Delete the {task_id} directory that is created for each submission and never cleaned up (hence this bug report). It explicitly deletes the uploaded file and the containing directory if it's empty, and will refuse to recursively delete anything else (e.g. if by bad luck we happen to store our queued files in "/").
comment:4 by , 12 years ago
Keywords: | review added |
---|---|
Milestone: | → 0.3.3 |
I just updated this branch to latest master. It is still on top of the branch for #565, but would be pretty easy to disentangle. Please review and merge :-)
comment:5 by , 12 years ago
Description: | modified (diff) |
---|---|
Resolution: | → fixed |
Status: | accepted → closed |
Ah yeah that's fixed now!
Thanks! :)
comment:6 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Oh wait. I thought this was fixed; looks like it very well might not be.
There is another branch with commits I did not review:
9 unmerged commits in 565+254_delete_queue_directory (spaetz)
00e4aac * Convert media processing backends to delete the queue directory (#254)
0e40294 * Implement delete_dir in the FileStorage
The things I was thinking about being done were in the workbench. So this is not merged yet.
comment:7 by , 12 years ago
I've made a public branch of those two commits cherry-picked:
Currently making sure it works. If so, I'll merge those also.
comment:8 by , 12 years ago
Okay, this looks really good. I think this is the right route; we could just run this after a successful processing.
I think we need to add cloudfiles.py support for this before merging. The most complicated part would be the recursive delete: I'm not sure there's a straightforward way to implement that in cloudfiles, though I'm guessing it probably wouldn't be hard.
My main concern with doing a recursive delete with something like cloudfiles would be: how would it work? IIRC our implementation of cloudfiles' "paths" for something like:
['dir1', 'dir2', 'file.png']
Is to wrap them all up into something like:
'dir1/dir2/file.png'
... but there are no real "directories"; the "path" is just part of the actual filename. So would implementing recursive deletion support require pulling down *all* filenames and doing a .startswith() check or something? That sounds like it would be awful...
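For what it's worth, the ".startswith() over all names" idea sketched above would look roughly like this (a hedged illustration only; `container` here is just a dict standing in for a cloudfiles container, not the real cloudfiles API):

```python
def delete_prefix(container, dirpath):
    # Keys look like 'dir1/dir2/file.png'; there are no real
    # directories, so "deleting a directory" means deleting every
    # key that shares the prefix.
    prefix = '/'.join(dirpath) + '/'
    doomed = [key for key in container if key.startswith(prefix)]
    for key in doomed:
        del container[key]
    return len(doomed)
```

Against a remote store this really would mean listing every object first, which is the cost being worried about here.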
I added a commit to my branch that removed the previous docstring to .delete_file() which was like:
def delete_file(self, filepath):
    """
    Delete or dereference the file (not directory) at filepath.

    TODO: is the below comment correct?  AFAIK, we won't clean up
    empty directories...

    This might need to delete directories, buckets, whatever,
    for cleanliness.  (Be sure to avoid race conditions on that
    though)
    """
The original reason for the above idea in the second part of the docstring (never implemented though) was that maybe you'd do something like:
queue_storage.delete_file(
    ['media_to_process', 'queued_mediaentry', 'last_file_in_directory.png'],
    wipe_dir_if_empty=True)
And so, since that was the last file, on local file storage it would delete the directory too (storage systems like cloudfiles that don't really have directories would just ignore the flag). In retrospect this wasn't a good solution, and it would be equivalent to running the new .delete_dir(dirpath, recursive=False) anyway.
So maybe we should keep .delete_dir() but move the recursive deletion support off to another branch, in case we find a better way to handle this in the future? In the meantime, we could delete all the files we "know" are there in all our processing methods, and then try calling .delete_dir() after all that is done.
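The "delete known files, then try delete_dir" workflow proposed above could be sketched like this (everything here is hypothetical glue for illustration; `InMemoryStorage` just fakes a backend and is not MediaGoblin code):

```python
class InMemoryStorage:
    """Fake storage backend: files is a set of path tuples,
    dirs a set of directory path tuples."""

    def __init__(self):
        self.files = set()
        self.dirs = set()

    def delete_file(self, filepath):
        self.files.discard(tuple(filepath))

    def delete_dir(self, dirpath, recursive=False):
        # Non-recursive: only succeeds if nothing remains under it.
        prefix = tuple(dirpath)
        if any(f[:len(prefix)] == prefix for f in self.files):
            return False
        self.dirs.discard(prefix)
        return True

def cleanup_queue(storage, queue_dir, known_files):
    # Delete every file processing knows it put in the queue
    # directory, then try to drop the now-empty directory; with
    # recursive=False anything unexpected is simply left alone.
    for name in known_files:
        storage.delete_file(queue_dir + [name])
    return storage.delete_dir(queue_dir, recursive=False)
```

The point of the non-recursive call at the end is exactly the safety property discussed above: if something unknown is still in the directory, cleanup quietly declines rather than wiping it.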
How does this sound?
comment:9 by , 12 years ago
Milestone: | 0.3.3 → 0.4.0 |
---|
comment:10 by , 12 years ago
Owner: | set to |
---|---|
Status: | reopened → assigned |
Okay, I'll relook into this in the coming days.
comment:11 by , 12 years ago
Cc: | added |
---|
comment:12 by , 12 years ago
Keywords: | review removed |
---|
Okay, I have merged cwebber's branch. (You can remove your branch if you like.)
I have fixed some conflicts and also ported this over to the new helper functions.
Remaining issues
General
I don't know if it is very pythonic for .delete_dir to return False for errors instead of raising an exception.
Also, joar noted that we really should have *all* paths in our db, so a recursive delete_dir is not needed.
Cloudfiles
.delete_dir is currently missing for cloudfiles. This is okay for now, because we don't expect queue storage to be on cloudfiles.
But in the long run, we should probably also call .delete_dir(media_entry_dir) after deleting a media_entry.
After talking to joar, it seems that cloudfiles has no directories, just keys and files. The keys just happen to look like paths in our case: foo/dir/file.txt. But there is no need to remove an empty directory, and deleting recursively means pulling down all keys and matching the relevant ones. My proposed code:
def delete_dir(self, path, recursive=False):
    assert recursive == False, "recursive delete not supported on cloudfiles"
    return True
comment:13 by , 12 years ago
It makes me a bit sad that people might call delete_dir(recursive=True) and expect files to go away, and cloudfiles users will have cruft forever! However, I don't know if there's a solution to this...
comment:14 by , 12 years ago
Keywords: | cloudfiles added |
---|---|
Owner: | removed |
One solution is to not have .delete_dir() do recursive at all. We really should have all our files in our db. So .delete_dir() is there to remove empty directories on storage that has directories.
At least my 2 bits.
Removing me from assigned, as I can't bring this ticket forward any more.
comment:15 by , 12 years ago
Status: | assigned → accepted |
---|
comment:16 by , 12 years ago
Status: | accepted → review |
---|
I think this ticket needs review/decision making at this point.
comment:18 by , 11 years ago
Milestone: | 0.5.0 → 0.6.0 |
---|
comment:19 by , 11 years ago
Milestone: | 0.6.0 → 0.7.0 |
---|
comment:20 by , 11 years ago
Keywords: | bitesized added |
---|
Having re-looked at this, I think elrond is right: delete_dir() is just to delete empty directories, *if* they are empty... and if the storage system has such a thing at all anyway. Maybe a better name would thus be delete_directory_if_empty().
That would be fairly simple to code if someone was interested in taking it on.
I'm marking this bitesized, as such.
comment:21 by , 11 years ago
Recently I did approximately this:
- uploaded 2 files from different accounts;
- deleted 1 of them;
- uploaded another file, which failed (Ogg, the spectrogram couldn't be generated because of a missing library);
- uploaded another Ogg file.
Now I see only 1 directory (with an Ogg file) in user_dev/media/queue/media_entries/, whose timestamp matches the minute of the failure shown on the panel.
So it at least does NOT happen all the time / under all circumstances.
comment:22 by , 11 years ago
Status: | review → accepted |
---|
comment:23 by , 10 years ago
Milestone: | 0.7.0 → 0.8.0 |
---|
comment:24 by , 10 years ago
Milestone: | 0.8.0 |
---|
I am removing the target milestone.
1) This one is not going to kill anyone immediately. It's not nice, but there's no reason to get this done specifically for 0.8.
2) This one gets postponed all the time; there's no need to keep postponing it from one version to the next.
Also:
I think this ticket needs re-evaluation to find out what the current situation is, what the problem is, and what we can do about it.
comment:25 by , 9 years ago
It makes sense to me to just close this issue as resolved (in the sense that it made significant progress on multiple occasions). New issues can be created to address the problems discovered afterward, such as the naming of the functions etc. What do you think?
comment:26 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
I think that makes sense, loic. If someone is still having troubles, they should open a new bug with a clean description.