Opened 11 years ago

Last modified 11 years ago

#650 new defect

.media_data is sometimes None (and has no rows)! Is this the right thing to do?

Reported by: Christopher Allan Webber Owned by:
Priority: major Milestone:
Component: documentation Keywords:
Cc: Elrond Parent Tickets:


Elrond and I had a conversation on IRC. Basically, I was surprised to learn that some entries do not have .media_data records and wrote some code that exploded on me, and that this was by design (it may be that I was involved in this design, but anyway, if so I forgot :)) The way things currently work is that media_data is optionally initialized when properties are set, like so:

    if len(exif_all):

    if len(gps_data):
        for key in list(gps_data.keys()):
            gps_data['gps_' + key] = gps_data.pop(key)

In the above example (which is image processing code) media data is initialized if there's exif data or if there's GPS data. However, if there's neither, the .media_data row will never be written and trying to access .media_data will return None.

The question is: is this a good idea? The short version is that it's a battle between two different forms of minimalism: only filling the database with rows when there's something to store results in a leaner database but messier code, always having a row results in simpler code but some rows when no rows need to exist. (Thus a battle between Elrond's razor (database simplicity) and cwebber's razor (code simplicity)!)

A longer version of these points are:

  1. This results in code that is harder to read:
if media.media_data is not None:
    # do something with media data

... which is fairly confusing.

  1. As for writing, one could argue that it's not much trickier to do media.media_data.p = q or instead media.media_data_init(p=q). (Counterpoint: maybe there are some media data forms that require setting multiple fields at once in a somewhat interdependent or tricky way?) However, initializing fields is not the only thing; one also works with things above such as the if media.media_data is not None which can be very tricky to get right. Also, if Chris Webber was confused and wrote totally broken code because of this .media_data being None thing, probably so will others!
  2. Migrations could get a lot messier. Basically, one might write a migration that assumes a query on an images' media data, and one might forget that those rows might not always be there. This could result in broken migrations. On the other hand, migrations are already one of the trickest things in mediagoblin anyway. ;)
  3. The big thing though is that it's less db space to do things this way. Elrond says it's a bigger concern on MySQL/MariaDB (well, we don't support that yet anyway ;)) than it is on postgres or sqlite.
  4. It wouldn't be less queries though, since we still need to check whether there's a row or not. (However, Elrond points out that the payload is somewhat less if we don't return anything!)

What's the right thing to do?

Change History (7)

comment:1 by Elrond, 11 years ago

.media_data was tricky from the start on. So we could let it return some more tricky stuff to make the rest of the code more "cwebber's razor" friendly.

One proof of concept idea in this direction is in elrond/hack/ro_media_data.

This is only to say "There might be other ways to resolve this".

comment:2 by Christopher Allan Webber, 11 years ago

Hm, it's an interesting idea, but I think it doesn't solve (all) the problems:

  • Different columns might have special representation (eg we're talking about JSONEncoded to default to give back an empty dict even when None)
  • Having access to the methods would be really good; I've run into how this can be annoying to *not* have with the recent work I did on the video_entry.media_data.source_type() method and fallback.
  • It doesn't help with the migrations being tricky aspect.

I'm hesitant to put this in place... it might make things even more confusing.

comment:3 by Elrond, 11 years ago

  • I don't know about the JSONEncoded thing. I would guess, it's already in the pythonic form in media_data.__dict__, but I don't know.
  • methods are a known shortcoming of this *idea*. Maybe this can be fixed.
  • We already agreed, that migrations are their own problem. We probably should keep them in mind, but leave them aside for the rest of the discussion.

As already pointed out, my branch is named hack/ for good reasons. I did never intend to get this merged. It was on the lines of "python lets us do a lot of very tricky things, we might consider to use some of them or decide against any of them".

Two things:

  1. We should probably keep this thing very low prio until we have considered on how to improve the media manager. For example, .source_type() could have been a method on a VideoManager class. That would have made the whole thing lots easier.
  2. Whatever we end up with, we should document it better than last time. That would have helped this time somewhat, because we could have looked at the docs and say "Oh, we have decided for Foo's Razor for this and that reason".

comment:4 by Christopher Allan Webber, 11 years ago

Hey Elrond,

I certainly agree with your points 1 and 2 above! I guess we can circle back to this when we know a bit better about what's happening with the media managers.

comment:5 by Christopher Allan Webber, 11 years ago

As a minor update, we have the media managers now. However it's not well documented and I'm not sure that things are clearer. I've still run into these kinds of errors in various places, though less so.

comment:6 by Christopher Allan Webber, 11 years ago

So, we could possibly close this without changing the way thigns work. But a prerequisite for doing so would be to fully document the way they do in the programmer guide, and that's both for the media manager, and the potential traps one might run into coding around .media_data possibly being None.

comment:7 by Christopher Allan Webber, 11 years ago

Component: programmingdocumentation
Note: See TracTickets for help on using tickets.