Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#5342 closed defect (fixed)

Bad feed media URLs for non-local storage

Reported by: ayleph Owned by:
Priority: major Milestone: 0.9.0
Component: programming Keywords: feed, api, url, cloudfiles
Cc: tsyesika Parent Tickets:

Description

When I connect to goblinrefuge.com using a pump client, none of the media is displayed because the links are bad. goblinrefuge.com stores media on CloudFiles, so files are hosted on a separate domain. The feed media URL doesn't get this correct.

Here's an example of a feed media URL Pumpa received from the API.

https://pump.goblinrefuge.com/ttps://b2aeaa58a57a200320db-8b65b95250e902c437b256b5abf3eac7.ssl.cf5.rackcdn.com/media_entries/5223/tmp_5538-IMG_20150625_031717-1392399859.jpg

And here's the actual media URL.

https://b2aeaa58a57a200320db-8b65b95250e902c437b256b5abf3eac7.ssl.cf5.rackcdn.com/media_entries/5223/tmp_5538-IMG_20150625_031717-1392399859.jpg

The API is mangling the URL by trying to append it to the URL of the mediagoblin server as though it's a relative path hosted on the same domain, instead of using the actual absolute URL.

Attachments (1)

0001-Use-urljoin-to-create-proper-feed-media-URLs.patch (1.3 KB ) - added by ayleph 8 years ago.

Download all attachments as: .zip

Change History (8)

comment:1 by ayleph, 8 years ago

After a lot of investigation, I've figured out why this is happening. When the web frontend of MediaGoblin wants a link to a thumbnail or full-size image, it queries the storage object to get the correct location. In the case of a remote file, the storage object returns the entire absolute path (eg, https://uuid.rackcdn.com/image.png). In the case of a local file, the storage object can simply return a partial URL (eg, /mgoblin_media/media_entries/n/image.png), and the browser will automatically treat that as a local URL on the current host.

However, an API client won't treat local URIs like a browser will. To get around this, the API prepends the local host to all URIs returned by the storage object, assuming that all links reside on the API host.

mediagoblin/db/models.py:

 512 class MediaEntry(Base, MediaEntryMixin, CommentingMixin):
 ...
 733     def serialize(self, request, show_comments=True):
 ...
 739         context = {
 ...
 743             "url": self.url_for_self(request.urlgen, qualified=True),
 744             "image": {
 745                 "url": request.host_url + self.thumb_url[1:],
 746             },
 747             "fullImage":{
 748                 "url": request.host_url + self.original_url[1:],
 749             },
 ...

When the storage object returns an external file, this breaks. The values of "url" and "fullImage" become something like https://pump.goblinrefuge.com/ttps://uuid.rackcdn.com/image.png.

I can get around this by assuming all files are hosted externally with the modification below.

diff --git a/mediagoblin/db/models.py b/mediagoblin/db/models.py
index 0de14b4..9932dc4 100644
--- a/mediagoblin/db/models.py
+++ b/mediagoblin/db/models.py
@@ -742,10 +742,10 @@ class MediaEntry(Base, MediaEntryMixin, CommentingMixin):
             "objectType": self.object_type,
             "url": self.url_for_self(request.urlgen, qualified=True),
             "image": {
-                "url": request.host_url + self.thumb_url[1:],
+                "url":self.thumb_url,
             },
             "fullImage":{
-                "url": request.host_url + self.original_url[1:],
+                "url": self.original_url,
             },
             "published": published.isoformat(),
             "updated": updated.isoformat(),

However, this change breaks the "url" and "fullImage" links for instances with local storage. It looks like we may need to do some case checking here to see if local storage is enabled or whether the URI returned by the storage object includes the FQDN host.

comment:2 by ayleph, 8 years ago

Status: newreview

Here's a proposed solution to this issue. This isn't an extremely flexible solution as it assumes any file storage other than BasicFileStorage will store the complete URI in the database. This solution would also fail to work if someone in the future implemented a dual-storage plugin which could store media locally and remotely. A separate solution might be to perform regex checking on self.thumb_url and self.original_url to determine if those links provide their own host information.

From 8cf32e755ad942433200cea104d168482fc56f4e Mon Sep 17 00:00:00 2001
From: ayleph <ayleph@thisshitistemp.com>
Date: Mon, 2 Nov 2015 01:04:09 -0500
Subject: [PATCH] Change API feed URL based on storage location

The default serialized media and thumb URLs for API clients don't
work for remote file stores. This update queries the instance
configuration to determine how to preset the feed URLs. For the default
BasicFileStorage method, identical URLs are returned as in the previous
code. For all other storage methods, the URI is taken directly from the 
database without modification.
---
 mediagoblin/db/models.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mediagoblin/db/models.py b/mediagoblin/db/models.py
index 0de14b4..f757669 100644
--- a/mediagoblin/db/models.py
+++ b/mediagoblin/db/models.py
@@ -34,6 +34,7 @@ from sqlalchemy.sql.expression import desc
 from sqlalchemy.ext.associationproxy import association_proxy
 from sqlalchemy.util import memoized_property
 
+from mediagoblin import mg_globals as mgg
 from mediagoblin.db.extratypes import (PathTupleWithSlashes, JSONEncoded,
                                        MutationDict)
 from mediagoblin.db.base import Base, DictReadAttrProxy, FakeCursor
@@ -736,16 +737,22 @@ class MediaEntry(Base, MediaEntryMixin, CommentingMixin):
         published = UTC.localize(self.created)
         updated = UTC.localize(self.updated)
         public_id = self.get_public_id(request.urlgen)
+
+        url_header = ''
+        storage_class = mgg.global_config['storage:publicstore']['storage_class']
+        if storage_class == 'mediagoblin.storage.filestorage:BasicFileStorage':
+            url_header = request.host_url[:-1]
+
         context = { 
             "id": public_id,
             "author": author.serialize(request),
             "objectType": self.object_type,
             "url": self.url_for_self(request.urlgen, qualified=True),
             "image": {
-                "url": request.host_url + self.thumb_url[1:],
+                "url": url_header + self.thumb_url,
             },  
             "fullImage":{
-                "url": request.host_url + self.original_url[1:],
+                "url": url_header + self.original_url,
             },  
             "published": published.isoformat(),
             "updated": updated.isoformat(),
-- 
2.6.2

comment:3 by ayleph, 8 years ago

Owner: set to ayleph
Status: reviewin_progress

Testing on my development instance saw some issues with deleted media. I'm going to take a little more time to play with this one.

comment:4 by ayleph, 8 years ago

Owner: ayleph removed
Status: in_progressreview

Proper fix attached. This uses urljoin to automagically present the correct URL.

comment:5 by ayleph, 8 years ago

Milestone: 0.8.1

comment:6 by Christopher Allan Webber, 8 years ago

Resolution: fixed
Status: reviewclosed

Good catch. Yes, we should always be using urljoin in circumstances like these.

Committed and pushed!

comment:7 by Christopher Allan Webber, 8 years ago

Milestone: 0.8.20.9.0

All 0.8.2 tickets are being rolled over to 0.9.0

Note: See TracTickets for help on using tickets.