Opened 12 years ago

Closed 12 years ago

#672 closed enhancement (fixed)

rfe: pdf media type support

Reported by: alon Owned by: alon
Priority: major Milestone: 0.4.0
Component: programming Keywords: review
Cc: Parent Tickets:

Description

Supporting PDF files would let mediagoblin have many more use cases, such as a collection of datasheets, research papers, and books.

A working pdf media type is at:

https://www.gitorious.org/~alon/mediagoblin/alons-mediagoblin/commits/pdf

The test is not in there because I wasn't sure how it should be added - I have it written but the problems are:

  1. it needs to run only if pdftocairo is available

So I can add some code to check for the existence of that executable and only then enable the media type and test.

Change History (11)

comment:1 by alon, 12 years ago

This is about the pdf type, similar to http://issues.mediagoblin.org/ticket/391 which is about using pdf.js as well. Not closing as a duplicate because this one actually has a different reviewable tree on it.

comment:2 by alon, 12 years ago

I've updated the tree, there are a few wip commits there for pdf.js support, so ignore those, the latest patch I actually would like reviewed is:

https://www.gitorious.org/~alon/mediagoblin/alons-mediagoblin/commit/81058c2b8b2ab5eead612726a56e083f7ea37d80

It now includes tests, has a populated model. No change to the display though - it consists of a prerendered first page, clicking or pressing the "original" link will get whatever mime based action the browser has in store, pdf.js for firefox>=19, download for midori, etc.

comment:3 by alon, 12 years ago

New version:

  1. Addressed Elrond's comments, using the new store_public
  2. Fixed tests to use the new py.test way
  3. made pdf.js optional, default off since it still looks bad (mediagoblin.ini updated with commented out section)
  4. inserted the whole list of supported extensions for unoconv, except the image file formats that we support with other plugins. *not* extensively tested.

Known problems:

  1. If you use a non default theme pdf displays as the default theme.
  2. sometimes unoconv doesn't work on the first try. This should not produce the ascii goblin, just a failure, but no indication that a resubmission will work (it should since a soffice.bin will stay running after the first submission)

Please review,

https://www.gitorious.org/~alon/mediagoblin/alons-mediagoblin/commit/b3c4d88e8795d9bcd7698615c27e30660b1e71a8

comment:4 by Christopher Allan Webber, 12 years ago

  1. So this seems weird:
-<html>
+<html
+{% block mediagoblin_html_tag %}
+{% endblock mediagoblin_html_tag %}
+>

Could that be done altering the HTML DOM after the fact since it's
js-only anyway?
I'm not sure I'm against it..

  1. holy cow, lots of .js files in:
+    <!--#if !PRODUCTION-->
  1. Wow, awesome, lots of extensions supported in:
unoconv_supported = [

I gues that's not an actionable item, I'm just impressed!

  1. This should be the format of: [media_type:mediagoblin.media_types.pdf] in the config spec, and probably doesn't belong in mediagoblin.ini:
[pdf]
## Uncomment this to use pdf_js - not there yet, enable by default when fixed.
#pdf_js = true

  1. for avoidance of injection attacks, please use subprocess.popen for stuff like:
  cmd = 'pdftocairo -scale-to %d -singlefile -png %s %s' % (
  # and
  lines = os.popen('pdfinfo %s' % original).readlines()

... you wouldn't want someone to submit a ; rm -rf /; bobbytables.pdf filename :)

  1. code in pdf_info could use some spacing, kinda condesnsed looking right now
  2. doesn't look like the file is being saved to the workbench during processing? If you aren't familiar with the workbench, you should consider reading: http://wiki.mediagoblin.org/Storage ... that said, plenty of people seem to get confused about the workbench, so I'm happy to further explain. You should think of it as a place that's set up for temporary files and files-being-processed during execution, and it disappears at the end of processing.

Actually looking at this futher I might be confused. Is it the case that it's saving to the same directory as the workbench since stuff is copied locally already? I'm not sure from looking at it. :)

  1. 2a6a4271d0c3ca0e7be406486eb903c4b97c9857 definitely needs to be reverted; this isn't really general mediagoblin code (and i suspect there must be a way to improve the situation on your server somehow..)
  2. Definitely agree that pdf.js should be moved to a git submodule.

comment:5 by Christopher Allan Webber, 12 years ago

Oh and last comment:

This is awesome! Thank you for doing it! :)

comment:6 by alon, 12 years ago

  1. Can you actually change the top level (singleton) html element via js? that would seem weird. I mean sure, I guess adding attributes.
  1. Fixed - my misunderstanding of how that file worked. It was actually a template of a js file that was processed via make.js, so those if's were for it. I did the simple thing of processing it by hand. Not sure it's the right thing for later - actually I take it back, I think it's the right thing for this version, and we will see how fast pdf.js web/viewer.html evolves or how easy it is to incorporate changes later.
  1. OK, removing - I'll document it in the mediafiles docs instead, so people know to enable it (if they want, or just for development).
  1. Fixed
  1. Fixed.
  1. added some spaces, hope to your liking :) But is there actually any coding standard here or did you just not like the length of the function?
  1. I believe I'm doing the right thing:
  2. get a local file name via proc_state.get_queued_filename
  3. run some local processes to produce a new file
  4. store it via proc_state.store_public
  1. This is a misunderstanding - I specifically gave the name of the commit to merge from, the other commits are on *top* of it to avoid pulling them. So we are in agreement here I think.
  1. Done already.

The new tag is: git://gitorious.org/~alon/mediagoblin/alons-mediagoblin.git pdf.v2

It contains two commits:
document submodule usage - short documentation only change
the pdf patch

Version 0, edited 12 years ago by alon (next)

comment:7 by alon, 12 years ago

So actually I didn't fix #1, just noticed - if you want me to I can give it a shot, but I think even though it's a bit ugly it makes more sense to change it at the server then at the client.

comment:8 by Christopher Allan Webber, 12 years ago

This looks really good. I think it's ready for merge, even though I'd like to do the "embedding" on the page via an iframe, as we discussed. But before I do, what's with the .svg/.png files checked into static? theoretically that's for pdf.js? Can't se just pull that out of the submodule?

comment:9 by alon, 12 years ago

hmm, I think you're right, can't remember why I did that initially, fixed it now.

New commit is: a80ebf3
Still pdf branch, no longer a tag: git://gitorious.org/~alon/mediagoblin/alons-mediagoblin.git

comment:10 by Christopher Allan Webber, 12 years ago

Milestone: 0.4.0

Awesome. I merged the branch!

However, I think we shouldn't close this until we try out the iframe thing, so leaving it open for now!

comment:11 by alon, 12 years ago

Resolution: fixed
Status: newclosed

iframe committed: bbd7112a5eb74d6f906d567e827f399ac6b5f022

closing ticket.

Note: See TracTickets for help on using tickets.