Opened 8 years ago

Last modified 3 years ago

#162 accepted enhancement

add quoting support to tagging

Reported by: Caleb Davis Owned by:
Priority: minor Milestone:
Component: programming Keywords:
Cc: Parent Tickets:

Description (last modified by Christopher Allan Webber)

Tagging (`#360 </issues/360>`_) needs to be iterated on. This
ticket addresses support for quoting. What are the expected
behaviors? In the following examples, user input is underlined, and
resulting tags are indented.


-  In the simple case (assuming comma-delimited text), the user has
   a single pair of quotes embedded within the input, such as:
   meta,'yo dawg, I heard you like delimiters...',humor

    u'meta'


    u'yo dawg, I heard you like delimiters...'


    u'humor'



-  what about mixed quotes? (eg user wants *you're* in the tag)
   meme,"I don't even...",humor

    u'meme'


    u"I don't even..."


    u'humor'



-  then, as Will noted in IRC, some users may wish to tag using xml
   or hierarchical organization schemes. At what point does supporting
   quoting become complex enough to warrant using a plugin?

**tl;dr - introduce support for tag delimiter(s) within a tag. Efforts towards native gmg support go here. Add a new ticket for plugin work**

Feel fix this ticket if I'm missing the point or the point is not
clear.



Change History (9)

comment:1 Changed 8 years ago by Caleb Davis

oops, \*"Feel fix this..." = "Feel *free to* fix this..."



comment:2 Changed 8 years ago by Aaron Williamson

Owner: set to Elvenlord Elrond
Status: changed from New to Feedback
Updated ``convert_to_tag_list_of_dicts`` to use shlex to parse tag
strings. Now a tag with quotes around it will be treated as one
tag, even if it contains commas. Branch:
https://gitorious.org/~copiesofcopies/mediagoblin/copiesofcopies-mediagoblin/commits/feature454_quoted_tags
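
The branch above is no longer available, so the following is only a minimal sketch of how shlex could split a comma-delimited tag string while honouring quotes. The helper name ``split_tags_shlex`` is hypothetical and is not the code from the patch.

```python
import shlex


def split_tags_shlex(tag_string):
    """Split a comma-delimited tag string, keeping quoted tags whole.

    Hypothetical helper for illustration; not the original patch.
    """
    lexer = shlex.shlex(tag_string, posix=True)
    lexer.whitespace = ","         # commas are the only separators
    lexer.whitespace_split = True  # every other character belongs to a tag
    lexer.commenters = ""          # do not treat '#' as a comment marker
    # Trim surrounding spaces and drop empty tags.
    return [tag.strip() for tag in lexer if tag.strip()]


print(split_tags_shlex("meta,'yo dawg, I heard you like delimiters...',humor"))
print(split_tags_shlex('meme,"I don\'t even...",humor'))
```

One caveat with this approach: an unquoted apostrophe, as in ``don't``, is read as an opening quote, so shlex raises ``ValueError`` for input such as ``don't, stop``.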



comment:3 Changed 8 years ago by Aaron Williamson

Also: this patch supports single quotes/apostrophes in tags wrapped
in double-quotes, but I'm not sure how to write the test for this.



comment:4 Changed 8 years ago by Elrond

Owner: changed from Elvenlord Elrond to Aaron Williamson
Status: changed from Feedback to In Progress
Already looks quite good!

I think ``media_tags_as_string()`` does not yet handle this
properly. This especially affects the "edit media" page.



comment:5 Changed 8 years ago by Will Kahn-Greene

The original url for this bug was http://bugs.foocorp.net/issues/454 .
Relations:
#75: blocked

comment:6 Changed 7 years ago by Christopher Allan Webber

Description: modified (diff)
Owner: Aaron Williamson deleted

This ticket hasn't been worked on for a while, so I'm removing the claim!

comment:7 Changed 3 years ago by Ben Sturmfels

Type: changed from defect to enhancement

For what it's worth, MediaGoblin does work exactly as the form suggests: "Separate tags by commas". I'm not entirely convinced that allowing quotes is a helpful thing to do for the majority of people - it may just complicate things.

Regardless of whether we implement the feature or not, I think it would be necessary to use the csv module to deserialize/serialize. The shlex module doesn't allow us to go both ways reliably.

comment:8 Changed 3 years ago by Ben Sturmfels

Aaron, even though my feeling is that shlex might not be the eventual solution here, I'd still be interested to see your code if you have a copy. Unfortunately the Gitorious repository referenced is no longer available.

comment:9 Changed 3 years ago by Ben Sturmfels

I've made a quick initial attempt at adding this feature using the csv module to serialize/deserialize the tags. Unfortunately that isn't going to work because of one tiny detail: csv.writer can only output with a single-character delimiter such as "," (comma), not ", " (comma + space). Without this, our serialised tags will look like ``yin,yang`` rather than ``yin, yang``, which I don't think we want. The csv module is also written in C, so there's no trivial fix.
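
The delimiter restriction can be reproduced with a short sketch. The helper name ``serialize_with_csv`` is hypothetical; only the standard csv and io modules are used.

```python
import csv
import io


def serialize_with_csv(tags):
    """Join tags using csv.writer's default single-character delimiter.

    Hypothetical helper for illustration.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(tags)
    # writerow appends a line terminator, which is not wanted in a form field.
    return buf.getvalue().rstrip("\r\n")


print(serialize_with_csv(["yin", "yang"]))  # yin,yang

try:
    # A two-character delimiter is rejected when the writer is created.
    csv.writer(io.StringIO(), delimiter=", ")
except TypeError as exc:
    print("rejected:", exc)
```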

This feature is actually quite complex because:

  1. CSV serializing/deserializing requires a real parser to handle quoting correctly (so we'll have to re-implement csv)
  2. We're using the serialized version as the user interface:
    • Our parser must input/output Unicode text, not UTF-8 encoded bytes as the Python 2 csv module does (noting that Python 3 csv requires Unicode input)
    • Our parser must not require whitespace after comma, but should also collapse extra whitespace around and within tags when deserializing
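
For discussion, here is a rough sketch of one way the requirements above might be approximated on Python 3 without a full parser rewrite: csv.reader handles deserializing, and a small hand-written serializer emits ", " between tags. The names ``parse_tags`` and ``format_tags`` are hypothetical, and the sketch assumes single-line input and double quotes only, so the single-quoted form from the description would not be recognised.

```python
import csv


def parse_tags(text):
    """Deserialize: split on commas, honour double quotes, tidy whitespace."""
    # skipinitialspace lets a quoted tag follow ", " as well as ",".
    row = next(csv.reader([text], skipinitialspace=True), [])
    # Collapse runs of whitespace around and within each tag.
    tags = (" ".join(field.split()) for field in row)
    return [tag for tag in tags if tag]


def format_tags(tags):
    """Serialize: quote only tags containing a comma or a double quote."""
    def quote(tag):
        if "," in tag or '"' in tag:
            return '"' + tag.replace('"', '""') + '"'
        return tag
    return ", ".join(quote(tag) for tag in tags)


tags = ["meme", "I don't even...", "yo dawg, I heard"]
text = format_tags(tags)
print(text)
print(parse_tags(text) == tags)
```

Whether this is simple enough to maintain is a separate question; it does not change the concern that the feature may see little use.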

Based on the amount of work involved in implementing this feature and my assumption that it won't be heavily used, I'd suggest we keep things simple and don't build this. I'll keep this open for a few weeks to see if anyone has other thoughts.
