Opened 15 years ago

Last modified 10 years ago

#162 accepted enhancement

add quoting support to tagging

Reported by: Caleb Davis Owned by:
Priority: minor Milestone:
Component: programming Keywords:
Cc: Parent Tickets:

Description (last modified by Christopher Allan Webber)

Tagging (#360) needs to be iterated on. This ticket addresses support for quoting. What are the expected behaviors? In the following examples, user input is underlined, and resulting tags are indented.

  • In the simple case (assuming comma-delimited text), the user has a single pair of quotes embedded within the input, such as: meta,'yo dawg, I heard you like delimiters...',humor

    u'meta'

    u'yo dawg, I heard you like delimiters...'

    u'humor'

  • what about mixed quotes? (eg user wants you're in the tag) meme,"I don't even...",humor

    u'meme'

    u'I don't even'

    u'humor'

  • then, as Will noted in IRC, some users may wish to tag using xml or hierarchical organization schemes. At what point does supporting quoting become complex enough to warrant using a plugin?

tl;dr - introduce support for tag delimiter(s) within a tag. Efforts towards native gmg support go here. Add a new ticket for plugin work.

Feel fix this ticket if I'm missing the point or the point is not clear.

Change History (9)

comment:1 by Caleb Davis, 15 years ago

oops, *"Feel fix this..." = "Feel free to fix this..."

comment:2 by Aaron Williamson, 14 years ago

Owner: set to Elvenlord Elrond
Status: New → Feedback

Updated convert_to_tag_list_of_dicts to use shlex to parse tag strings. Now a tag with quotes around it will be treated as one tag, even if it contains commas. Branch: https://gitorious.org/copiesofcopies/mediagoblin/copiesofcopies-mediagoblin/commits/feature454_quoted_tags
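The referenced branch is not reproduced here, so the following is only a minimal sketch of how shlex can be configured to split on commas while honouring quotes. The function name parse_tags is hypothetical and is not the actual convert_to_tag_list_of_dicts implementation.

```python
import shlex


def parse_tags(tag_string):
    """Split a comma-delimited tag string, honouring shell-style quotes.

    Hypothetical helper for illustration only.
    """
    lexer = shlex.shlex(tag_string, posix=True)
    lexer.whitespace = ","         # commas are the only separators
    lexer.whitespace_split = True  # everything else belongs to a tag
    lexer.commenters = ""          # do not treat '#' as a comment marker
    # Trim spaces left over from input such as "a, b" and drop empty tags.
    return [tag.strip() for tag in lexer if tag.strip()]
```

With this configuration an apostrophe inside double quotes is kept as part of the tag, but an unquoted, unbalanced apostrophe (for example a bare don't) makes shlex raise ValueError, so a caller would need to decide how to handle that case.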

comment:3 by Aaron Williamson, 14 years ago

Also: this patch supports single quotes/apostrophes in tags wrapped in double-quotes, but I'm not sure how to write the test for this.

comment:4 by Elrond, 14 years ago

Owner: changed from Elvenlord Elrond to Aaron Williamson
Status: Feedback → In Progress

Already looks quite good!

I think media_tags_as_string() does not yet handle this properly. This especially affects the "edit media" page.

comment:5 by Will Kahn-Greene, 14 years ago

The original url for this bug was http://bugs.foocorp.net/issues/454 .
Relations:
#75: blocked

comment:6 by Christopher Allan Webber, 13 years ago

Description: modified (diff)
Owner: Aaron Williamson removed

This ticket hasn't been worked on for a while, so I'm removing the claim!

comment:7 by Ben Sturmfels, 10 years ago

Type: defect → enhancement

For what it's worth, MediaGoblin does work exactly as the form suggests: "Separate tags by commas". I'm not entirely convinced that allowing quotes is a helpful thing to do for the majority of people - it may just complicate things.

Regardless of whether we implement the feature or not, I think it would be necessary to use the csv module to deserialize/serialize. The shlex module doesn't allow us to go both ways reliably.
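To illustrate what "going both ways" with csv could look like, here is a rough Python 3 sketch. The names tags_from_string and tags_to_string are hypothetical and not part of MediaGoblin, and with csv's default dialect only double quotes act as quote characters.

```python
import csv
import io


def tags_from_string(tag_string):
    """Deserialize a comma-delimited string into a list of tags."""
    # skipinitialspace lets a quoted tag follow "comma + space".
    reader = csv.reader([tag_string], skipinitialspace=True)
    row = next(reader, [])
    return [tag.strip() for tag in row if tag.strip()]


def tags_to_string(tags):
    """Serialize tags; csv quotes any tag that contains a comma."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(tags)
    return buf.getvalue()
```

Because the same module writes and reads the quoting, a list of tags passed through tags_to_string and then tags_from_string should come back unchanged, at least for tags without leading or trailing whitespace.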

comment:8 by Ben Sturmfels, 10 years ago

Aaron, even though my feeling is that shlex might not be the eventual solution here, I'd still be interested to see your code if you have a copy. Unfortunately the Gitorious repository referenced is no longer available.

comment:9 by Ben Sturmfels, 10 years ago

I've made a quick initial attempt at adding this feature using the csv module to serialize/deserialize the tags. Unfortunately that isn't going to work because of one tiny detail: csv.writer can only output with a single-character delimiter, e.g. "," (comma) and not ", " (comma + space). Without this, our serialised tags will look like "yin,yang" rather than "yin, yang", which I don't think we want. The csv module is also written in C, so there's no trivial fix.
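The delimiter restriction can be reproduced with a few lines of Python 3. This is only a demonstration sketch, and the serialize helper is a hypothetical name.

```python
import csv
import io


def serialize(tags, delimiter=","):
    """Write one row of tags using the given csv delimiter."""
    buf = io.StringIO()
    # csv.writer rejects any delimiter that is not exactly one character.
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="")
    writer.writerow(tags)
    return buf.getvalue()


def accepts_delimiter(delimiter):
    """Return True if csv.writer accepts the delimiter, False otherwise."""
    try:
        serialize(["yin", "yang"], delimiter=delimiter)
    except TypeError:
        return False
    return True
```

Here accepts_delimiter(",") is true while accepts_delimiter(", ") is false, which is the limitation described above.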

This feature is actually quite complex because:

  1. CSV serializing/deserializing requires a real parser to handle quoting correctly (so we'll have to re-implement csv)
  2. We're using the serialized version as the user interface:
    • Our parser must input/output Unicode text, not UTF-8 encoded bytes as csv does (noting that Python 3 csv requires Unicode input)
    • Our parser must not require whitespace after comma, but should also collapse extra whitespace around and within tags when deserializing
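As a rough idea of the scale involved, the requirements above could be sketched in Python 3 along these lines. This is an assumption-laden illustration, not a proposed patch: split_tags and join_tags are hypothetical names, only double quotes are recognised, an unterminated quote simply runs to the end of the input, and there is no way to escape a literal double quote inside a tag.

```python
def split_tags(text):
    """Split Unicode text on commas that fall outside double quotes."""
    tags, current, in_quotes = [], [], False
    for char in text:
        if char == '"':
            in_quotes = not in_quotes  # quote marks toggle state, not kept
        elif char == "," and not in_quotes:
            tags.append("".join(current))
            current = []
        else:
            current.append(char)
    tags.append("".join(current))
    # Collapse whitespace around and within each tag, drop empty tags.
    cleaned = (" ".join(tag.split()) for tag in tags)
    return [tag for tag in cleaned if tag]


def join_tags(tags):
    """Join tags with comma + space, quoting tags that contain a comma."""
    return ", ".join('"%s"' % tag if "," in tag else tag for tag in tags)
```

Even this small version leaves the escaping and error-reporting questions open, which is much of the extra work referred to above.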

Based on the amount of work involved in implementing this feature and my assumption that it won't be heavily used, I'd suggest we keep things simple and don't build this. I'll keep this open for a few weeks to see if anyone has other thoughts.
