Opened 13 years ago
Last modified 8 years ago
#162 accepted enhancement
add quoting support to tagging
Reported by: | Caleb Davis | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | programming | Keywords: | |
Cc: | Parent Tickets: |
Description (last modified by )
Tagging (`#360 </issues/360>`_) needs to be iterated on. This ticket addresses support for quoting.

What are the expected behaviors? Each example below shows the user input on its own line, followed by the resulting tags, indented.

- In the simple case (assuming comma-delimited text), the user has a single pair of quotes embedded within the input, such as:

  meta,'yo dawg, I heard you like delimiters...',humor

      u'meta'
      u'yo dawg, I heard you like delimiters...'
      u'humor'

- What about mixed quotes? (e.g. the user wants *you're* in the tag)

  meme,"I don't even...",humor

      u'meme'
      u'I don't even'
      u'humor'

- Then, as Will noted in IRC, some users may wish to tag using XML or hierarchical organization schemes. At what point does supporting quoting become complex enough to warrant using a plugin?

**tl;dr - introduce support for tag delimiter(s) within a tag. Efforts towards native gmg support go here. Add a new ticket for plugin work.**

Feel free to fix this ticket if I'm missing the point or the point is not clear.
Change History (9)
comment:2 by , 13 years ago
Owner: | set to |
---|---|
Status: | New → Feedback |
Updated ``convert_to_tag_list_of_dicts`` to use ``shlex`` to parse tag strings. Now a tag with quotes around it will be treated as one tag, even if it contains commas. Branch: https://gitorious.org/copiesofcopies/mediagoblin/copiesofcopies-mediagoblin/commits/feature454_quoted_tags
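Since the referenced branch is not reproduced here, the following is only a rough sketch of how ``shlex`` can split a comma-delimited string while honouring quotes. The helper name ``split_tags`` is hypothetical and this is not the code from the branch.

```python
import shlex


def split_tags(tag_string):
    """Split a comma-delimited tag string, keeping quoted tags whole.

    Hypothetical helper for illustration; not the patch from the branch.
    """
    lexer = shlex.shlex(tag_string, posix=True)
    lexer.whitespace = ","         # treat only commas as separators
    lexer.whitespace_split = True  # split on separators only, not punctuation
    lexer.commenters = ""          # do not treat '#' as starting a comment
    # Strip stray spaces around each tag and drop empty tags.
    return [tag.strip() for tag in lexer if tag.strip()]


print(split_tags("meta,'yo dawg, I heard you like delimiters...',humor"))
print(split_tags('meme,"I don\'t even...",humor'))
```

One caveat with this approach: an unquoted apostrophe (for example ``don't`` outside double quotes) is read as an opening quote, and ``shlex`` raises ``ValueError`` when it finds no closing quote.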
comment:3 by , 13 years ago
Also: this patch supports single quotes/apostrophes in tags wrapped in double-quotes, but I'm not sure how to write the test for this.
comment:4 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | Feedback → In Progress |
Already looks quite good! I think ``media_tags_as_string()`` does not yet handle this properly. This especially affects the "edit media" page.
comment:5 by , 13 years ago
The original url for this bug was http://bugs.foocorp.net/issues/454 .
Relations:
#75: blocked
comment:6 by , 12 years ago
Description: | modified (diff) |
---|---|
Owner: | removed |
This ticket hasn't been worked on for a while, so I'm removing the claim!
comment:7 by , 8 years ago
Type: | defect → enhancement |
---|
For what it's worth, MediaGoblin does work exactly as the form suggests: "Separate tags by commas". I'm not entirely convinced that allowing quotes is a helpful thing to do for the majority of people - it may just complicate things.
Regardless of whether we implement the feature or not, I think it would be necessary to use the ``csv`` module to deserialize/serialize. The ``shlex`` module doesn't allow us to go both ways reliably.
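To illustrate what "going both ways" could look like with ``csv``, here is a minimal sketch under Python 3. The function names ``tags_to_string`` and ``string_to_tags`` are made up for this example and are not existing MediaGoblin functions.

```python
import csv
import io


def tags_to_string(tags):
    """Serialize a list of tags to one comma-delimited line (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(tags)
    # csv.writer appends a line terminator; drop it for a single-line form field.
    return buf.getvalue().rstrip("\r\n")


def string_to_tags(text):
    """Parse one comma-delimited line back into a list of tags (hypothetical helper)."""
    return next(csv.reader([text]), [])


tags = ["meta", "yo dawg, I heard", "humor"]
line = tags_to_string(tags)
print(line)
print(string_to_tags(line) == tags)
```

The writer only quotes a tag when it has to (here, because the tag contains a comma), so plain tags stay unquoted in the output.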
comment:8 by , 8 years ago
Aaron, even though my feeling is that ``shlex`` might not be the eventual solution here, I'd still be interested to see your code if you have a copy. Unfortunately the Gitorious repository referenced is no longer available.
comment:9 by , 8 years ago
I've made a quick initial attempt at adding this feature using the ``csv`` module to serialize/deserialize the tags. Unfortunately that isn't going to work because of one tiny detail: ``csv.writer`` can only output with a single-character delimiter, e.g. ``,`` (comma) and not ``, `` (comma + space). Without this, our serialised tags will look like ``yin,yang`` rather than ``yin, yang``, which I don't think we want. The ``csv`` module is also written in C, so there's no trivial fix.
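The delimiter restriction can be checked with a few lines of Python 3. This is only an illustration of the limitation; the helper names ``accepts_delimiter``, ``write_tags`` and ``read_tags`` are invented for the example.

```python
import csv
import io


def accepts_delimiter(delimiter):
    """Return True if csv.writer accepts the given delimiter (hypothetical helper)."""
    try:
        csv.writer(io.StringIO(), delimiter=delimiter)
    except (TypeError, csv.Error):
        return False
    return True


def write_tags(tags):
    """Serialize tags with the only delimiter form csv.writer supports."""
    buf = io.StringIO()
    csv.writer(buf).writerow(tags)
    return buf.getvalue().rstrip("\r\n")


def read_tags(text):
    """Parse tags, tolerating a space after each comma on input."""
    return next(csv.reader([text], skipinitialspace=True), [])


print(accepts_delimiter(","))       # single character
print(accepts_delimiter(", "))      # comma + space
print(write_tags(["yin", "yang"]))
print(read_tags("yin, yang"))
```

So reading ``yin, yang`` is fine thanks to ``skipinitialspace``, but there is no matching option on the writing side.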
This feature is actually quite complex because:
- CSV serializing/deserializing requires a real parser to handle quoting correctly (so we'll have to re-implement ``csv``)
- We're using the serialized version as the user interface:
  - Our parser must input/output Unicode text, not UTF-8 encoded bytes as Python 2 ``csv`` does (noting that Python 3 ``csv`` requires Unicode input)
  - Our parser must not require whitespace after a comma, but should also collapse extra whitespace around and within tags when deserializing
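To give a feel for the amount of work, here is a deliberately small sketch of a hand-written parser and serializer that tries to meet the requirements above on Python 3 (where ``str`` is Unicode text). The names ``parse_tags`` and ``serialize_tags`` are hypothetical, and the sketch makes simplifying assumptions: a quote only opens a quoted tag when it is the first non-space character of that tag, there is no escape syntax, and a tag containing a comma plus both kinds of quote cannot be represented.

```python
QUOTES = "\"'"


def parse_tags(text):
    """Deserialize a comma-delimited tag string (hypothetical sketch)."""
    tags, current, quote = [], [], None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None          # closing quote ends the quoted part
            else:
                current.append(ch)    # commas inside quotes are kept
        elif ch in QUOTES and not "".join(current).strip():
            quote = ch                # a quote opens only at the start of a tag
        elif ch == ",":
            tags.append("".join(current))
            current = []
        else:
            current.append(ch)        # includes apostrophes inside a word
    tags.append("".join(current))
    # Collapse whitespace around and within tags, then drop empty tags.
    cleaned = [" ".join(tag.split()) for tag in tags]
    return [tag for tag in cleaned if tag]


def serialize_tags(tags):
    """Serialize tags with ', ' between them, quoting only when needed."""
    out = []
    for tag in tags:
        if "," in tag or tag[:1] in ("'", '"'):
            q = "'" if '"' in tag else '"'
            tag = q + tag + q
        out.append(tag)
    return ", ".join(out)


tags = ["meta", "yo dawg, I heard", "I don't even", "humor"]
print(serialize_tags(tags))
print(parse_tags(serialize_tags(tags)) == tags)
```

Even this small version leaves out escaping and error reporting for unbalanced quotes, which is part of why a complete implementation is more work than it first appears.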
Based on the amount of work involved in implementing this feature and my assumption that it won't be heavily used, I'd suggest we keep things simple and don't build this. I'll keep this open for a few weeks to see if anyone has other thoughts.