Opened 14 years ago
Last modified 9 years ago
#162 accepted enhancement
add quoting support to tagging
| Reported by: | Caleb Davis | Owned by: | |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | programming | Keywords: | |
| Cc: | Parent Tickets: |
Description (last modified by )
Tagging (`#360 </issues/360>`_) needs to be iterated on. This
ticket addresses support for quoting. What are the expected
behaviors? In the following examples, the user input comes first and
the resulting tags are indented beneath it.
- In the simple case (assuming comma-delimited text), the user has
  a single pair of quotes embedded within the input, such as:
  meta,'yo dawg, I heard you like delimiters...',humor
      u'meta'
      u'yo dawg, I heard you like delimiters...'
      u'humor'
- What about mixed quotes? (e.g. the user wants *you're* in the tag)
  meme,"I don't even...",humor
      u'meme'
      u"I don't even..."
      u'humor'
- then, as Will noted in IRC, some users may wish to tag using xml
or hierarchical organization schemes. At what point does supporting
quoting become complex enough to warrant using a plugin?
**tl;dr - introduce support for tag delimiter(s) within a tag. Efforts towards native gmg support go here. Add a new ticket for plugin work**
Feel free to fix this ticket if I'm missing the point or the point is
not clear.
Change History (9)
comment:2 by , 14 years ago
| Owner: | set to |
|---|---|
| Status: | New → Feedback |
Updated ``convert_to_tag_list_of_dicts`` to use shlex to parse tag strings. Now a tag with quotes around it will be treated as one tag, even if it contains commas. Branch: https://gitorious.org/~copiesofcopies/mediagoblin/copiesofcopies-mediagoblin/commits/feature454_quoted_tags
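The branch itself is not reproduced here, so the following is only a minimal sketch of how shlex can split a comma-delimited tag string while honoring quotes. It assumes Python 3, and the helper name ``split_tags`` is hypothetical (it is not the patched ``convert_to_tag_list_of_dicts``).

```python
import shlex


def split_tags(tag_string):
    """Split a comma-delimited tag string, keeping quoted tags whole.

    Hypothetical helper for illustration; not MediaGoblin's actual code.
    """
    lexer = shlex.shlex(tag_string, posix=True)
    lexer.whitespace = ","          # commas separate tags; spaces do not
    lexer.whitespace_split = True   # split only on the characters above
    lexer.commenters = ""           # do not treat '#' as a comment marker
    # Drop surrounding spaces and skip empty tags.
    return [tag.strip() for tag in lexer if tag.strip()]


print(split_tags("meta,'yo dawg, I heard you like delimiters...',humor"))
print(split_tags('meme,"I don\'t even...",humor'))
```

One caveat with this approach: an unquoted apostrophe (for example ``don't, stop``) is read as an opening quote, and shlex raises ``ValueError`` when it finds no closing quote, so callers would need to handle that case.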
comment:3 by , 14 years ago
Also: this patch supports single quotes/apostrophes in tags wrapped in double-quotes, but I'm not sure how to write the test for this.
comment:4 by , 14 years ago
| Owner: | changed from to |
|---|---|
| Status: | Feedback → In Progress |
Already looks quite good! I think ``media_tags_as_string()`` does not yet handle this properly. This especially affects the "edit media" page.
comment:5 by , 14 years ago
The original url for this bug was http://bugs.foocorp.net/issues/454 .
Relations:
#75: blocked
comment:6 by , 13 years ago
| Description: | modified (diff) |
|---|---|
| Owner: | removed |
This ticket hasn't been worked on for a while, so I'm removing the claim!
comment:7 by , 9 years ago
| Type: | defect → enhancement |
|---|
For what it's worth, MediaGoblin does work exactly as the form suggests: "Separate tags by commas". I'm not entirely convinced that allowing quotes is a helpful thing to do for the majority of people - it may just complicate things.
Regardless of whether we implement the feature or not, I think it would be necessary to use the csv module to deserialize/serialize. The shlex module doesn't allow us to go both ways reliably.
comment:8 by , 9 years ago
Aaron, even though my feeling is that shlex might not be the eventual solution here, I'd still be interested to see your code if you have a copy. Unfortunately the Gitorious repository referenced is no longer available.
comment:9 by , 9 years ago
I've made a quick initial attempt at adding this feature using the csv module to serialize/deserialize the tags. Unfortunately that isn't going to work because of one tiny detail: csv.writer can only output with a single-character delimiter, e.g. "," (comma) and not ", " (comma + space). Without this, our serialised tags will look like ``yin,yang`` rather than ``yin, yang``, which I don't think we want. The csv module is also written in C, so there's no trivial fix.
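To illustrate the limitation, here is a small sketch using the standard csv module under Python 3. The helper names (``tags_to_string``, ``string_to_tags``, ``comma_space_delimiter_supported``) are made up for this example and are not part of MediaGoblin.

```python
import csv
import io


def tags_to_string(tags):
    """Serialize tags with csv.writer; the delimiter is a bare comma."""
    buf = io.StringIO()
    csv.writer(buf).writerow(tags)
    return buf.getvalue().rstrip("\r\n")  # drop the row terminator


def string_to_tags(text):
    """Deserialize one line of tags, ignoring spaces after each comma."""
    return next(csv.reader([text], skipinitialspace=True), [])


def comma_space_delimiter_supported():
    """Check whether csv.writer accepts a two-character delimiter."""
    try:
        csv.writer(io.StringIO(), delimiter=", ")
    except TypeError:
        return False
    return True


print(tags_to_string(["yin", "yang"]))        # no space after the comma
print(string_to_tags('yin, "a, b"'))          # reading tolerates the space
print(comma_space_delimiter_supported())
```

Reading copes with the space via ``skipinitialspace``, but writing has no equivalent option, which is the asymmetry described above.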
This feature is actually quite complex because:
- CSV serializing/deserializing requires a real parser to handle quoting correctly (so we'll have to re-implement ``csv``)
- We're using the serialized version as user interface:
  - Our parser must input/output Unicode text, not UTF-8 encoded bytes as ``csv`` does (noting that Python 3 ``csv`` requires Unicode input)
  - Our parser must not require whitespace after comma, but should also collapse extra whitespace around and within tags when deserializing
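For a sense of scale, the requirements above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: it targets Python 3 (so all strings are Unicode), the names ``parse_tags`` and ``format_tags`` are hypothetical, and it deliberately ignores hard cases such as escaping, unterminated quotes, and tags containing both quote characters.

```python
def parse_tags(text):
    """Split on commas outside quotes and collapse whitespace in each tag."""
    tags, current, quote = [], [], None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None            # closing quote ends the quoted run
            else:
                current.append(ch)      # commas inside quotes are literal
        elif ch in "'\"" and not "".join(current).strip():
            quote = ch                  # a quote only opens at the start of a tag
        elif ch == ",":
            tags.append("".join(current))
            current = []
        else:
            current.append(ch)
    tags.append("".join(current))
    # Collapse whitespace around and within tags, then drop empty tags.
    cleaned = [" ".join(tag.split()) for tag in tags]
    return [tag for tag in cleaned if tag]


def format_tags(tags):
    """Join tags with ', ', quoting only those that need it."""
    parts = []
    for tag in tags:
        if "," in tag or tag[:1] in ("'", '"'):
            quote = "'" if '"' in tag else '"'
            tag = quote + tag + quote
        parts.append(tag)
    return ", ".join(parts)


print(parse_tags("meta,'yo dawg, I heard you like delimiters...',humor"))
print(format_tags(["yin", "yang"]))
```

Because a quote is only recognized at the start of a tag, an unquoted ``I don't even...`` keeps its apostrophe, but a tag that begins with an apostrophe would still be misread; handling that properly is part of the extra work described above.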
Based on the amount of work involved in implementing this feature and my assumption that it won't be heavily used, I'd suggest we keep things simple and don't build this. I'll keep this open for a few weeks to see if anyone has other thoughts.
