Opened 11 years ago

Closed 10 years ago

#802 closed defect (wontfix)

sitemap.xml generation

Reported by: user-A
Owned by:
Priority: major
Milestone:
Component: programming
Keywords: seo, sitemaps, search
Cc:
Parent Tickets:

Description

Hi, I'm participating in the open-source distributed search engine YaCy. I would therefore like to encourage you to offer web spiders a sitemap.xml file (and reference it in robots.txt): https://en.wikipedia.org/wiki/Sitemaps

This allows the bot to simply walk through the flat list, so it doesn't have to derive the site structure from the parsed HTML pages. It prevents the search engine from missing content and speeds up the analysis process.

Attachments (1)

802-sitemap-generation.patch (3.2 KB ) - added by digital-dreamer 10 years ago.


Change History (4)

comment:1 by digital-dreamer, 10 years ago

I know this feature request hasn't been accepted yet, but I already implemented it. So far it is just a flat list of URLs to index, with no “lastmod” dates, because MediaGoblin doesn't store this info. But it can already simplify the crawling process. To user-A: I don't have any experience with YaCy, so could you please test whether this satisfies its needs? Just download my patch file from the attachments section of this ticket and apply it to any MediaGoblin instance; it will add /sitemap.xml and robots.txt. Try to index it with YaCy and tell me whether this is enough, or whether there's something I should add or change.
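For readers without the patch at hand, here is a minimal sketch of what a flat sitemap of this kind might look like when generated in Python. It is not the code from the attached patch: the function names `build_sitemap` and `build_robots_txt` and the example.org URLs are made up for illustration, and the list of URLs is assumed to come from wherever the instance keeps its user and media pages.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as bytes) that lists the given URLs.

    Only <loc> is emitted; there is no <lastmod>, matching the
    "flat list" described in the comment above.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="utf-8")


def build_robots_txt(sitemap_url):
    """Return a robots.txt body that only points crawlers at the sitemap."""
    return "Sitemap: {}\n".format(sitemap_url)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    pages = [
        "https://example.org/u/alice/",
        "https://example.org/u/alice/m/cat/",
    ]
    print(build_sitemap(pages).decode("utf-8"))
    print(build_robots_txt("https://example.org/sitemap.xml"))
```

The `Sitemap:` line in robots.txt is what lets a crawler discover the file without being told its location separately.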

Attachment 802-sitemap-generation.patch added by digital-dreamer, 10 years ago

comment:2 by ShawnRisk, 10 years ago

Status: new → review

comment:3 by Christopher Allan Webber, 10 years ago

Resolution: wontfix
Status: review → closed

This iterates through all users and media, and depending on the size of the site, that could be a *ton* of users and media, enough to generate an enormous document or bring the site crashing to a halt. Of course, on smaller instances, that may be no problem... and maybe this patch is ideal for smaller instances.

Regardless, I get the sense that this would be best handled as an external plugin.

It's a good idea though! I just don't think I see a way to bring it into MediaGoblin proper.
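One way an external plugin might address the size concern is to split the URL list into several sitemap files and tie them together with a sitemap index, which the sitemaps.org protocol provides for (it caps a single sitemap file at 50,000 URLs). The sketch below is only an illustration under stated assumptions: the names `chunked` and `build_sitemap_files`, the `sitemap-N.xml` file naming, and the `base_url` parameter are hypothetical, and it assumes the files are produced offline (for example by a scheduled job) from an iterable of URLs rather than on every request.

```python
from itertools import islice
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file limit in the sitemaps.org protocol


def chunked(iterable, size):
    """Yield lists of at most `size` items without loading everything at once."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def _serialize(root):
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(
        root, encoding="utf-8"
    )


def build_sitemap_files(urls, base_url, max_urls=MAX_URLS_PER_FILE):
    """Yield (filename, content) pairs: one sitemap per chunk, then the index.

    `base_url` (hypothetical parameter) is the public prefix under which
    the generated files will be served, e.g. "https://example.org/".
    """
    names = []
    for number, chunk in enumerate(chunked(urls, max_urls), start=1):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in chunk:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        name = "sitemap-{}.xml".format(number)
        names.append(name)
        yield name, _serialize(urlset)

    # The index lists the location of every chunk file written above.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in names:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    yield "sitemap.xml", _serialize(index)
```

Because `urls` can be any iterable, a plugin could feed it from a paginated database query, so only one chunk of URLs is held in memory at a time.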
