Opened 8 years ago

Closed 7 years ago

#802 closed defect (wontfix)

sitemap.xml generation

Reported by: user-A
Owned by:
Priority: major
Milestone:
Component: programming
Keywords: seo, sitemaps, search
Cc:
Parent Tickets:

Description

Hi, I'm participating in the open-source distributed search engine YaCy, so I'd like to encourage you to offer web spiders a sitemap.xml file (and reference it in robots.txt): https://en.wikipedia.org/wiki/Sitemaps

This lets the bot simply walk through the flat list, so it doesn't have to derive the site structure from the parsed HTML pages. That keeps the search engine from missing content and speeds up the analysis.
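To illustrate the crawler side of this request, here is a minimal Python sketch of how a bot might read such a flat list. It is not YaCy's actual code; the function name `urls_from_sitemap` and the example.org URLs are made up for illustration, and only the XML namespace comes from the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return the <loc> values of a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    ]


# Hypothetical sitemap content; the URLs are placeholders.
example = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://media.example.org/u/alice/</loc></url>"
    "<url><loc>https://media.example.org/u/alice/m/sunset/</loc></url>"
    "</urlset>"
)

for url in urls_from_sitemap(example):
    print(url)
```

The bot gets every page address from one document and never has to follow links through rendered HTML to discover them.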


Attachments (1)

802-sitemap-generation.patch (3.2 KB) - added by digital-dreamer 8 years ago.


Change History (4)

comment:1 Changed 8 years ago by digital-dreamer

I know this feature request hasn't been accepted yet, but I already implemented it. So far it is just a flat list of URLs to index - no “lastmod” dates, because MediaGoblin doesn't store this info. But it can already simplify the crawling process. To user-A: I don't have any experience with YaCy, so could you please test whether this satisfies its needs? Just download my patch file from the attachments section of this ticket and apply it to any MediaGoblin instance; it will add /sitemap.xml and robots.txt. Try to index the instance with YaCy and tell me whether this is enough, or whether there's something I should add or change.
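For readers without the attachment at hand, the following Python sketch shows roughly what a flat sitemap without “lastmod” entries, plus a robots.txt that points to it, could look like. It is not the code from 802-sitemap-generation.patch; `build_sitemap`, `build_robots_txt` and the example.org URLs are hypothetical names chosen for illustration.

```python
from xml.sax.saxutils import escape


def build_sitemap(urls):
    """Render a flat sitemap: one <url><loc> entry per URL, no <lastmod>."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() replaces &, < and > so the output stays well-formed XML.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def build_robots_txt(sitemap_url):
    """Render a permissive robots.txt that advertises the sitemap."""
    # An empty Disallow permits crawling of everything.
    return "User-agent: *\nDisallow:\n\nSitemap: %s\n" % sitemap_url


# Placeholder URLs; a real instance would supply its own pages.
print(build_sitemap([
    "https://media.example.org/u/alice/",
    "https://media.example.org/search?q=cats&page=2",
]))
print(build_robots_txt("https://media.example.org/sitemap.xml"))
```

The `Sitemap:` line in robots.txt must carry the full URL of the sitemap, which is why the sketch takes it as a parameter instead of hard-coding a path.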

Changed 8 years ago by digital-dreamer

comment:2 Changed 7 years ago by ShawnRisk

Status: new → review

comment:3 Changed 7 years ago by Christopher Allan Webber

Resolution: wontfix
Status: review → closed

This iterates through all users and media, and depending on the size of the site, that could be a *ton* of users and media, enough to generate an enormous document or bring the site crashing to a halt. Of course, on smaller instances, that may be no problem... and maybe this patch is ideal for smaller instances.

Regardless, I get the sense that this would be best handled as an external plugin.

It's a good idea though! I just don't think I see a way to bring it into MediaGoblin proper.
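One way an external plugin might sidestep the size concern is to generate the sitemap outside the request cycle and split it into several files tied together by a sitemap index. The Sitemaps protocol caps a single file at 50,000 URLs. The Python sketch below is only an illustration under those assumptions: `chunked`, `build_sitemap_index` and the file-naming scheme are hypothetical, and nothing here is taken from MediaGoblin or the attached patch.

```python
from itertools import islice
from xml.sax.saxutils import escape

# Per-file URL limit from the Sitemaps protocol.
MAX_URLS_PER_FILE = 50000


def chunked(iterable, size):
    """Yield lists of at most `size` items, consuming the input lazily."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_sitemap_index(sitemap_urls):
    """Render a sitemap index that lists the individual sitemap files."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(url))
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# Hypothetical usage: `all_urls` stands in for a lazy database query.
all_urls = ("https://media.example.org/m/%d/" % i for i in range(120000))
parts = [
    "https://media.example.org/sitemap-%d.xml" % n
    for n, _chunk in enumerate(chunked(all_urls, MAX_URLS_PER_FILE), start=1)
]
print(build_sitemap_index(parts))
```

Because `chunked` pulls from an iterator, only one chunk of URLs is held in memory at a time, and the work can run as a periodic background job instead of on every request for /sitemap.xml.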
