Score:0

Dealing with Feed Reader traffic


The domain that I manage used to have a lot of RSS feeds. Almost all of them are gone now, but I still get loads of traffic looking for these feeds, and I'm not sure of the best way to handle it. Looking at a one-day slice of the Apache access log, roughly 23% of the hits to the site are from feed readers.

By "hits" here, I mean that I counted every entry in the access log for the day. That might not be the proper nomenclature; I'm not sure.

I provide 301s for the majority of these. The actual network traffic to these is minimal (I assume), but that number of hits seems like it must have an impact.

The feeds have been gone for years at this point, so these must be zombie feed readers. I can tell from the logs that there are a variety of types: feedburner and feedfetcher look like the most common ones.

Is there anything to be done about it?

Score:1

If you are sure those URLs will never have any content, you can try returning 410 Gone to the requests for those URLs.

From Mozilla documentation:

410 Gone This response is sent when the requested content has been permanently deleted from server, with no forwarding address. Clients are expected to remove their caches and links to the resource. The HTTP specification intends this status code to be used for "limited-time, promotional services". APIs should not feel compelled to indicate resources that have been deleted with this status code.

However, there is no guarantee that the reader software properly recognizes the status code, and it might still just retry loading the feed URLs.

The next step could then be to block the IP addresses of these clients. However, this could also block legitimate traffic, and the IP addresses of the feed software could change, in which case you would need to update the block lists.

zip_000:
I thought of blocking IPs, but it does look like the traffic comes from the same IPs as legit traffic. I'll look into using 410s, thanks for the suggestion!
HBruijn:
+1 for the 410, but since the current 301 permanent redirects are not being picked up, I don't expect that a 410 response will make the clients adjust their behaviour either. It might be interesting to see, though, whether a 404 improves the RSS client behaviour.
zip_000:
I've been serving 301s on some and 404s on others, and the clients haven't stopped! I'm going to set up 410s to see if it makes any difference, but I sort of doubt it will either. Reducing the server load by blocking them in the root Apache conf seems like it might be an improvement, though. One option that I was entertaining was blocking the user agents at the load balancer, but that doesn't work with the type of load balancer we have.
Score:1

You can reduce the "cost" of hits to non-existent resources by handling them in the main httpd.conf (or an applicable included configuration snippet).

Static Redirect and/or RedirectMatch directives should be computationally cheaper than dynamic mod_rewrite rules, especially compared to rules loaded from a .htaccess file. (Although I don't have hard numbers, this reference suggests so.)

As already suggested, the HTTP status code 410 Gone would be the most suitable one. Alternatively, sending the 404 Not Found status code would also be suitable. Either way, using the Redirect directive prevents the web server from having to check the file system for a non-existent resource, which reduces load.

For example:

<VirtualHost *:80>
   ServerName example.com
   ServerAlias www.example.com

   # All RSS feeds under /rss/legacy/* are gone
   Redirect gone "/rss/legacy"

   # /rss/[alpha.rss, bravo.rss, ... sierra.rss] are gone,
   # but /rss/tango.rss still exists
   # (the pattern is anchored with ^ so it only matches paths starting at /rss/)
   RedirectMatch gone "^/rss/[a-s].*\.rss$"

   # /rss/victor.rss has also been discontinued
   Redirect 410 "/rss/victor.rss"
   # ...

</VirtualHost>
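
If you would rather send 404 than 410, the same directives accept a numeric status. The following is only a minimal sketch that reuses the illustrative paths from the example above; the paths and host name are placeholders, not taken from a real configuration.

```apache
<VirtualHost *:80>
   ServerName example.com

   # Assumption: same illustrative paths as in the 410 example.
   # For a status outside the 3xx range, no target URL is given.
   Redirect 404 "/rss/legacy"
   RedirectMatch 404 "^/rss/[a-s].*\.rss$"
</VirtualHost>
```

As with the 410 variant, the response is generated by mod_alias before any file system lookup for the requested path.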

To further reduce the load and bandwidth spent on content that no longer exists, you might want to ensure that you don't send a friendly (custom) error document when your server returns the status "410 Gone". That is, check that your configuration does not include an ErrorDocument 410 directive that sets up a custom error page. It should either not be set at all, or be set to ErrorDocument 410 default or ErrorDocument 410 "Gone".
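
A minimal sketch of what that could look like, assuming the same example.com virtual host as above; only one of the two ErrorDocument lines would be used at a time.

```apache
<VirtualHost *:80>
   ServerName example.com

   # Option 1: use Apache's built-in, hardcoded 410 response
   # instead of any custom page inherited from a wider scope.
   ErrorDocument 410 default

   # Option 2 (alternative): send a short plain-text body.
   # ErrorDocument 410 "Gone"
</VirtualHost>
```

Setting it explicitly inside the virtual host also overrides a custom 410 page that may have been configured at the global server level.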

zip_000:
This server has two separate sites on it, and only one of them has the dead RSS feed problem. When you say main httpd.conf, do you mean the default one (on my server at least, /etc/httpd/conf/httpd.conf)? Or do you mean the main conf file for this particular server? I'm reading you as meaning the latter, but I want to make sure I understand!
HBruijn:
There are set-ups where there is not only an httpd.conf but also several include directories with snippets that get pulled into the main httpd.conf, hence my more generic style of answer. You place the directives wherever they will have the intended effect. For a server hosting several sites, the appropriate `VirtualHost` block would make the most sense.
zip_000:
Great, that's what I thought you meant. Thanks. I'll give it a try tomorrow morning.