We're struggling with a unique situation where malicious/unauthorized requests are being made to our site via 'Google Proxy' IP addresses.
Someone is using Google servers to 'proxy' our website and serve up all the same content, stripping scripts and adding their own advertisements.
Request User Agent:
Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
Request IP Addresses:
Reverse IP (PTR) google-proxy-66-102-9-1.google.com
ASN 15169 (GOOGLE)
ISP / Organization Google Proxy
66.102.9.1
66.102.9.29
66.102.9.25
66.102.9.6
66.249.84.235
We would like to simply block this user agent, but unfortunately it is also used for Google's official crawling of our RSS feeds.
I've done a substantial amount of digging, but I'm unable to determine:
- how these requests are being generated
- any fingerprint unique to these requests that isn't present in official Google feed requests
I've tried to get some Google services to generate similar requests, but nothing quite matches up. Both Feedburner and Google Drive use different user agents.
I've read that some Chrome RSS reader may be the 'proxy' used to generate these requests, but I'm unable to verify this.
Any help nailing down the potential source of these requests or suggestions on how to block the malicious requests while still allowing 'good' requests would be appreciated.
Update [2023-06-13]:
I've found this Webmasters Stack Exchange post that outlines a similar issue, but the specifics don't quite fit, and it doesn't identify any legitimate Google service that could be sending these requests.
Upon further review of the requests sent from google-proxy IP addresses, I've found that some of them have gzip(gfe) appended to the query string. As far as I can tell, "gzip(gfe)" indicates that the client accepts gzip compression and that the request passed through the Google Front End (GFE), the reverse-proxy layer that sits in front of Google's services. These requests also include an X-Forwarded-For header, so the real IP behind the request can be identified. However, this still doesn't clear up how the malicious requests are being sent or how they can be reliably blocked.
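To make the signals above concrete, here is a minimal Python sketch of how they could be extracted from a request for logging or rule-building. It is not a verified fingerprint: the PTR pattern is generalized from the single example hostname above, the helper names (`is_google_proxy_ptr`, `client_ip_from_xff`, `has_gfe_marker`) are hypothetical, and the headers are assumed to arrive as a plain dict with canonical header names. X-Forwarded-For is set by the sender, so it should be treated as a hint rather than proof.

```python
import re
from typing import Mapping, Optional
from urllib.parse import unquote

# Assumed pattern, generalized from "google-proxy-66-102-9-1.google.com".
PROXY_PTR = re.compile(
    r"^google-proxy-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.google\.com\.?$"
)


def is_google_proxy_ptr(ip: str, ptr: str) -> bool:
    """True if the PTR name has the google-proxy form and encodes the same IP."""
    match = PROXY_PTR.match(ptr.strip().lower())
    return bool(match) and ".".join(match.groups()) == ip


def client_ip_from_xff(headers: Mapping[str, str]) -> Optional[str]:
    """Return the left-most X-Forwarded-For entry, or None if the header is absent."""
    first = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    return first or None


def has_gfe_marker(headers: Mapping[str, str], query_string: str = "") -> bool:
    """Look for the literal 'gzip(gfe)' in the query string or the User-Agent."""
    marker = "gzip(gfe)"
    # The query string may be percent-encoded, so decode it before searching.
    return marker in unquote(query_string) or marker in headers.get("User-Agent", "")
```

Logging these three values side by side for both the suspicious traffic and known-good feed fetches might show whether any of them actually separates the two.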