My website has an area restricted to users who sign up with a valid email. I have received sign-up requests with bogus emails, and I want to avoid sending emails to non-existent addresses, since they would increase the bounce rate and hurt my sending reputation.
The emails are:
[email protected]
[email protected]
kWQcHVzn%40ypEcDvh.NwB
The last one contains %40, the URL-encoded (percent-encoded) form of @. All three addresses are truncations of the same character sequence.
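Since %40 is percent-encoding rather than a literal @, it may help to decode the submitted value before filtering or validating it. A minimal Python sketch, assuming the raw form value is available as a string (the function name `normalize_email` is hypothetical, and the site itself may be written in another language):

```python
from urllib.parse import unquote


def normalize_email(raw: str) -> str:
    """Decode percent-encoded characters (e.g. %40 -> @) and trim whitespace."""
    return unquote(raw).strip()


print(normalize_email("kWQcHVzn%40ypEcDvh.NwB"))
```

Decoding first means a single filter can match both the encoded and the plain form of the same address.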
A reverse DNS lookup on the IP address of the requests shows that all three come from cache.google.com. If the requests came from Google's crawler, I would expect these email addresses to be documented, but I could not find any reference.
In case it is the Google crawler, I want it to index the website, but without emails being sent to bogus addresses. I have already implemented a filter that rejects addresses containing that character sequence.
Is there a list of bogus addresses that deep web crawlers use to gain access and index hidden pages?
Update
Following the answer and the comment that suggested verifying whether the crawler really is Googlebot, I checked and confirmed that it is not:
$ host 212.113.167.197
197.167.113.212.in-addr.arpa domain name pointer cache.google.com.
$ host cache.google.com
Host cache.google.com not found: 3(NXDOMAIN)
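The reverse record claims cache.google.com, but that name does not resolve back to any address, so the reverse record cannot be trusted. So indeed, it seems to be a malicious user, which explains why that email address is not documented as coming from Google.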
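The manual check with `host` can be automated as a forward-confirmed reverse DNS lookup: resolve the IP to a name, check the name's domain, then resolve the name back and require that it returns the original IP. The following is a rough Python sketch, not a definitive implementation. The function name `is_verified_crawler` is hypothetical, the suffix list is an assumption that should be checked against Google's current documentation, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket

# Assumed hostname suffixes for Google crawlers; verify against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True only if the reverse and forward DNS lookups agree."""
    try:
        hostname = reverse(ip)[0]           # PTR lookup: IP -> name
    except OSError:
        return False
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False
    try:
        addresses = forward(hostname)[2]    # A lookup: name -> IPv4 list
    except OSError:
        return False                        # e.g. NXDOMAIN, as in this case
    return ip in addresses


if __name__ == "__main__":
    print(is_verified_crawler("212.113.167.197"))
```

The suffix check alone would accept cache.google.com, which is why the forward lookup is the decisive step: anyone who controls the reverse zone for their own IP range can publish an arbitrary PTR record.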