I can understand that the signature helps ensure that the public key was not tampered with.
The certificate contains a lot of other information, all of which is covered by the signature. An X.509 certificate consists mainly of a part called TBSCertificate, which literally means "to be signed certificate". The only parts that are not included are the outer ASN.1 SEQUENCE, the signatureAlgorithm field (a copy of which is also inside the TBSCertificate) and, of course, the signature itself.
A successful signature verification does indeed show that none of the information has been tampered with. More importantly, however, it also shows that the issuer of the certificate deemed the information in the certificate correct. This is what enables a public key infrastructure (PKI).
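RFC 5280 defines the outer structure as `Certificate ::= SEQUENCE { tbsCertificate, signatureAlgorithm, signatureValue }`. The following is a minimal sketch, not a full DER parser (it assumes single-byte tags and does no further validation), that splits an encoded certificate into those three parts. It shows that the bytes covered by the signature are the complete TBSCertificate encoding, tag and length included, while the outer SEQUENCE header and the signatureAlgorithm field are not. The helper names `encode_tlv`, `read_tlv` and `split_certificate` are made up for this illustration.

```python
from __future__ import annotations


def encode_tlv(tag: int, value: bytes) -> bytes:
    """DER-encode one element with a single-byte tag (short or long form length)."""
    n = len(value)
    if n < 0x80:
        return bytes([tag, n]) + value
    length = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([tag, 0x80 | len(length)]) + length + value


def read_tlv(data: bytes, offset: int) -> tuple[int, int, int]:
    """Return (tag, value_start, value_end) of the element starting at offset."""
    tag = data[offset]
    length = data[offset + 1]
    pos = offset + 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        num_bytes = length & 0x7F
        length = int.from_bytes(data[pos:pos + num_bytes], "big")
        pos += num_bytes
    return tag, pos, pos + length


def split_certificate(der: bytes) -> tuple[bytes, bytes, bytes]:
    """Split Certificate ::= SEQUENCE { tbsCertificate, signatureAlgorithm, signatureValue }."""
    tag, body_start, _ = read_tlv(der, 0)
    if tag != 0x30:
        raise ValueError("outer element is not a SEQUENCE")
    # The signed bytes are the complete TBSCertificate encoding, tag and length included
    _, _, tbs_end = read_tlv(der, body_start)
    tbs_certificate = der[body_start:tbs_end]
    # signatureAlgorithm sits outside the signed part
    _, _, alg_end = read_tlv(der, tbs_end)
    signature_algorithm = der[tbs_end:alg_end]
    sig_tag, sig_start, sig_end = read_tlv(der, alg_end)
    if sig_tag != 0x03:
        raise ValueError("signatureValue is not a BIT STRING")
    # Skip the leading "unused bits" byte of the BIT STRING
    signature = der[sig_start + 1:sig_end]
    return tbs_certificate, signature_algorithm, signature
```

To try it without a real certificate, build a toy one with placeholder contents, e.g. `encode_tlv(0x30, tbs + alg + encode_tlv(0x03, b"\x00" + sig))`, and check that `split_certificate` hands back the same `tbs` bytes you put in. A real verifier would feed exactly those bytes into the signature check.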
If I make a request to a secure site and get a certificate, then I get a public key and a signature all in one certificate file.
You'll normally get a certificate chain that hopefully leads up to a trust anchor. As indicated before, certificates contain much more information than just a public key and a signature. Finally, the data received is usually just handled in memory; it may be stored in a file or cached, but there is no need for it to be in a file.
To verify that the certificate arrived intact, I decrypt the signature with the public key (hash #1), hash the public key from scratch myself (hash #2), and check that hash #1 equals hash #2.
As indicated, the part that you hash is the "to be signed" part, which includes the public key. Furthermore, signature verification is not the same as decrypting with the public key and then comparing hashes. Only RSA resembles that, and even then the complete padded encoding is compared; algorithms such as ECDSA have no decryption step at all.
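To make the difference concrete, here is a toy sketch of RSASSA-PKCS1-v1_5 verification as described in RFC 8017, using SHA-256. The key is deliberately insecure (built from two publicly known Mersenne primes) and exists only for illustration. The verifier applies the public exponent, a raw modular exponentiation, then re-encodes the expected message itself and compares the complete encoded blocks; the hashing is part of that encoding step rather than a separate "hash #1 vs hash #2" comparison. The function names are hypothetical.

```python
import hashlib
import hmac

# Toy RSA key from two well-known Mersenne primes: anyone can factor N,
# so this is for illustration only.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))
K = (N.bit_length() + 7) // 8  # modulus length in bytes

# DER prefix of the DigestInfo structure for SHA-256 (RFC 8017, section 9.2)
SHA256_DIGESTINFO = bytes.fromhex("3031300d060960864801650304020105000420")


def emsa_pkcs1_v15_encode(message: bytes) -> bytes:
    """EM = 0x00 || 0x01 || PS (0xFF bytes) || 0x00 || DigestInfo(SHA-256(message))."""
    t = SHA256_DIGESTINFO + hashlib.sha256(message).digest()
    ps_len = K - len(t) - 3
    if ps_len < 8:
        raise ValueError("modulus too short")
    return b"\x00\x01" + b"\xff" * ps_len + b"\x00" + t


def sign(message: bytes) -> bytes:
    m = int.from_bytes(emsa_pkcs1_v15_encode(message), "big")
    return pow(m, D, N).to_bytes(K, "big")


def verify(message: bytes, signature: bytes) -> bool:
    if len(signature) != K:
        return False
    s = int.from_bytes(signature, "big")
    if s >= N:
        return False
    # RSAVP1: raw modular exponentiation with the public exponent
    em = pow(s, E, N).to_bytes(K, "big")
    # Compare the whole encoded message, not just an extracted hash
    return hmac.compare_digest(em, emsa_pkcs1_v15_encode(message))
```

For example, `verify(tbs, sign(tbs))` holds, while changing a single byte of `tbs` makes verification fail. Re-encoding and comparing the whole block, instead of parsing a hash out of the decrypted value, avoids a class of signature-forgery bugs caused by lenient parsing.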
But does that actually do any good? A man-in-the-middle attack can just intercept the certificate, replace it with their own intact certificate, and I won't know the difference.
The idea is that you build a trust path from the received certificate chain up to a trust anchor. Then you verify and validate each certificate in the chain. Only if both verification and validation succeed can you trust the server certificate.
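Path building can be sketched as following issuer names from the server certificate upwards until a trust anchor is reached. This is a minimal illustration under heavy assumptions: the `Cert` model and `build_path` function are hypothetical, and real implementations also match key identifiers, may need to backtrack between several candidate issuers, and verify each link's signature.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical, heavily simplified certificate: only the two names."""
    subject: str
    issuer: str


def build_path(leaf: Cert, intermediates: list[Cert], anchors: dict[str, Cert]) -> list[Cert]:
    """Follow issuer names from the leaf up to a trust anchor.

    Returns [leaf, intermediate..., anchor] or raises ValueError.
    """
    by_subject = {c.subject: c for c in intermediates}
    path = [leaf]
    current = leaf
    while True:
        # Stop as soon as the current certificate was issued by a trust anchor
        if current.issuer in anchors:
            path.append(anchors[current.issuer])
            return path
        issuer_cert = by_subject.get(current.issuer)
        if issuer_cert is None or issuer_cert in path:  # unknown issuer or loop
            raise ValueError(f"no trust path from {leaf.subject!r}")
        path.append(issuer_cert)
        current = issuer_cert
```

A man-in-the-middle who presents a certificate issued by their own "Evil CA" ends up with a chain that never reaches an entry in `anchors`, so `build_path` raises.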
Creating the trust path and the validation of the certificates is usually much harder than verifying the signatures.
Validation includes, for instance:
- checking that the certificate is still within its validity period;
- checking that the certificate has not been revoked by the CA (i.e. checking its status using a CRL or OCSP);
- checking that the right data elements are present, such as the serial number, the right policies and the correct (extended) key usages.
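The checks in this list can be sketched per certificate in a path ordered leaf first, trust anchor last. Everything here is a simplified assumption: the `Cert` fields and `validate_path` are hypothetical, the `revoked` set stands in for real CRL or OCSP lookups, policy checking is left out, and signature verification is assumed to happen separately.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Cert:
    """Hypothetical, simplified certificate with only the fields checked below."""
    subject: str
    issuer: str
    serial: int
    not_before: datetime
    not_after: datetime
    is_ca: bool = False                                # basicConstraints cA flag
    key_usage: frozenset[str] = frozenset()            # e.g. {"keyCertSign"}
    extended_key_usage: frozenset[str] = frozenset()   # e.g. {"serverAuth"}


def validate_path(path: list[Cert], revoked: set[tuple[str, int]],
                  now: Optional[datetime] = None) -> list[str]:
    """Check a path ordered leaf first, trust anchor last.

    `revoked` holds (issuer, serial) pairs and stands in for CRL/OCSP lookups.
    Returns a list of problems; an empty list means every check passed.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    problems = []
    for position, cert in enumerate(path):
        if not cert.not_before <= now <= cert.not_after:
            problems.append(f"{cert.subject}: outside validity period")
        if (cert.issuer, cert.serial) in revoked:
            problems.append(f"{cert.subject}: revoked")
        # Every certificate above the leaf must be a CA allowed to sign certificates
        if position > 0 and not (cert.is_ca and "keyCertSign" in cert.key_usage):
            problems.append(f"{cert.subject}: not allowed to issue certificates")
    leaf = path[0]
    if "serverAuth" not in leaf.extended_key_usage:
        problems.append(f"{leaf.subject}: not valid for TLS server authentication")
    return problems
```

Collecting all problems instead of stopping at the first one is only a convenience for diagnostics; any non-empty result means the server certificate must not be trusted.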
For web servers it is also vitally important that the name of the server is present in the certificate (commonly in the Subject Alternative Name, or SAN for short) and that the client checks it. If the server name is not checked, the certificate basically has no value: any valid certificate / private key combination could be used to authenticate as any server (e.g. anyone with a valid certificate for their own site could impersonate Google).
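A simplified sketch of that name check against the SAN DNS names, roughly in the spirit of RFC 6125: case-insensitive comparison, with a wildcard allowed only as the entire left-most label and matching exactly one label. The function name is hypothetical, and real clients apply further rules (for example around IP addresses and internationalized names) that are omitted here.

```python
from __future__ import annotations


def hostname_matches(hostname: str, san_dns_names: list[str]) -> bool:
    """Simplified SAN dNSName matching: case-insensitive, wildcard only as the
    whole left-most label and matching exactly one label."""
    host = hostname.rstrip(".").lower().split(".")
    for name in san_dns_names:
        pattern = name.rstrip(".").lower().split(".")
        if len(pattern) != len(host) or not host[0]:
            continue
        if pattern[0] == "*":
            # Refuse overly broad patterns such as "*.com" (a simplification)
            if len(pattern) >= 3 and pattern[1:] == host[1:]:
                return True
        elif pattern == host:
            return True
    return False
```

A perfectly valid certificate for `www.attacker.example` then fails for `www.google.com`, which is exactly the impersonation this check prevents.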
Checking the hashes seems futile because the certificate could have been signed by the private key of the perpetrator, not the one from the site I requested.
Yes, but the perpetrator should not have a certificate and private key that can be used to build the trust path. As long as that holds, building or verifying a trust path for the perpetrator's certificate will fail.
Apparently certificate authorities address this vulnerability, but I don't see how an extra signature in the certificate file would help.
It's not an extra signature; it is the signature. There is only one certificate authority that issues the certificate. A PKI is a tree structure, usually with a trusted, self-signed root certificate at the top.
Is it signed by the CA's own private key?
Yes, correct.
And what is the guarantee exactly?
Good question. The guarantee is that every relying party in the domain, such as the receiver, should be able to build a trust path to a trust anchor. For this kind of PKI, the trust anchors are usually a collection of root certificates delivered, for instance, with your browser or operating system. The list is kept up to date through software updates.
Furthermore, with its signature the CA vouches that the information within the certificate is valid. Checking that the information in the certificate request is valid is the job of a sub-entity of the CA called a registration authority (RA). The final certificate consists of information from the certification request (including the public key and server name) and information added by the CA (such as the validity period, a URL to the certificate revocation list, etc.).
For certificates used for TLS / HTTPS the CA should also test that the entity that performs the request owns or at least controls the server.
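The split between requester-supplied and CA-added data can be sketched as follows. All names are hypothetical: `controls_domain` stands in for whatever domain-control check the RA performs, the 90-day lifetime is an arbitrary default, and signing the resulting structure is not shown.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass(frozen=True)
class CertificationRequest:
    """Hypothetical: the parts supplied by the requester."""
    subject_public_key: bytes
    dns_names: tuple[str, ...]


@dataclass(frozen=True)
class TbsCertificate:
    """Hypothetical: the to-be-signed data the CA assembles."""
    serial_number: int                  # added by the CA
    issuer: str                         # added by the CA
    not_before: datetime                # added by the CA
    not_after: datetime                 # added by the CA
    subject_public_key: bytes           # copied from the request
    subject_alt_names: tuple[str, ...]  # copied from the request after validation
    crl_url: str                        # added by the CA


def issue(csr: CertificationRequest, ca_name: str, crl_url: str,
          controls_domain: Callable[[str], bool],
          lifetime: timedelta = timedelta(days=90)) -> TbsCertificate:
    # RA step: refuse unless the requester demonstrated control over every name
    for name in csr.dns_names:
        if not controls_domain(name):
            raise PermissionError(f"control over {name} not demonstrated")
    now = datetime.now(timezone.utc)
    # The CA would next sign the DER encoding of this structure (not shown)
    return TbsCertificate(
        serial_number=secrets.randbits(64) + 1,  # positive, unpredictable serial
        issuer=ca_name,
        not_before=now,
        not_after=now + lifetime,
        subject_public_key=csr.subject_public_key,
        subject_alt_names=csr.dns_names,
        crl_url=crl_url,
    )
```

The point of the sketch is that the CA's signature covers both halves: what the requester asked for, and the constraints the CA itself imposes.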
If the first round of hash-checking doesn't preclude a man-in-the-middle attack, how will another signature help?
To reiterate: it's called signature verification (which includes the hashing), and there is only one signature. The TL;DR is that building and validating the trust path is what precludes man-in-the-middle attacks.
Once the client has a trusted server certificate, the server still needs to show that it has access to the private key by performing a private key operation over known data, e.g. the session data in the case of TLS.
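A toy sketch of that proof of possession: the server performs a private key operation over data both sides know, and the client checks it with the public key from the already trusted certificate. In TLS 1.3, for instance, this role is played by the CertificateVerify message, a signature over a hash of the handshake transcript. The key below is deliberately insecure (publicly known Mersenne primes), textbook RSA without padding is used only for brevity, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib

# Toy key pair built from two well-known Mersenne primes: anyone can factor N,
# so this is for illustration only.
P, Q = 2**89 - 1, 2**107 - 1
N, E = P * Q, 65537                 # public key, as found in the server certificate
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to the server


def digest_int(data: bytes, n: int) -> int:
    # Textbook RSA over a bare hash; real protocols use padded schemes such as PSS
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def server_prove(transcript: bytes) -> int:
    """Private key operation over data both sides know."""
    return pow(digest_int(transcript, N), D, N)


def client_check(transcript: bytes, proof: int, cert_key: tuple[int, int]) -> bool:
    """Verify the proof with the public key taken from the trusted certificate."""
    n, e = cert_key
    return pow(proof, e, n) == digest_int(transcript, n)
```

Because each handshake includes fresh random values from both sides, a proof recorded from an earlier session does not verify against a new transcript, so an attacker holding only the certificate cannot pass this step.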
One big critique of the PKI used by browsers is that there are a lot of root certificates / trust anchors. Only one of those needs to be compromised for the system to fail. Famously, the Iranian government used wrongly issued certificates obtained from a now defunct CA (DigiNotar) to perform man-in-the-middle attacks against Google services.
For proprietary communication it makes sense to install trust anchors only for specific CAs.
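In Python's standard `ssl` module, for example, `ssl.create_default_context()` loads the system's default trust anchors, whereas a context created directly with `ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)` starts with none, so loading a single CA file restricts trust to that CA. A minimal sketch; the file name `private-ca.pem` and the helper names are hypothetical.

```python
import socket
import ssl
from typing import Optional


def make_private_pki_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that trusts only the given CA file, not the system store."""
    # PROTOCOL_TLS_CLIENT enables certificate verification (CERT_REQUIRED)
    # and hostname checking by default, but loads no trust anchors by itself.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


def open_tls(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    sock = socket.create_connection((host, port))
    try:
        # server_hostname is used for SNI and for the check against the certificate's SAN
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```

Usage would look like `open_tls("internal.example", 443, make_private_pki_context("private-ca.pem"))`; a server presenting a certificate from any public CA, however valid, then fails the handshake.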