is it safe for me to assume that if i visit http://www.example.com https://www.example.com http://www.example.com:8080 a webserver will reply?
No, you can't assume anything.
While ports 80 and 443 are assigned by IANA to HTTP and HTTPS, nothing prevents anyone from running something else on those ports.
Also, a port can appear open while merely acting as a decoy handled at the kernel level, with no real service behind it.
Or a process may really be attached to it but, for some reason, not work properly and hence not reply at all.
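To make the difference between "the port is open" and "a webserver replies" concrete, here is a minimal sketch in Python. The function name probe_http and its four result labels are my own invention for illustration, and it only speaks plain HTTP (no TLS), so treat it as a rough probe rather than a reliable detector.

```python
import socket


def probe_http(host: str, port: int, timeout: float = 3.0) -> str:
    """Speak plain HTTP to host:port and classify what comes back.

    Returns one of the (hypothetical) labels:
    "no-connection", "no-reply", "not-http" or "http".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Closed or filtered port, unreachable host, or name resolution failure.
        return "no-connection"
    with sock:
        request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        try:
            sock.sendall(request.encode("ascii"))
            data = sock.recv(64)
        except OSError:
            # Timeout or reset: the port accepted us, but nothing answered.
            return "no-reply"
        if not data:
            return "no-reply"
        # Any HTTP response starts with its version, e.g. "HTTP/1.1 200 OK".
        return "http" if data.startswith(b"HTTP/") else "not-http"
```

A call such as probe_http("www.example.com", 8080) can return any of the four labels, which is exactly why you can't assume a reply. Note that a port expecting TLS (like 443) will usually show up here as "not-http" or "no-reply", since this sketch sends cleartext.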
But even with a fully working webserver, you might get no reply, or not the reply you expect.
If www.example.com resolves to 192.0.2.42, the webserver at that IP address still needs to be specifically configured to know about the name www.example.com (or be configured in some wildcard way to accept any name). If it is not, it won't know what content to send back (there could be multiple websites hosted on the same IP address), and it can either send the content of one of the websites it manages (in Apache, the first one declared in the configuration files) or return an error, such as HTTP 400 or 500, to tell the client that the name is wrong.
This can easily happen, because you can take any existing website, find its IP address, and then, in ANY zone, add a record for ANY name pointing to that IP address (or, similarly on Unix systems, tinker with /etc/hosts).
Of course the webserver at that IP address will have no way to know what this new name is about, and hence will not reply as expected.
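You can see this name-based behaviour for yourself by connecting to the IP address directly and choosing the Host header by hand. Below is a small sketch using Python's standard http.client; the helper name fetch_status is made up for this example, and it covers plain HTTP only.

```python
import http.client


def fetch_status(ip: str, port: int, host_header: str, timeout: float = 5.0) -> int:
    """Connect to ip:port and send GET / with an explicit Host header.

    Returns the HTTP status code the server chose for that name.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Passing our own Host header stops http.client from adding its default.
        conn.request("GET", "/", headers={"Host": host_header})
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling it twice against the same IP address, once with a name the server is configured for and once with a name you just invented, will often give two different results, for example a normal page for the first and an error or the default site for the second. What exactly comes back depends entirely on how that server is configured.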
like can an ip have port X open but in order for me to access it via hostname i have to go to domain:Y (where Y is a different port than X)
That part is not clear. IP addresses do not have "ports". You might want to read up on the OSI seven-layer model of networking, or the Internet model. In short, at the bottom you have IPv4 and IPv6, and there are no ports there; the concept just does not exist at that layer. On top of that, you have protocols like TCP and UDP, which do define ports. A TCP connection is identified by a 4-tuple: source IP, source port, destination IP, destination port.
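The 4-tuple is easy to observe from a program. The sketch below (function names are mine, and it uses a throwaway listener on 127.0.0.1 so it needs no network access) opens two TCP connections to the same destination and reads back both tuples from the operating system.

```python
import socket


def connection_tuple(sock: socket.socket) -> tuple:
    """Return (source IP, source port, destination IP, destination port)."""
    src_ip, src_port = sock.getsockname()[:2]  # [:2] also handles IPv6 addresses
    dst_ip, dst_port = sock.getpeername()[:2]
    return (src_ip, src_port, dst_ip, dst_port)


def two_connections_to_same_port() -> tuple:
    """Open two TCP connections to one local listener and return both tuples."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(2)
        target = listener.getsockname()
        with socket.create_connection(target) as first, \
                socket.create_connection(target) as second:
            return connection_tuple(first), connection_tuple(second)
```

Both tuples share the same destination IP and destination port, and differ only in the source port, which is how the two connections are told apart. The port belongs to the TCP layer of each connection, not to the IP address as such.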
On top of TCP, more or less, you have TLS. A TLS handshake starts with a ClientHello message, which typically carries a "Server Name Indication" (SNI) extension. This extension lets the client specify the hostname it is trying to reach at that IP address. Through the DNS, the client has mapped the hostname (extracted from the URL, in the case of HTTPS) to an IP address, but as explained above, that IP address can do mass virtual hosting of multiple names. So before the first HTTP message is even sent, the TLS server needs to know which hostname is being requested, in order to return the correct certificate immediately, since this happens early in the TLS handshake and obviously before any HTTP exchange. The SNI extension is how the client tells it.
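To show that the name really travels in the very first TLS message, here is a sketch that builds a ClientHello in memory with Python's ssl module, without opening any network connection. The function name client_hello_bytes is hypothetical; it relies on ssl.MemoryBIO and SSLContext.wrap_bio, and assumes a default client configuration (no Encrypted Client Hello).

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the raw bytes a TLS client would send first for this hostname."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what ends up in the SNI extension.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello is written, the client now waits for the server.
        pass
    return outgoing.read()
```

Searching the returned bytes for the hostname, for example b"www.example.com" in client_hello_bytes("www.example.com"), shows the name sitting there before any certificate or HTTP data has been exchanged. With a real socket, the equivalent is passing server_hostname to SSLContext.wrap_socket.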
But what you describe can happen in some cases of proxying and/or domain cloaking.