Score:5

Using iptables forwarding while properly keeping the source IP

I have a server running WireGuard (hence the need for masquerading) and a Docker container listening on port 2525.

I have the following iptables rules:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 25 -j DNAT --to-destination 172.18.0.1:2525
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

When connecting to server:2525 directly, the Docker container sees my real IP address (1.2.3.4). When connecting to server:25, the Docker container sees the local IP provided by the Docker network:

Apr 07 12:45:46 mx postfix/smtpd[87]: lost connection after CONNECT from unknown[172.18.0.1]
Apr 07 12:45:46 mx postfix/smtpd[87]: disconnect from unknown[172.18.0.1] commands=0/0

How do I make sure the Docker container sees the public IP address when connecting to port 25 (and not only when connecting to port 2525)?

Thanks

# iptables -L -n -v -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
52300 3131K DNAT       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0            tcp dpt:25 to:172.18.0.1:2525
 150K 8524K DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    2   120 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
 3385  256K MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0
1733K  104M MASQUERADE  all  --  *      !br-b147ffdbc9f3  172.18.0.0/16        0.0.0.0/0
    0     0 MASQUERADE  tcp  --  *      *       172.17.0.2           172.17.0.2           tcp dpt:53
    0     0 MASQUERADE  udp  --  *      *       172.17.0.2           172.17.0.2           udp dpt:53
    0     0 MASQUERADE  tcp  --  *      *       172.18.0.2           172.18.0.2           tcp dpt:25

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
   12  1419 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0
    0     0 RETURN     all  --  br-b147ffdbc9f3 *       0.0.0.0/0            0.0.0.0/0
   56  3192 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:5354 to:172.17.0.2:53
    0     0 DNAT       udp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:5354 to:172.17.0.2:53
  107  6020 DNAT       tcp  --  !br-b147ffdbc9f3 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2525 to:172.18.0.2:25
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 32:d0:56:15:0a:64 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 159.223.80.86/20 brd 159.223.95.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.15.0.19/16 brd 10.15.255.255 scope global eth0:1
       valid_lft forever preferred_lft forever
    inet6 2400:6180:0:d0::f57:6001/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::30d0:56ff:fe15:a64/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 32:dc:4a:e4:27:be brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet 10.130.244.15/16 brd 10.130.255.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::30dc:4aff:fee4:27be/64 scope link
       valid_lft forever preferred_lft forever
4: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 10.200.200.52/24 scope global wg0
       valid_lft forever preferred_lft forever
5: wg1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 10.222.111.1/24 scope global wg1
       valid_lft forever preferred_lft forever
6: br-b147ffdbc9f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:46:21:70:c0 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-b147ffdbc9f3
       valid_lft forever preferred_lft forever
    inet6 fe80::42:46ff:fe21:70c0/64 scope link
       valid_lft forever preferred_lft forever
7: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:66:22:41:91 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:66ff:fe22:4191/64 scope link
       valid_lft forever preferred_lft forever
9: veth31eff9d@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether e6:fb:80:5d:c7:a3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::e4fb:80ff:fe5d:c7a3/64 scope link
       valid_lft forever preferred_lft forever
19: veth01269f5@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-b147ffdbc9f3 state UP group default
    link/ether 36:f4:e7:43:5f:da brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::34f4:e7ff:fe43:5fda/64 scope link
       valid_lft forever preferred_lft forever
setenforce 1
It looks like you have an SNAT POSTROUTING rule when going to the Docker interface that you could delete. Can you please add the result of `iptables -L -n -v -t nat`?
eKKiM
Can you elaborate more on your network infrastructure/setup, including the WireGuard part? It's unclear to me (at this moment) why masquerade is necessary.
Tuinslak (OP)
@setenforce1 Added to the initial post.
Tuinslak (OP)
@eKKiM I believe it's needed to route properly (i.e. it acts as a gateway/VPN: client -> WG server -> internet).
eKKiM
Is the WG server also your default gateway in your network? Can you add the output of `ifconfig -a` or `ip addr`?
Tuinslak (OP)
No, it's a DigitalOcean server; eth0 has a public IP. Added `ip addr` to the original post.
eKKiM
I do not see a reason for all those POSTROUTING MASQUERADE rules. Could you replace them with a single `iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE`?
Tuinslak (OP)
I removed one; I currently have this: `ip6tables -A INPUT -i eth0 -m tcp -p tcp --dport 25 -j REJECT`, `ip6tables -A INPUT -i eth0 -m tcp -p tcp --dport 2525 -j REJECT`, `iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE`
Score:2
A.B

Just let Docker handle the redirection, which is dynamic and could change when containers are added, removed or restarted. But see UPDATE below.

This redirection should not be to 172.18.0.1, which is the host and not the container. When the host receives such a connection, it is handled by docker-proxy, which proxies it to the container, losing the source IP address in the process.

Docker already DNATs and routes this port correctly (except from the host itself, where docker-proxy plays this role) in the very last rule of the ruleset, to the running container at address 172.18.0.2. Except it's configured to use port 2525 rather than port 25:

  107  6020 DNAT       tcp  --  !br-b147ffdbc9f3 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2525 to:172.18.0.2:25

This should be fixed with Docker settings, not with manual iptables rules that won't adapt when the container layout changes. As port 25 is privileged, if Docker is running rootless, additional settings are needed; check the documentation about exposing privileged ports.
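
Setting the docker-proxy clash from the UPDATE below aside for a moment, here is a minimal sketch of the Docker-native approach (container/network/volume names are taken from the docker run command the OP later posted in the comments, with the port publication changed to 25:25; the rootless part applies only if the daemon actually runs rootless):

    # Sketch only: publish port 25 through Docker itself, so Docker's own DNAT
    # preserves the client's source address for remote connections.
    docker run -d --network mx -p 25:25 -v /srv/mx/:/mx/ --restart always --name mx mx

    # Only if the daemon runs rootless: allow it to bind privileged ports
    # (one of the options described in the rootless-mode documentation), then restart it.
    sudo setcap cap_net_bind_service=ep "$(which rootlesskit)"
    systemctl --user restart docker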


UPDATE (factoring in the OP's comments): the OP can't currently use -p 25:25 because docker-proxy would clash with the host's local SMTP server, competing to listen on port 25 on the host. That's the reason the OP added the initial (wrong) iptables redirection.

One can either:

  • disable docker-proxy globally by running dockerd with the userland-proxy property set to false,

    either as a parameter --userland-proxy=false or as a property "userland-proxy": false added to /etc/docker/daemon.json.

    This will then allow using docker run ... -p 25:25 ... (as documented) without a clash: the host will still reach itself via localhost or $HOSTNAME, remote systems will reach the container, and no "Address already in use" error will make the host's SMTP daemon or Docker's container fail at start. A minimal sketch of this option is shown right after this list.

  • or else add a manual redirection (with the somewhat lengthy setup below to do it almost automatically); a consolidated sketch combining these steps appears after the notes at the end of this answer.

    Whenever the container is restarted, there is a risk that its (internal) IP address changes, so it must be computed each time. With a container named mx, using a network named mx, and a single IP address involved, this can be done as explained below.

    Create a separate chain (so it can be flushed without having to flush anything else) and have PREROUTING call it first:

    iptables -t nat -N mynat
    iptables -t nat -I PREROUTING -j mynat
    

    The container's IP address can be retrieved programmatically (for the simple case of a container named mx with a single address):

    containerip=$(docker container inspect --format '{{.NetworkSettings.Networks.mx.IPAddress}}' mx)
    

    (or one could use jq: containerip=$(docker container inspect mx | jq -r '.[].NetworkSettings.Networks.mx.IPAddress'))

    Finding the bridge interface name is more convoluted, or at least I couldn't find a way to do it using only docker ... inspect. So retrieve the bridge's IP address on the host and query the host with ip address to get only the bridge interface where this specific IP address is set (this requires the jq command):

     bridgeip=$(docker network inspect --format '{{(index .IPAM.Config 0).Gateway}}' mx)
     bridgeinterface=$(ip -json address show to "$bridgeip"/32 | jq -r '.[].ifname')
    

    Flush and repopulate mynat each time the container is (re)started:

    iptables -t nat -F mynat
    iptables -t nat -A mynat ! -i "$bridgeinterface" -p tcp --dport 25 -j DNAT --to-destination "$containerip":25
    

    which would be for the current case:

    iptables -t nat -I mynat ! -i br-b147ffdbc9f3 -p tcp --dport 25 -j DNAT --to-destination 172.18.0.2:25

    And to be sure Docker's own firewalling rules don't block such traffic, do something similar in the filter table's FORWARD path, starting from the DOCKER-USER chain.

    Initially (if running this at boot, you might also have to create the DOCKER-USER chain first):

    iptables -N myforward
    iptables -I DOCKER-USER 1 -j myforward
    

    Then later each time the container is (re)started:

    iptables -F myforward
    iptables -A myforward ! -i "$bridgeinterface" -d "$containerip" -p tcp --dport 25 -j ACCEPT
    

    which would be for the current case:

    iptables -A myforward ! -i br-b147ffdbc9f3 -d 172.18.0.2 -p tcp --dport 25 -j ACCEPT
    
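
As referenced in the first option above, a minimal sketch of the userland-proxy route (assuming dockerd is managed by systemd and reusing the OP's container name):

    # Merge this into /etc/docker/daemon.json (global setting: it affects every
    # published port on this host):
    #   { "userland-proxy": false }
    # then restart the daemon and publish port 25 the documented way:
    sudo systemctl restart docker
    docker run -d --network mx -p 25:25 -v /srv/mx/:/mx/ --restart always --name mx mx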

Notes:

  • To simplify the rules above and avoid some of the calculations, the container and its bridge network can be started with fixed IP addresses. See for example this SO Q/A: Assign static IP to Docker container.

  • Here's also a Unix & Linux SE Q/A with an answer of mine about problems when interacting with Docker (it's geared toward nftables, but the parts about the DOCKER-USER chain or br_netfilter bridge interactions are still of interest): nftables whitelisting docker
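
Finally, as referenced in the second option above, a consolidated sketch combining the one-time setup and the per-restart steps into one idempotent script (it assumes the mx container/network names used above and the jq command; adjust to your layout):

    #!/bin/sh
    # One-time setup (safe to re-run): create the custom chains and hook them in.
    iptables -t nat -N mynat 2>/dev/null
    iptables -t nat -C PREROUTING -j mynat 2>/dev/null || iptables -t nat -I PREROUTING -j mynat
    iptables -N DOCKER-USER 2>/dev/null   # only needed very early at boot, before Docker creates it
    iptables -N myforward 2>/dev/null
    iptables -C DOCKER-USER -j myforward 2>/dev/null || iptables -I DOCKER-USER 1 -j myforward

    # Recompute the container's address and its bridge interface (they can change on restart).
    containerip=$(docker container inspect --format '{{.NetworkSettings.Networks.mx.IPAddress}}' mx)
    bridgeip=$(docker network inspect --format '{{(index .IPAM.Config 0).Gateway}}' mx)
    bridgeinterface=$(ip -json address show to "$bridgeip"/32 | jq -r '.[].ifname')

    # Repopulate the chains with the current values.
    iptables -t nat -F mynat
    iptables -t nat -A mynat ! -i "$bridgeinterface" -p tcp --dport 25 -j DNAT --to-destination "$containerip":25
    iptables -F myforward
    iptables -A myforward ! -i "$bridgeinterface" -d "$containerip" -p tcp --dport 25 -j ACCEPT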

Tuinslak (OP)
Thanks. This is using `docker run`: `docker run -d -P --network mx -p 2525:25 -v /srv/mx/:/mx/ --restart always --name mx mx`
Tuinslak (OP)
I'm running on port 2525 because, if I don't run a Postfix on localhost, the `mail` command doesn't work and email is not delivered. https://pastebin.com/RR2mDx9T (when running the container on port 25)
eKKiM
AFAIK the Docker userland proxy will literally proxy the connection, thus the source IP will get changed.
A.B
@eKKiM Indeed, that's what I cover in the part about `docker-proxy`.
A.B
@Tuinslak So with this context (can't use port 25), the Docker rule already in place (the very last rule in the ruleset displayed in the question, created by `-p 2525:25`) handles it. Why are you then adding a rule overriding it?
eKKiM
@A.B The phrasing "Just let Docker handle the redirection" led me to assume it should be handled by docker-proxy.
A.B
@eKKiM docker-proxy is involved only for the host case: 172.18.0.1. It's not involved when routing to the container at 172.18.0.2. But the iptables rule the OP added forces the flow through docker-proxy instead of letting it be routed.
Tuinslak (OP)
If I don't run `iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE`, the server does not forward packets and I can't use it as a VPN/ping any host outside of the server - https://pastebin.com/uY0Rka6Z
Tuinslak (OP)
Looks like `iptables -t nat -I PREROUTING -i eth0 -p tcp --dport 25 -j DNAT --to-destination 172.18.0.2:25` works. So, I can't use the docker-proxy (`172.18.0.1`) but need to use the actual IP of the container?
A.B
My link about `docker-proxy` tells you it can be globally disabled with `--userland-proxy=false`, or `"userland-proxy": false` in [`/etc/docker/daemon.json`](https://docs.docker.com/engine/reference/commandline/dockerd/). If you don't need to reach your containers (all of them, not just this one: it's global) using the host itself, then you should be able to simply use `-p 25:25`. This is an issue only when NAT hairpin is involved, i.e. when a container needs to reach another container using the host's public IP address (eth0's address).
A.B
You can use `172.18.0.2:25` without a problem, but bear in mind this address isn't permanent; it could change when you run other containers, depending on the configuration.
A.B
I modified my answer to address the information in the comments and to add some automation.