TL;DR: The most likely explanation is that you dropped ICMPv6. Don't drop all of ICMPv6 unless you understand precisely which parts of it must still be accepted.
During your tests, you have to insert a rule like:
meta l4proto ipv6-icmp accept
before the rules that drop other packets. The rule above could be refined, but that's outside the scope of this Q/A.
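Purely as an illustration of where such a rule sits, here is a minimal sketch of a ruleset with a drop policy. The table name, the chain name and the TCP 33333 rule are assumptions made up for this example and are not taken from your ruleset; the set of NDP message types is one possible refinement, not a recommendation.

```nft
# Sketch only: table/chain names and the service rule are assumed.
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # Accept NDP before anything gets dropped: neighbor and router
        # solicitations/advertisements keep address resolution working.
        icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert, nd-router-solicit, nd-router-advert } accept

        # Assumed service rule standing in for the port under test.
        tcp dport 33333 accept
    }
}
```

Accepting only these four types still drops ICMPv6 error messages such as packet-too-big, which IPv6 relies on for path MTU discovery, so a real ruleset would usually accept more than this.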
ICMPv6, among other roles (such as ping, time exceeded, ...), includes the equivalent of IPv4's ARP: the Neighbor Discovery Protocol (NDP) resolves the link-layer MAC address from the upper-layer IPv6 address. While ARP is not firewalled in the ip family but in the arp family, NDP, being part of ICMPv6, does get firewalled by the ip6 (or the inet) family. If everything is dropped, then NDP is dropped too.
Cache revalidation works as follows: once a neighbor entry naturally switches from the REACHABLE to the STALE state when its reachable time expires, it can still be used without confirmation. When traffic uses the entry, it switches after some time (via a short intermediate DELAY state, omitted here) to the PROBE state, which requires an NDP confirmation to switch back to REACHABLE. So one kind of state flow is:
REACHABLE -> STALE -> PROBE -> REACHABLE
Depending on traffic there are other possibilities. Details are available in RFC 4861. The STALE state can exist for a very long time when there's no traffic:
From the perspective of correctness, there is no need to periodically
purge Destination and Neighbor Cache entries. Although stale
information can potentially remain in the cache indefinitely, the
Neighbor Unreachability Detection algorithm ensures that stale
information is purged quickly if it is actually being used.
During the STALE and PROBE states, the old entry is given the benefit of the doubt: it can still be used to send traffic. This stops when NDP fails a few seconds later: the destination is now in FAILED state and the MAC address is deleted from the cache.
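How long each of these phases lasts on a Linux client is governed by per-interface neighbor sysctls. The sketch below simply prints a few of them; the helper name `neigh_timers` is made up for this example and the interface name `ens33` is taken from the commands further down. It reports `(unavailable)` for anything it cannot read rather than failing.

```shell
# Hypothetical helper: print the IPv6 neighbor-cache timers of one interface.
# base_reachable_time_ms : basis for how long an entry stays REACHABLE
# delay_first_probe_time : wait (s) before probing a STALE entry that got used
# retrans_time_ms        : interval between successive probes
# ucast_solicit          : number of unicast probes sent before giving up
# gc_stale_time          : how often (s) the kernel checks for stale entries
neigh_timers() {
    dev="${1:-default}"
    for key in base_reachable_time_ms delay_first_probe_time \
               retrans_time_ms ucast_solicit gc_stale_time; do
        path="/proc/sys/net/ipv6/neigh/$dev/$key"
        if [ -r "$path" ]; then
            printf '%s=%s\n' "$key" "$(cat "$path")"
        else
            printf '%s=(unavailable)\n' "$key"
        fi
    done
}

neigh_timers ens33
```

The same values can be read with `sysctl net.ipv6.neigh.ens33`; reading the files directly just keeps the sketch free of external tools.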
If you disable all traffic, you also disable NDP traffic and lose IPv6 connectivity very quickly (including global IPv6).
This explains all of your symptoms:
- you made an attempt without rules: NDP succeeded and the client's NDP cache entry went to the REACHABLE state, then possibly switched to the STALE state, which can persist for a long time
- ruleset was installed, dropping NDP attempts. If there's no traffic, the client's NDP cache can stay in STALE state for a very long time.
- the first test on TCP 33333 used the STALE cache entry and succeeded. Observed return traffic might even delay the switch to PROBE state (so this first test can probably be repeated successfully for several seconds).
- the next test either did not manage to reach the target before the entry switched to PROBE state (NDP failed) and then to FAILED state (which deleted the MAC address), or did reach it but was blocked anyway, since TCP 22222 traffic is dropped by the drop-all rule. Repeated attempts (incl. TCP SYN retries) without reply will speed up the switch to PROBE state.
- sooner or later the neighbor cache will have switched to PROBE and then FAILED state.
- the last test to TCP 33333 has no known MAC address to send IPv6 frames to: it can't succeed.
There's also a minor observed behavior difference when using an IPv6 link-local address as destination here: when trying to reach a link-local address (i.e. an fe80::/10 address), contrary to an IPv6 global address destination or the IPv4 case, it appears the client doesn't fail 3s later with EHOSTUNREACH (No route to host): nothing happens for the client at all to warn it (tested with strace).
So from the connection attempt alone there's no way to tell whether the failure is caused by an NDP failure (which is an unintended effect) or by the remaining parts of the firewall: one might think the firewall suddenly behaves incorrectly, while it does not. The state changes in the NDP cache can take up to 30s.
This can be checked on the (Linux) client with:
ip -6 neigh show fe80::9d08:b3e2:47fa:2935 dev ens33
which will likely end up displaying:
fe80::9d08:b3e2:47fa:2935 dev ens33 FAILED
State changes can be tracked in real time with:
ip -ts -6 monitor neigh dev ens33
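When scripting the tests, it can help to record the neighbor state right before each connection attempt, so that a failure can be attributed either to NDP or to the firewall rules. The following is a minimal sketch: both helper names are made up for this example, and it assumes the state is the last field of the `ip neigh` output, as in the line shown above.

```shell
# Hypothetical helper: extract the neighbor state (assumed to be the last
# field) from `ip neigh` output read on standard input.
neigh_state() {
    awk 'NF { state = $NF } END { if (state != "") print state }'
}

# Hypothetical helper: succeed if an entry in this state still carries a MAC
# address that can be used to send traffic.
neigh_usable() {
    case "$1" in
        REACHABLE|STALE|DELAY|PROBE|PERMANENT|NOARP) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration on the sample line from above. On a live system, pipe in:
#   ip -6 neigh show fe80::9d08:b3e2:47fa:2935 dev ens33
state="$(printf '%s\n' 'fe80::9d08:b3e2:47fa:2935 dev ens33 FAILED' | neigh_state)"
if neigh_usable "$state"; then
    echo "neighbor entry usable ($state): a failure more likely comes from the rules"
else
    echo "neighbor entry not usable ($state): suspect dropped NDP first"
fi
```

An empty result (no entry at all) is treated as not usable, which matches the case where the MAC address has already been deleted from the cache.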