I have a more or less complex microservice architecture in which Apache Ignite is used as a stateless database / cache. The Ignite Pod is the only Pod in its Namespace, and the architecture has to pass a security audit, which it won't pass unless I apply the most restrictive NetworkPolicy possible for egress traffic. It has to restrict all traffic that is not needed by Ignite itself.
At first I thought: nice, Ignite does not push any traffic to other Pods (there are no other Pods in that Namespace), so this should be easily done by restricting all egress traffic in the Namespace where Ignite is the only Pod! ...
Well, that didn't actually work out great:
any egress rule, even if I allow traffic to all the ports mentioned in the Ignite documentation, will cause the startup to fail with an IgniteSpiException that says Failed to retrieve Ignite pods IP addresses, Caused by: java.net.ConnectException: Operation timed out (Connection timed out).
The problem seems to be the TcpDiscoveryKubernetesIpFinder, especially the method getRegisteredAddresses(...), which obviously does some egress traffic inside the Namespace in order to register the IP addresses of the Ignite nodes. The discovery port 47500 is of course allowed, but that does not change the situation. The functionality of Ignite with the other Pods from other Namespaces works without egress rules applied, which tells me that the configuration concerning the ClusterRole, the ClusterRoleBinding, the Service in the Namespace, the XML configuration of Ignite itself, and so on is correct. Even ingress rules restricting traffic from other Namespaces work as expected, allowing exactly the desired traffic.
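For context, the RBAC and Service side of the discovery setup looks roughly like this (a sketch with placeholder names, the real manifests differ in detail):
## ClusterRole that lets the IP finder list the Ignite endpoints (placeholder name)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ignite-endpoint-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "endpoints"]
    verbs: ["get", "list", "watch"]
---
## Service in cache-ns whose endpoints the IP finder resolves to Ignite pod IPs (placeholder name)
apiVersion: v1
kind: Service
metadata:
  name: ignite-service
  namespace: cache-ns
spec:
  selector:
    app: ignite
  ports:
    - name: discovery
      port: 47500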
These are the policies I applied:
[WORKING, blocking undesired traffic only]:
## Denies all Ingress traffic to all Pods in the Namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress-in-cache-ns
  namespace: cache-ns
spec:
  # an empty podSelector selects all Pods in the Namespace; with no ingress
  # rules defined below, all incoming traffic to them is denied
  podSelector:
    matchLabels: {}
  # traffic routes to be considered, here: incoming exclusively
  policyTypes:
    - Ingress
## Allows necessary ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: netpol-cache-ns
  namespace: cache-ns
# defines the pod(s) that this policy is targeting
spec:
  policyTypes:
    - Ingress
  podSelector:
    matchLabels:
      app: ignite
  # <----incoming traffic----
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              zone: somewhere-else
          podSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [some-pod, another-pod] # dummy names, these Pods don't matter at all
      ports:
        - port: 11211 # JDBC
          protocol: TCP
        - port: 47100 # SPI communication
          protocol: TCP
        - port: 47500 # SPI discovery (CRITICAL, most likely...)
          protocol: TCP
        - port: 10800 # SQL
          protocol: TCP
  # ----outgoing traffic---->
  # NONE AT ALL
With these two applied, everything works fine, but the security audit will say something like:
Where are the restrictions for egress? What if this node is hacked via the allowed routes, because one of the Pods using these routes was hacked before? It may then call a C&C server! This configuration will not be permitted, harden your architecture!
[BLOCKING desired/necessary traffic]:
Generally deny all traffic...
## Denies all traffic to all Pods in the Namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-traffic-in-cache-ns
  namespace: cache-ns
spec:
  # an empty podSelector selects all Pods in the Namespace; with no rules
  # defined below, all of their traffic on the listed route types is denied
  podSelector:
    matchLabels: {}
  # traffic routes to be considered, here: incoming and outgoing
  policyTypes:
    - Ingress
    - Egress # <------ THIS IS THE DIFFERENCE TO THE WORKING ONE ABOVE
... and allow specific routes afterwards
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: netpol-cache-ns-egress
  namespace: cache-ns
# defines the pod(s) that this policy is targeting
spec:
  policyTypes:
    - Egress
  podSelector:
    matchLabels:
      app: ignite
  # ----outgoing traffic---->
  egress:
    # [NOT SUFFICIENT]
    # allow egress to this namespace at specific ports
    - to:
        - namespaceSelector:
            matchLabels:
              zone: cache-zone
      ports:
        - protocol: TCP
          port: 10800
        - protocol: TCP
          port: 47100 # SPI communication
        - protocol: TCP
          port: 47500
    # [NOT SUFFICIENT]
    # allow dns resolution in general (no namespace or pod restriction)
    - ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53
    # [NOT SUFFICIENT]
    # allow egress to the kube-system Namespace (label is present!)
    - to:
        - namespaceSelector:
            matchLabels:
              zone: kube-system
    # [NOT SUFFICIENT]
    # allow egress in this namespace and for the ignite pod
    - to:
        - namespaceSelector:
            matchLabels:
              zone: cache-zone
          podSelector:
            matchLabels:
              app: ignite
    # [NOT SUFFICIENT]
    # allow traffic to the IP address of the ignite pod
    - to:
        - ipBlock:
            cidr: 172.21.70.49/32 # won't work well since those addresses are dynamic
      ports:
        - port: 11211 # JDBC
          protocol: TCP
        - port: 47100 # SPI communication
          protocol: TCP
        - port: 47500 # SPI discovery (CRITICAL, most likely...)
          protocol: TCP
        - port: 49112 # JMX
          protocol: TCP
        - port: 10800 # SQL
          protocol: TCP
        - port: 8080 # REST
          protocol: TCP
        - port: 10900 # thin clients
          protocol: TCP
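One more variant I have been considering but have not verified yet: in case getRegisteredAddresses(...) needs to reach the Kubernetes API server itself rather than the Pods in the Namespace, a rule like the following sketch might be the missing piece (the ports are assumptions on my side, 443 and 6443 being the usual API server ports, and the rule is intentionally not narrowed down to a destination yet):
    # [UNTESTED GUESS]
    # allow egress on the Kubernetes API server port(s), in case the IP finder needs it
    - ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 6443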
The Apache Ignite version used is 2.10.0.
Now the question to all readers is:
How can I restrict Egress to an absolute minimum inside the Namespace so that Ignite starts up and works correctly? Would it be sufficient to just deny Egress to outside the cluster?
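To make that last part concrete, this is roughly what I have in mind (a sketch, not tested in this exact form): allow egress to any Pod in any Namespace, i.e. keep traffic cluster-internal, and implicitly deny everything leaving the cluster:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-inside-cluster-only
  namespace: cache-ns
spec:
  policyTypes:
    - Egress
  podSelector:
    matchLabels:
      app: ignite
  egress:
    # an empty namespaceSelector matches every Namespace, so this allows
    # egress to all Pods in the cluster and to nothing outside of it
    - to:
        - namespaceSelector: {}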
If you need any more YAMLs for an educated guess or hint, please feel free to request them in a comment.
And sorry for the tagging if it seems inappropriate, I couldn't find the tag kubernetes-networkpolicy that is present on Stack Overflow.
UPDATE:
Executing nslookup -debug kubernetes.default.svc.cluster.local from inside the Ignite pod without any policy restricting egress shows
BusyBox v1.29.3 (2019-01-24 07:45:07 UTC) multi-call binary.
Usage: nslookup HOST [DNS_SERVER]
Query DNS about HOST
As soon as (any) NetworkPolicy is applied that restricts Egress to specific ports, pods and namespaces, the Ignite pod refuses to start and the lookup does not reach kubernetes.default.svc.cluster.local anymore.
Egress to DNS was allowed (UDP 53 to k8s-app: kube-dns) ⇒ still no IP lookup possible
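For reference, that DNS rule looked roughly like this (a sketch, the exact selectors may differ slightly):
    # allow egress for DNS lookups to the kube-dns / CoreDNS pods
    - to:
        - namespaceSelector:
            matchLabels:
              zone: kube-system # our label on the kube-system Namespace
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53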