👩‍🎓 TIL More About Network Namespaces
Today I learned
Well… Technically I learned about that more than 10 years ago. I just found this file sitting in some directory, and thought it would be fun to publish it without further proofreading.
❯ ls -l chris-learned-netns.txt
-rw-r--r-- 1 chris chris 8431 Aug 28 2015 chris-learned-netns.txt
The idea was to run some systemd.service in a different network namespace. Basically, I want to control the outbound IP of my mail server stuff without containerizing it entirely (no docker, systemd-nspawn, whatever). This is more or less philosophical. IMHO there is a system manager, called systemd, that takes care of my services, and I just want to run a few services. I’m still thinking about going for systemd-nspawn for my mail server, to at least be able to use systemctl -M to connect from the outside to the inner systemd. Sure, an SNAT for dport 25 would do a good job as well. But this is for the fun of it.
However, this is absolutely a proof of concept and, most importantly, I learned lots of stuff. Oh, my focus was IPv6, but this works with IPv4 the same way. Pretty much all of the stuff was stolen from the docker networking documentation https://docs.docker.com/articles/networking/#ipv6.
All the examples worked on debian/jessie and arch linux.
So, first of all, I thought there might be a stanza in systemd.service to do this. But in fact, it is only possible to start services with just lo. Currently, there are PrivateNetwork=true and JoinsNamespaceOf=…, but these are not general enough to join a specific network namespace. Have a look at: http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/15892
However, this is not the most complicated part of the idea. For postfix I just have to edit the (systemd auto-generated) postfix.service, but more on this later. The more challenging part is to create the network namespace, move an interface into it, assign an IP and get the routing done. For reasons to do with my hoster, I wanted to stay with a routed network instead of a bridged one. Bridged in terms of: one system bridge with eth0 assigned to it. I wanted to set public IPs on the “containered” interface.
First of all, create a bridge anyway. One end of the veth pair will be attached to it. Install bridge-utils to get brctl.
# brctl addbr br0
# ip link add veth0 type veth peer name veth1
# ip a
...
6: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 92:40:3a:f7:d0:37 brd ff:ff:ff:ff:ff:ff
7: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 7e:13:29:35:6c:f0 brd ff:ff:ff:ff:ff:ff
8: br0@NONE: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 2a:9b:07:6e:36:04 brd ff:ff:ff:ff:ff:ff
At this point, there are br0, veth0 and veth1 in the output of ip a. All DOWN. Assign veth0 to the bridge br0.
# brctl addif br0 veth0
# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.06daacc6f10a no veth0
Bring br0 and veth0 up. Depending on the driver, the devices are up instantly, or only once IP addresses are assigned to them. Anyway:
# ip link set br0 up
# ip link set veth0 up
Next step is to create our network namespace and assign the other half of the veth into this namespace.
# ip netns add mailserverns
# ip link set veth1 netns mailserverns name eth0
Now there should be an interface in mailserverns and veth1 should be gone. (veth0 is still there!)
# ip a | grep veth1
# ip netns exec mailserverns ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
6: eth0@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 92:40:3a:f7:d0:37 brd ff:ff:ff:ff:ff:ff link-netnsid 0
We assigned veth1 as eth0 to mailserverns \o/
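For the record, the steps so far can be collected into one small script. This is only a sketch under a few assumptions: it uses ip link from iproute2 in place of brctl (ip link add … type bridge and ip link set … master … do the same job), the run helper and the DRY_RUN switch are made up for this example, and by default it only prints the commands so that nothing changes until you ask for it.

```shell
#!/bin/sh
# Sketch of the bridge/veth/namespace steps, using iproute2 only.
# NS and BR default to the names used in this post.
NS=${NS:-mailserverns}
BR=${BR:-br0}

# Hypothetical helper: print the command, execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "$*"
    [ "${DRY_RUN:-1}" = "0" ] || return 0
    "$@"
}

setup_netns() {
    run ip link add "$BR" type bridge &&               # brctl addbr br0
    run ip link add veth0 type veth peer name veth1 &&
    run ip link set veth0 master "$BR" &&              # brctl addif br0 veth0
    run ip link set "$BR" up &&
    run ip link set veth0 up &&
    run ip netns add "$NS" &&
    run ip link set veth1 netns "$NS" name eth0
}

# Prints the seven commands; nothing is executed unless DRY_RUN=0 is set.
setup_netns
```

Running it as root with DRY_RUN=0 in the environment would actually execute the commands, stopping at the first one that fails.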
Before we start with the address and routing stuff, we should set some kernel parameters to enable forwarding, proxy ARP, and its IPv6 counterpart for neighbor discovery.
# sysctl -w net.ipv4.conf.all.forwarding=1
# sysctl -w net.ipv4.conf.all.proxy_arp=1
# sysctl -w net.ipv6.conf.all.forwarding=1
# sysctl -w net.ipv6.conf.eth0.proxy_ndp=1
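Settings made with sysctl -w are gone after a reboot, so the same four keys could also live in a sysctl.d fragment. A minimal sketch, assuming the host uplink is called eth0 as in this post; the function name and the file name in the note below are made up:

```shell
#!/bin/sh
# Sketch: print a sysctl.d fragment with the four settings from above.
# $1 is the host's uplink interface (assumed to be eth0, as in this post).
sysctl_fragment() {
    uplink=${1:-eth0}
    cat <<EOF
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.proxy_arp = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.${uplink}.proxy_ndp = 1
EOF
}

sysctl_fragment eth0
```

To use it, redirect the output to something like /etc/sysctl.d/90-mailserverns.conf (a hypothetical name) and run sysctl --system as root. The per-interface key can only be applied if the interface already exists when the file is loaded.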
Let’s start with the host configuration. First of all, assign a link-local address to br0 in order to use it as the default gateway from within the namespace. Again, I took this from the docker documentation.
# ip addr add fe80::1/64 dev br0
The route is generated automatically:
# ip -6 r s
fe80::/64 dev br0 proto kernel metric 256
Now… assuming your IPv6 prefix is 2001:db8::/64, you have to slice it down. How is up to you. If you just need one address, take a /128. I use 2001:db8::1ce:c01d:bee2/128 as an example.
In a different terminal, I just “join” the network namespace, so that I can type the plain command instead of ip netns exec mailserverns COMMAND each time.
# ip netns exec mailserverns bash
# (mailserverns) ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
6: eth0@if7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 92:40:3a:f7:d0:37 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Enable the interface eth0. The route to fe80::/64 is created automatically.
# (mailserverns) ip link set eth0 up
# (mailserverns) ip -6 r s
fe80::/64 dev eth0 proto kernel metric 256
At this point, fe80::1 should be reachable
# (mailserverns) ping6 -I eth0 fe80::1
PING fe80::1(fe80::1) from fe80::9040:3aff:fef7:d037 eth0: 56 data bytes
64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.111 ms
So we can use this as the default gateway.
# (mailserverns) ip r add default via fe80::1 dev eth0
Let’s assign the official address to our namespaced eth0.
# (mailserverns) ip addr add 2001:db8::1ce:c01d:bee2/128 dev eth0
That’s pretty much it from within the network namespace, but even a ping does not work yet. Outside of it, in the “real” system, we have to tell the kernel about this IP.
# ip route add 2001:db8::1ce:c01d:bee2/128 dev br0
# ip -6 neigh add proxy 2001:db8::1ce:c01d:bee2 dev eth0
The first one tells the host where (on which bridge) the network can be found. This works for prefixes larger than a /128 as well, but either way you have to slice your host network! The second one tells the host to proxy the neighbor 2001:db8::1ce:c01d:bee2 on the host’s physical eth0. With this, the host says: “This IP belongs to my eth0”. This is NOT the eth0 of the namespace! Have a look at: IPv6 – Proxy the neighbors (or come back ARP – we loved you really) https://www.ipsidixit.net/2010/03/24/239/
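If the ping does not work, a first thing to check is whether the proxy entry really exists on the host. A small sketch of such a check; the function name is made up, and I assume that ip -6 neigh show proxy prints one entry per line starting with ADDRESS dev INTERFACE:

```shell
#!/bin/sh
# Sketch: succeed if a proxy neighbor entry for address $1 on device $2
# appears on stdin (expected input: output of `ip -6 neigh show proxy`).
# Assumption: each entry starts with "ADDRESS dev INTERFACE".
# The address is compared as plain text, so pass it the way ip prints it.
has_proxy_entry() {
    awk -v addr="$1" -v dev="$2" '
        $1 == addr && $2 == "dev" && $3 == dev { found = 1 }
        END { exit !found }
    '
}
```

On the host it would be used like this: ip -6 neigh show proxy | has_proxy_entry 2001:db8::1ce:c01d:bee2 eth0 && echo "proxy entry present". For the route, ip -6 route get 2001:db8::1ce:c01d:bee2 should name br0 as the outgoing device.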
Now. Ping from within the namespace should work.
# (mailserverns) ping6 google.com
PING google.com(fra02s27-in-x03.1e100.net) 56 data bytes
64 bytes from fra02s27-in-x03.1e100.net: icmp_seq=1 ttl=54 time=28.5 ms
Wohooo \o/
Testing a ping from another host in my network:
[mpd@leierkasten ~]$ ip -6 a s dev wlan0
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:db8::20f:54ff:fe0c:114b/64 scope global mngtmpaddr dynamic
valid_lft 6943sec preferred_lft 929sec
[mpd@leierkasten ~]$ ping6 2001:db8::1ce:c01d:bee2
PING 2001:db8::1ce:c01d:bee2(2001:db8::1ce:c01d:bee2) 56 data bytes
64 bytes from 2001:db8::1ce:c01d:bee2: icmp_seq=1 ttl=62 time=456 ms
ATTENTION: None of the above is persistent in any way!
So close. Now let’s customize the postfix.service and prefix each start/stop/reload call with ip netns exec mailserverns.
# systemctl stop postfix
# cp /run/systemd/generator.late/postfix.service /etc/systemd/system/postfix.service
# vi /etc/systemd/system/postfix.service
...
ExecStart=/bin/ip netns exec mailserverns /etc/init.d/postfix start
ExecStop=/bin/ip netns exec mailserverns /etc/init.d/postfix stop
ExecReload=/bin/ip netns exec mailserverns /etc/init.d/postfix reload
...
# systemctl daemon-reload
# systemctl start postfix
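Newer systemd versions can join an existing namespace themselves: since v242 there is NetworkNamespacePath=, which did not exist in 2015. A hedged sketch of a drop-in that could replace the ip netns exec prefixes; it assumes the namespace was created with ip netns add (which makes it visible under /run/netns/) and that the unit is really called postfix.service on your system. The function name is made up.

```shell
#!/bin/sh
# Sketch: print a systemd drop-in that starts a unit inside an existing
# network namespace. Requires systemd >= 242 (NetworkNamespacePath=).
# $1 is the namespace name as given to `ip netns add`.
netns_dropin() {
    ns=${1:-mailserverns}
    printf '[Service]\nNetworkNamespacePath=/run/netns/%s\n' "$ns"
}

netns_dropin mailserverns
```

The output would go to /etc/systemd/system/postfix.service.d/netns.conf (any file name ending in .conf works there), followed by systemctl daemon-reload and a restart of the service. The namespace has to exist before the service starts.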
Verify with netstat. No :25 should be listed.
# netstat -lntp
...
It is not listed because postfix is running in the network namespace.
# ip netns exec mailserverns netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:587 0.0.0.0:* LISTEN 9473/master
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN 9473/master
tcp6 0 0 :::587 :::* LISTEN 9473/master
tcp6 0 0 :::25 :::* LISTEN 9473/master
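Another way to double check, without netstat, is to compare namespace identities under /proc. A sketch, assuming a Linux /proc and enough privileges to read the other process's entry (root, in practice); the function name is made up:

```shell
#!/bin/sh
# Sketch: compare the network namespace of PID $1 with the one of this shell.
# Returns 0 if they are the same, 1 if they differ,
# and 2 if one of the /proc entries cannot be read.
same_netns_as_me() {
    theirs=$(readlink "/proc/$1/ns/net") || return 2
    mine=$(readlink /proc/self/ns/net) || return 2
    [ "$theirs" = "$mine" ]
}
```

Run from a host shell, same_netns_as_me "$(pgrep -o -x master)" && echo "postfix is still in the host namespace" should print nothing once postfix has been moved. ip netns identify PID is an alternative that prints the namespace name directly.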