Use wondershaper in Docker Swarm.

Tool

  • A useful tool: wondershaper
    • Wonder Shaper is a script that allows the user to limit the bandwidth of one or more network adapters. It does so by using iproute's tc command, but greatly simplifies its operation.

Normal Container

  • A normal container’s network lives in its own isolated namespace; we can verify this with the following commands.

    docker exec -it ${container_id} /bin/bash
    ip addr
    ---
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    16042: eth0@if16043: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:13:00:04 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.4/16 scope global eth0
    valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe13:4/64 scope link
    valid_lft forever preferred_lft forever
    ---

    ifconfig
    ---
    eth0 Link encap:Ethernet HWaddr 02:42:AC:13:00:04
    inet addr:172.19.0.4 Bcast:0.0.0.0 Mask:255.255.0.0
    inet6 addr: fe80::42:acff:fe13:4/64 Scope:Link
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:159995 errors:0 dropped:0 overruns:0 frame:0
    TX packets:318058 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:0
    RX bytes:13868114 (13.2 MiB) TX bytes:23103415 (22.0 MiB)

    lo Link encap:Local Loopback
    inet addr:127.0.0.1 Mask:255.0.0.0
    inet6 addr: ::1/128 Scope:Host
    UP LOOPBACK RUNNING MTU:65536 Metric:1
    RX packets:39 errors:0 dropped:0 overruns:0 frame:0
    TX packets:39 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1
    RX bytes:2994 (2.9 KiB) TX bytes:2994 (2.9 KiB)

    In host:

    ifconfig
    ---
    br-2555d002131c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 172.19.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
    inet6 fe80::42:afff:fe8a:31a4 prefixlen 64 scopeid 0x20<link>
    ether 02:42:af:8a:31:a4 txqueuelen 0 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    ---

    ip link
    ---
    16043:
    veth429e849@if16042: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-2555d002131c state UP mode DEFAULT
    link/ether ee:e7:70:1b:b8:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 28
    ---

    ethtool -S veth429e849
    ---
    NIC statistics:
    peer_ifindex: 16042
    ---

    route
    ---
    172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-2555d002131c
  • eth0 (ifindex 16042) in the container and veth429e849 (ifindex 16043) on the host form a veth pair


  • bridge mode: each host-side veth is attached to a bridge; containers whose veths are connected to the same bridge can reach each other.

  • The container sets eth0 as its default device

  • The host inserts a record into its route table: xxx.xxx.xxx.xxx/xx -> br-xx
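The wiring above can be reproduced by hand with iproute2. A minimal sketch (requires root; the names demo0, br-demo, veth-host/veth-ctr and the 172.19.0.0/16 addresses are made up for illustration, mirroring the output shown earlier):

```shell
# Create a namespace standing in for the container's netns.
ip netns add demo0
# Create a veth pair and move one end into the namespace.
ip link add veth-host type veth peer name veth-ctr
ip link set veth-ctr netns demo0
# Attach the host end to a bridge, like Docker's br-xxxx.
ip link add br-demo type bridge
ip link set br-demo up
ip link set veth-host master br-demo
ip link set veth-host up
# Give the container end an address and bring it up.
ip netns exec demo0 ip addr add 172.19.0.4/16 dev veth-ctr
ip netns exec demo0 ip link set veth-ctr up
# Addressing the bridge inserts the host-side route record:
# 172.19.0.0/16 dev br-demo
ip addr add 172.19.0.1/16 dev br-demo
```

`ip netns del demo0` and `ip link del br-demo` undo the setup.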

Interact with external network

  • The MASQUERADE target rewrites the source address of outgoing packets to the address of a network device; when a container needs to talk to the network outside the host, its IP must be translated once.
    • sudo sysctl -w net.ipv4.conf.all.forwarding=1
    • sudo iptables -t nat -A POSTROUTING -s 172.19.0.0/16 -o eth0 -j MASQUERADE
  • DNAT maps a container's service onto the host IP, i.e. the usual port mapping.
    • sudo iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.19.0.2:80

Overlay

  • This case is similar to a container joining multiple bridges.

  • Inside the container there are two veths: one connects to the host (via docker_gwbridge) and the other to the overlay network.

  • It’s interesting that overlay network has its own namespace.

    docker network ls
    NETWORK ID NAME DRIVER SCOPE
    8af77005fccd bridge bridge local
    0a9f19fcc8f6 docker_gwbridge bridge local
    4ff101446cb3 host host local
    lcm7v5396eh8 ingress overlay swarm
    j1fvxxfk00l1 net1 overlay swarm
    bc58f94703ff none null local

    nsenter --net=/var/run/docker/netns/1-j1fvxxfk00 ifconfig
    veth73: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
    ether 3a:6c:9f:88:87:f1 txqueuelen 0 (Ethernet)
    RX packets 10034 bytes 56493239 (53.8 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 10498 bytes 3671156 (3.5 MiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
  • nsenter lets us enter a network namespace (Docker keeps them under /var/run/docker/netns/).

Solution

  • Finally, we found the veth peer that connects the container to the overlay network, so we can limit the bandwidth inside that specific namespace. Since Docker doesn’t create symlinks under /var/run/netns for its namespaces, `ip netns` can’t see them. Here is a way to fix this.
PID=$(docker inspect -f '{{.State.Pid}}' $CONTAINER_ID)
# /var/run/netns may not exist yet; ip netns expects it.
mkdir -p /var/run/netns
ln -sfT /proc/$PID/ns/net /var/run/netns/$CONTAINER_ID
ip netns exec $CONTAINER_ID ip link add ifb0 type ifb
ip netns exec $CONTAINER_ID ip link set dev ifb0 up
ip netns exec $CONTAINER_ID ./wondershaper -a eth0 -u $U_LIMIT -d $D_LIMIT
  • nsenter is easier to use.
nsenter --net=/var/run/netns/$CONTAINER_ID ip link add ifb0 type ifb
nsenter --net=/var/run/netns/$CONTAINER_ID ip link set dev ifb0 up
nsenter --net=/var/run/netns/$CONTAINER_ID ./wondershaper -a eth0 -u $U_LIMIT -d $D_LIMIT
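To undo the shaping later, the wondershaper build used here (the one taking -a/-u/-d flags) also accepts -c to clear an adapter's limits; presumably it should be run in the same namespace:

```shell
# Clear any limits previously set on eth0 inside the container's netns.
nsenter --net=/var/run/netns/$CONTAINER_ID ./wondershaper -c -a eth0
```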