Thread: 24 messages, 4 authors, 2022-02-22

Re: [net-next v8 2/2] net: sched: support hash/classid/cpuid selecting tx queue

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: 2022-02-22 11:44:40

On 2022-02-20 20:43, Tonghao Zhang wrote:
On Mon, Feb 21, 2022 at 2:30 AM Jamal Hadi Salim [off-list ref] wrote:
On 2022-02-18 07:43, Tonghao Zhang wrote:
On Thu, Feb 17, 2022 at 7:39 AM Jamal Hadi Salim [off-list ref] wrote:
That's a different use case than what you are presenting here,
i.e. the k8s pod scenario is purely a forwarding use case.
But it doesn't matter, tbh, since your data shows reasonable results.

[I didn't dig into the code, but it is likely (based on your experimental
data) that both skb->l4_hash and skb->sw_hash will _not be set_,
and so skb_get_hash() will compute skb->hash from scratch.]
No. For example, for TCP the hash is already set in __tcp_transmit_skb(),
which invokes skb_set_hash_from_sk(), so in skbedit skb_get_hash() only
reads skb->hash.
There is no tcp anything in the forwarding case. Your use case was for
forwarding. I understand the local host tcp/udp variant.
I may be missing something on the cpuid one - there seems to be a high
likelihood of having the same flow on multiple queues (based on what
raw_smp_processor_id() returns, which I believe is not guaranteed to be
consistent). IOW, you could be sending packets out of order for the
same 5-tuple flow (because they end up in different queues).
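
To make the reordering concern concrete, here is a minimal sketch. It assumes the action picks a queue as min + key % (max - min + 1); that formula is an assumption, not quoted from the patch, though it agrees with the counters later in this message (CPU 4 with range "2 6" lands on tx queue 6). The helpers txq_from_key and flow_key and the 5-tuple string are hypothetical, and cksum only stands in for skb_get_hash().

```shell
#!/bin/sh
# Assumed mapping (not quoted from the patch):
#   queue = min + key % (max - min + 1)
txq_from_key() {
    # $1 = key (flow hash or CPU id), $2 = min queue, $3 = max queue
    echo $(( $2 + $1 % ($3 - $2 + 1) ))
}

# Stand-in for skb_get_hash(): CRC of the 5-tuple string via cksum.
flow_key() {
    printf '%s' "$1" | cksum | cut -d' ' -f1
}

flow="udp 2.2.2.100:40000 2.2.2.1:5201"   # hypothetical 5-tuple

# The same flow, sent from whichever CPU the task happens to run on.
for cpu in 1 3 1 4; do
    by_hash=$(txq_from_key "$(flow_key "$flow")" 2 6)
    by_cpu=$(txq_from_key "$cpu" 2 6)
    echo "cpu=$cpu  hash-type hash -> txq $by_hash  hash-type cpuid -> txq $by_cpu"
done
```

Under these assumptions the hash column stays on one queue for the whole flow, while the cpuid column changes whenever the sender migrates, which is where out-of-order delivery could come from unless the task is pinned.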
Yes, but consider one case: we pin one pod to one CPU, so all the
processes of the pod will use the same CPU. Then all packets from this
pod will use the same tx queue.
To Cong's point - if you already knew the pinned-to cpuid then you could
just as easily set that queue map from user space?
Yes, we can set it from user space. But if skbedit knows which CPU the
pod uses and selects the tx queue automatically, wouldn't that make
things easier?
Yes, but you know the CPU - so Cong's point is valid. You knew the
CPU when you set up the cgroup for iperf by hand; you can use the
same hand to set the queue map in skbedit.
ip li set dev $NETDEV up

tc qdisc del dev $NETDEV clsact 2>/dev/null
tc qdisc add dev $NETDEV clsact

ip link add ipv1 link $NETDEV type ipvlan mode l2
ip netns add n1
ip link set ipv1 netns n1

ip netns exec n1 ip link set ipv1 up
ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up

tc filter add dev $NETDEV egress protocol ip prio 1 \
flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping hash-type cpuid 2 6

tc qdisc add dev $NETDEV handle 1: root mq

tc qdisc add dev $NETDEV parent 1:1 handle 2: htb
tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit
tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit

tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1
tc qdisc add dev $NETDEV parent 1:3 pfifo
tc qdisc add dev $NETDEV parent 1:4 pfifo
tc qdisc add dev $NETDEV parent 1:5 pfifo
tc qdisc add dev $NETDEV parent 1:6 pfifo
tc qdisc add dev $NETDEV parent 1:7 pfifo

Pin iperf3 to one CPU:
# mkdir -p /sys/fs/cgroup/cpuset/n0
# echo 4 > /sys/fs/cgroup/cpuset/n0/cpuset.cpus
# echo 0 > /sys/fs/cgroup/cpuset/n0/cpuset.mems
# ip netns exec n1 iperf3 -c 2.2.2.1 -i 1 -t 1000 -P 10 -u -b 10G
# echo $(pidof iperf3) > /sys/fs/cgroup/cpuset/n0/tasks

# ethtool -S eth0 | grep -i 'tx_queue_[0-9]_bytes'
       tx_queue_0_bytes: 7180
       tx_queue_1_bytes: 418
       tx_queue_2_bytes: 3015
       tx_queue_3_bytes: 4824
       tx_queue_4_bytes: 3738
       tx_queue_5_bytes: 716102781 # before setting iperf3 to cpu 4
       tx_queue_6_bytes: 17989642640 # after setting iperf3 to cpu 4, skbedit uses this tx queue and no longer uses tx queue 5
       tx_queue_7_bytes: 4364
       tx_queue_8_bytes: 42
       tx_queue_9_bytes: 3030
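
For readers who want to repeat this check, a small filter can pick the busiest queue out of counters like the ones above. This is only a sketch: busiest_txq is a hypothetical helper, and it assumes the driver reports per-queue counters named tx_queue_N_bytes, as in this output; other drivers use different names.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -S`-style lines on stdin and print
# the name of the tx queue counter with the largest byte count.
busiest_txq() {
    awk -F': *' '/tx_queue_[0-9]+_bytes/ {
        gsub(/^[ \t]+/, "", $1)          # drop leading indentation
        if ($2 + 0 > max) {              # numeric prefix of the value
            max = $2 + 0
            name = $1
        }
    } END { print name }'
}

# Usage (interface name is an assumption):
#   ethtool -S eth0 | busiest_txq
```

Run before and after moving iperf3 into the cpuset, it should report a different counter if the queue selection follows the CPU.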


# tc -s class show dev eth0
class mq 1:1 root leaf 2:
   Sent 9874 bytes 63 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:2 root leaf 8001:
   Sent 418 bytes 3 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:3 root leaf 8002:
   Sent 3015 bytes 13 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:4 root leaf 8003:
   Sent 4824 bytes 8 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:5 root leaf 8004:
   Sent 4074 bytes 19 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:6 root leaf 8005:
   Sent 716102781 bytes 480624 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:7 root leaf 8006:
   Sent 18157071781 bytes 12186100 pkt (dropped 0, overlimits 0 requeues 18)
   backlog 0b 0p requeues 18
class mq 1:8 root
   Sent 4364 bytes 26 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:9 root
   Sent 42 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class mq 1:a root
   Sent 3030 bytes 13 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
class tbf 8001:1 parent 8001:

class htb 2:1 root prio 0 rate 100Kbit ceil 100Kbit burst 1600b cburst 1600b
   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
   lended: 0 borrowed: 0 giants: 0
   tokens: 2000000 ctokens: 2000000

class htb 2:2 root prio 0 rate 200Kbit ceil 200Kbit burst 1600b cburst 1600b
   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
   backlog 0b 0p requeues 0
   lended: 0 borrowed: 0 giants: 0
   tokens: 1000000 ctokens: 1000000
Yes, if you pin a flow/process to a cpu - this is expected. See my
earlier comment. You could argue that you are automating things, but
it is not as strong as the hash setup (and it will have to be documented
that it works only if you pin processes doing network i/o to cpus).
OK, it should be documented in iproute2, and we will document this in
the commit message too.
I think this part is iffy. You could argue it from an automation point
of view, but I don't see much else.
Could you also post an example on the cgroups classid?
The setup commands:
NETDEV=eth0
ip li set dev $NETDEV up

tc qdisc del dev $NETDEV clsact 2>/dev/null
tc qdisc add dev $NETDEV clsact

ip link add ipv1 link $NETDEV type ipvlan mode l2
ip netns add n1
ip link set ipv1 netns n1

ip netns exec n1 ip link set ipv1 up
ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up

tc filter add dev $NETDEV egress protocol ip prio 1 \
flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping hash-type classid 2 6

tc qdisc add dev $NETDEV handle 1: root mq

tc qdisc add dev $NETDEV parent 1:1 handle 2: htb
tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit
tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit

tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1
tc qdisc add dev $NETDEV parent 1:3 pfifo
tc qdisc add dev $NETDEV parent 1:4 pfifo
tc qdisc add dev $NETDEV parent 1:5 pfifo
tc qdisc add dev $NETDEV parent 1:6 pfifo
tc qdisc add dev $NETDEV parent 1:7 pfifo

Set up the classid:
# mkdir -p /sys/fs/cgroup/net_cls/n0
# echo 0x100001 > /sys/fs/cgroup/net_cls/n0/net_cls.classid
# echo $(pidof iperf3) > /sys/fs/cgroup/net_cls/n0/tasks
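
As a reading aid for the value written above: net_cls.classid is a 32-bit number of the form 0xAAAABBBB, where AAAA is the tc major handle and BBBB the minor. A minimal sketch of the decoding follows; decode_classid is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: split a net_cls classid (0xAAAABBBB) into the
# tc-style "major:minor" handle, both printed in hex.
decode_classid() {
    printf '%x:%x\n' $(( $1 >> 16 )) $(( $1 & 0xffff ))
}

decode_classid 0x100001
```

So 0x100001 corresponds to handle 10:1.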

I would say the same thing here as well. You know the classid, you
manually set it above, so you could have said:

src_ip 2.2.2.100 action skbedit queue_mapping 1
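
Spelled out in full, and reusing the match from the filter earlier in this message, the static variant might look like the sketch below. static_txq_filter is a hypothetical helper; it prints the command instead of running it, since applying it needs root and the clsact qdisc from the setup above.

```shell
#!/bin/sh
NETDEV=${NETDEV:-eth0}   # interface name is an assumption

# Hypothetical helper: print a filter that sends one source IP to one
# fixed tx queue. $1 = source IP to match, $2 = tx queue.
static_txq_filter() {
    echo "tc filter add dev $NETDEV egress protocol ip prio 1" \
         "flower skip_hw src_ip $1 action skbedit queue_mapping $2"
}

static_txq_filter 2.2.2.100 1
```

Piping the output to sh as root would apply it; the queue number is then whatever the operator chose rather than something derived from the classid.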

cheers,
jamal