Loading an eBPF program causes IRQ affinities to be modified - (ixgbe driver)

I am working on an eBPF/XDP application running on a server with Intel 10G X550T NICs, using the ixgbe driver.

I need precise control over how the work is distributed across cores, so I'm disabling irqbalance and setting IRQ affinity manually. I wrote a brief Python script that reads /proc/interrupts and /proc/irq/X/smp_affinity to show which CPU core should handle the interrupt for each queue.
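A simplified sketch of what the script does (it assumes each IRQ is already pinned to a single CPU, and reads smp_affinity_list rather than parsing the raw smp_affinity bitmask):

#!/usr/bin/env python3
# show_ints.py (simplified sketch): print each queue IRQ of a NIC and the CPU it is pinned to.
import sys

def queue_irqs(dev):
    """IRQ numbers whose /proc/interrupts entry mentions the device (e.g. int0-TxRx-3)."""
    irqs = []
    with open("/proc/interrupts") as f:
        for line in f:
            fields = line.split()
            if fields and fields[0].rstrip(":").isdigit() and dev in line:
                irqs.append(int(fields[0].rstrip(":")))
    return irqs

def irq_cpus(irqs):
    """Current affinity of each IRQ, from /proc/irq/<n>/smp_affinity_list (single pinned CPU assumed)."""
    cpus = []
    for irq in irqs:
        with open(f"/proc/irq/{irq}/smp_affinity_list") as f:
            cpus.append(int(f.read().strip()))
    return cpus

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "int0"
    irqs = queue_irqs(dev)
    print(f"{dev:<10} IRQs : {irqs}")
    print(f"{dev:<10} CPUs : {irq_cpus(irqs)}")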

The int0 NIC is configured with 40 queues, and the machine has 40 cores - after running my manual configuration, the queue->core mapping looks like this:

# python3 show_ints.py
int0       : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]

When I load even the most trivial eBPF program on this device, however:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_sock_prog(struct xdp_md *ctx)
{
    return XDP_PASS;
}
# xdp-loader load -m native int0 test.o

the IRQ affinity seems to be modified:

# python3 show_ints.py
int0       : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39]

This would not in itself be a problem - it is just a reshuffling of cores - but there are multiple NICs in this machine, and every time I assign specific cores to specific queues on specific NICs, loading the eBPF program scrambles the affinities so that multiple NICs always end up hitting the same cores (which I don't want!).
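For reference, the manual assignment is essentially the following per NIC (a simplified sketch, not the exact script; the pin_irqs.py name and argument layout are made up here for illustration):

#!/usr/bin/env python3
# pin_irqs.py (simplified sketch): pin one NIC's queue IRQs to consecutive CPUs
# starting at a chosen offset, so that each NIC gets its own block of cores.
import sys

if __name__ == "__main__":
    first_cpu = int(sys.argv[1])            # a different starting CPU for each NIC
    irqs = [int(a) for a in sys.argv[2:]]   # that NIC's queue IRQ numbers
    for i, irq in enumerate(irqs):
        with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
            f.write(str(first_cpu + i))     # one dedicated CPU per queue

Invoked as, for example, python3 pin_irqs.py 0 79 80 81 ... to give int0's queues CPUs 0 upwards.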

Is this expected behaviour? Is there a way to disable it?

Edit (additional information):

The IRQs themselves do not change...

Before:

int0       IRQs : [79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181]
int0       CPUs : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]

After:

int0       IRQs : [79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181]
int0       CPUs : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39]

So it's just the smp_affinity that's getting tweaked.

Apologies for not including version info - this is kernel 5.15.0-79-generic on Ubuntu 22.04.

Further edit:

Hmmm... when the eBPF program is loaded, dmesg shows:

[66216.150088] ixgbe 0000:3b:00.0: removed PHC on int0
[66216.927782] ixgbe 0000:3b:00.0: registered PHC device on int0
[66221.735857] ixgbe 0000:3b:00.0 int0: NIC Link is Up 10 Gbps, Flow Control: None

Comments:

Anton Danilov: Check the IRQ numbers of the network interface before and after loading the eBPF program. Show the driver version. Also check the dmesg output.

Strags: Hi Anton, I've edited the post to include the requested data.

Anton Danilov: It seems the driver reinitializes the NIC after the eBPF program is loaded. I'm reading the driver source code to investigate this behaviour, but I'm afraid it may be impossible to mitigate this issue.