I'm having trouble sustaining the required network throughput on a server connected to a Signal Hound spectrum analyzer via a 10GbE network interface. Basically, I get good throughput when only the radio capture process is running, but as soon as other processes run alongside it, the throughput starts to drop. I'm using an Aquantia PCIe Ethernet adapter with a QNAP SFP+ 10GbE Thunderbolt 3 adapter.
When I run a simple Python program that polls the spectrum analyzer API in streaming mode, everything works great at the maximum bandwidth (~800 MB/s). But when I run
$ stress --cpu 8 --io 8 --vm 8 --hdd 8
alongside it, throughput drops to about 600 MB/s and I start dropping a lot of data.
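For reference, the capture side is essentially just a tight poll loop like the sketch below. This is a minimal sketch: fetch_block() is a placeholder, not the real Signal Hound API name, and here it just returns a dummy buffer so the throughput accounting is visible and the snippet runs on its own.

import time

def fetch_block():
    # Placeholder for the vendor streaming call; substitute the real
    # Signal Hound API binding here. Returning a dummy 1 MiB buffer
    # keeps the sketch runnable by itself.
    return bytes(1 << 20)

def capture(report_every_s=1.0):
    total = 0
    t0 = time.monotonic()
    while True:
        block = fetch_block()      # blocks until the next chunk of samples
        total += len(block)
        now = time.monotonic()
        if now - t0 >= report_every_s:
            print(f"{total / (now - t0) / 1e6:.1f} MB/s")
            total, t0 = 0, now

if __name__ == "__main__":
    capture()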
Things I've tried:
- Updating drivers
- Tweaking the interrupt coalescing parameters and many other ethtool options, as well as the MTU
- Turning off hyperthreading and pinning the capture process to a single core (core 8 of 8) via CPU affinity
- This also involved isolating the network interrupts to their own core (core 7 of 8); see the pinning sketch just after this list
- I also changed the CPU frequency governor to "performance" so the cores always run at maximum frequency
- I also moved most other interrupts off cores 7 and 8 so nothing else slows them down, verified with a Netdata dashboard
- I basically tried everything in here
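To make the pinning concrete, it boils down to something like this. It's a sketch: I'm assuming the kernel numbers my "core 7 of 8" and "core 8 of 8" as CPUs 6 and 7, the IRQ number is made up (the real ones come from /proc/interrupts), and the IRQ write needs root.

import os

CAPTURE_CPU = 7    # "core 8 of 8" in 0-indexed kernel numbering (assumption)
IRQ_CPU = 6        # "core 7 of 8"
NIC_IRQS = [123]   # hypothetical; look up the Aquantia queue IRQs in /proc/interrupts

# Pin the capture process (run early inside the capture script)
os.sched_setaffinity(0, {CAPTURE_CPU})

# Steer the NIC interrupts onto their own core (requires root)
for irq in NIC_IRQS:
    with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
        f.write(str(IRQ_CPU))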
Essentially, I know the capture can run in real time, because it works fine when it has two cores to itself. But for some reason, even though the other cores don't interfere with its CPU cycles or network IRQs, putting cores 1-6 under heavy load still slows the main process down greatly.
If it helps, I find that the --vm 4 option for stress causes the most slowdown, so I suspect it has something to do with memory allocation and perhaps the DRAM interface to the network card.
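In case it's useful for reproducing this: the --vm workers in stress basically just allocate, dirty, and free memory in a loop, and something like the following mimics that part in isolation. It's a rough sketch; the worker count and chunk size are arbitrary choices, not anything tuned.

import multiprocessing as mp

def vm_worker(chunk_mib=256):
    # Roughly what one stress --vm worker does: allocate a buffer,
    # touch every page so it is actually backed by DRAM, then free it.
    # This generates memory traffic without much useful CPU work.
    while True:
        buf = bytearray(chunk_mib << 20)
        for i in range(0, len(buf), 4096):
            buf[i] = 1
        del buf

if __name__ == "__main__":
    # Four workers, matching the --vm 4 case mentioned above.
    procs = [mp.Process(target=vm_worker) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()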
I'm basically pulling my hair out trying to get every packet from the radio on what should be a very powerful Ubuntu 20.04 machine. Does anyone have experience with applications like this?
EDIT: I copied some of the performance curves here:
Here is the effect I'm seeing
So here's the utilization. Core 6 sits at 100% handling softirqs both during the high-stress period and the "just capturing" period. I've tried splitting the network data onto two cores (5 and 6), but one of them always stays loaded while the other looks idle, even though they receive similar numbers of interrupts.
The actual number of softirqs unfortunately drops on CPU 6 during the period when the stress test is running.
Here is the effect I'm seeing on CPU6 softnet.
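For anyone who wants the raw counters behind that softnet plot, I'm also reading /proc/net/softnet_stat directly with something like this. It's a sketch: I'm assuming the first three hex columns are packets processed, dropped, and time_squeeze, which holds on recent kernels but the exact column layout varies between kernel versions.

def read_softnet():
    # One line per CPU, hexadecimal columns. Assumes columns 0-2 are
    # processed / dropped / time_squeeze; not guaranteed on every kernel.
    stats = []
    with open("/proc/net/softnet_stat") as f:
        for line in f:
            cols = [int(x, 16) for x in line.split()]
            stats.append((cols[0], cols[1], cols[2]))
    return stats

if __name__ == "__main__":
    for cpu, (processed, dropped, squeezed) in enumerate(read_softnet()):
        print(f"CPU{cpu}: processed={processed} dropped={dropped} time_squeeze={squeezed}")

If time_squeeze on CPU 6 climbs during the stress window, net_rx_action is running out of its budget/time slice rather than running out of interrupts, which would fit the pattern above.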
Also, the interrupt rate stays roughly the same, though it gets a little less consistent during the high-stress period.
Here's the raw network throughput, and it also looks a little inconsistent in both periods.
I looked pretty closely for anomalies (though there are a lot of plots in Netdata), and it looks like there is no interprocess memory activity during the high-stress period. Could this lead to issues?
If anyone needs more plots, let me know. I can't deduce the issue from these, but I hope it's enough information to come up with potential solutions.
Thanks again!