I apologize in advance: this is probably about as underspecified as a question can get.
I run a Linux VPS and I think it has a serious performance problem, but I can't pinpoint what it is. I contacted support, but they say they don't see any problems, and further support is paid.
I am an experienced software engineer with some DevOps experience, and I'd like to find out as much as I can myself first.
Most obvious symptoms:
- When I log in via SSH, commands in the shell are slow. Much of this can be attributed to latency due to location (I am in South America, the server is in Europe). But not all of it, because:
- Sometimes, especially when I run something CPU-hungry, the process looks as if it is being starved of CPU. It just stops, as if stuttering (maybe the provider is throttling it, or something else is going on?), and then continues. For example, the processing phase after
sudo apt full-upgrade
takes a very long time and does not run smoothly.
- I also run a web server there. Sometimes the response is very quick, but timeouts are frequent (for example, when I run a Nextcloud update via the browser, I have to reload after every step because it loses the connection).
Some info:
uname -a
Linux 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
HW:
CPU Information
Name QEMU Virtual version 2.5+
Topology 4 Processors, 4 Cores
Base Frequency 3.50 GHz
L1 Instruction Cache 32.0 KB x 1
L1 Data Cache 32.0 KB x 1
L2 Cache 4.00 MB x 1
L3 Cache 16.0 MB x 1
Memory Information
Memory 7.60 GB
Running sysbench on cpu:
sysbench --test=cpu run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Prime numbers limit: 10000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 1094.64
General statistics:
total time: 10.0008s
total number of events: 10949
Latency (ms):
min: 0.83
avg: 0.91
max: 101.27
95th percentile: 0.94
sum: 9991.70
Threads fairness:
events (avg/stddev): 10949.0000/0.00
execution time (avg/stddev): 9.9917/0.00
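The max latency of 101.27 ms against an average of 0.91 ms in the sysbench run above hints at occasional stalls. Here is a minimal Python sketch to look for them independently of sysbench, assuming Python 3 is available on the VPS. The function names, the chunk size, and the 10x-median threshold are arbitrary choices for illustration, not an established method:

```python
import time


def measure_chunks(n_chunks=200, work=20000):
    """Time n_chunks identical busy loops; return durations in milliseconds."""
    durations = []
    for _ in range(n_chunks):
        start = time.perf_counter()
        total = 0
        for i in range(work):
            total += i * i  # fixed amount of pure CPU work per chunk
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def find_stalls(durations_ms, factor=10.0):
    """Return (index, duration) for chunks slower than factor x the median."""
    ordered = sorted(durations_ms)
    median = ordered[len(ordered) // 2]
    return [(i, d) for i, d in enumerate(durations_ms) if d > factor * median]


if __name__ == "__main__":
    durations = measure_chunks()
    stalls = find_stalls(durations)
    print(f"chunks: {len(durations)}, stalls: {len(stalls)}")
    for index, ms in stalls:
        print(f"  chunk {index}: {ms:.2f} ms")
```

Since every chunk does the same work, a chunk that takes many times the median was most likely interrupted from outside the process. Some noise from the interpreter and the guest scheduler is to be expected, so only frequent or very long stalls would be meaningful.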
The Geekbench score seems VERY low compared with other results I have seen there:
Single-Core Score: 515
Multi-Core Score: 1629
Full Geekbench output: https://browser.geekbench.com/v5/cpu/12431904
I'd greatly appreciate any hint or suggestion. Happy to provide more info if needed.
EDIT: Thanks to the comment below, I checked for steal time. It does occasionally spike to 4.8% at most, but most of the time it is around 0.1%, so I don't think that's really the culprit.
What is weird is that while the CPU seems to be at 100% when I run a CPU-intensive process, the idle percentage is still reported as very high and the user percentage seems low. khugepaged also spikes repeatedly, which I have no idea how to interpret. I guess I'll continue to debug...
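One possible explanation for the high idle figure, not verified on this machine, is that it is an aggregate over all 4 processors: a single-threaded process saturating one core would show up as roughly 25% user and 75% idle overall. A minimal Python sketch to check this per core, assuming the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal as the first eight counters). The function names are made up for illustration:

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Map each cpu label ('cpu', 'cpu0', ...) to its first eight tick counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            stats[parts[0]] = dict(zip(FIELDS, values))
    return stats


def percentages(before, after):
    """Share of each field in the ticks elapsed between two snapshots."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        deltas = {field: new[field] - old[field] for field in new}
        total = sum(deltas.values())
        if total > 0:
            result[cpu] = {f: 100.0 * d / total for f, d in deltas.items()}
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots of /proc/stat and return per-CPU percentages."""
    with open(path) as fh:
        before = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_proc_stat(fh.read())
    return percentages(before, after)


if __name__ == "__main__":
    try:
        for cpu, pct in sorted(sample().items()):
            print(cpu, " ".join(f"{k}={v:5.1f}%" for k, v in pct.items()))
    except OSError as err:
        print(f"could not read /proc/stat: {err}")
```

Running this while the CPU-intensive process is active should show whether one core is near 100% user while the others are idle, and whether steal is concentrated on a particular core rather than spread evenly.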