I have multiple Ubuntu 22.04 servers with different NVIDIA cards, but none of them ever goes above the P2 performance state, so none of them ever reaches its maximum clock speed.
I have tried a lot, but nothing seems to convince the cards to go higher. Here is the full query output from one of the servers:
root@pod0003:~# nvidia-smi -q -a
==============NVSMI LOG==============
Timestamp : Fri Jan 27 19:00:11 2023
Driver Version : 525.78.01
CUDA Version : 12.0
Attached GPUs : 1
GPU 00000000:05:00.0
Product Name : NVIDIA GeForce RTX 3060
Product Brand : GeForce
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-1fbe1409-48f9-577f-c063-1e5d895d900b
Minor Number : 0
VBIOS Version : 94.06.4D.00.1B
MultiGPU Board : No
Board ID : 0x500
Board Part Number : N/A
GPU Part Number : 2544-302-A1
Module ID : 1
Inforom Version
Image Version : G001.0000.94.01
OEM Object : 2.0
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x05
Device : 0x00
Domain : 0x0000
Device Id : 0x254410DE
Bus Id : 00000000:05:00.0
Sub System Id : 0x397D1462
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Device Current : 4
Device Max : 4
Host Max : 4
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 32000 KB/s
Rx Throughput : 95000 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : 42 %
Performance State : P2
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 8192 MiB
Reserved : 218 MiB
Used : 5886 MiB
Free : 2087 MiB
BAR1 Memory Usage
Total : 8192 MiB
Used : 7 MiB
Free : 8185 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 91 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 62 C
GPU Shutdown Temp : 98 C
GPU Slowdown Temp : 95 C
GPU Max Operating Temp : 93 C
GPU Target Temperature : 83 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 118.71 W
Power Limit : 170.00 W
Default Power Limit : 170.00 W
Enforced Power Limit : 170.00 W
Min Power Limit : 100.00 W
Max Power Limit : 170.00 W
Clocks
Graphics : 1957 MHz
SM : 1957 MHz
Memory : 7300 MHz
Video : 1717 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2130 MHz
SM : 2130 MHz
Memory : 7501 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 1081.250 mV
Fabric
State : N/A
Status : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 9165
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 5 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 9902
Type : C
Name : /home/tk/jupyter/panenv/bin/python
Used GPU Memory : 2938 MiB
As the log shows, no throttle reason is active. Yet the card is stuck in performance state P2 at these clock speeds:
Graphics : 1957 MHz
SM : 1957 MHz
Memory : 7300 MHz
Video : 1717 MHz
while the maximum clocks would be:
Graphics : 2130 MHz
SM : 2130 MHz
Memory : 7501 MHz
Video : 1950 MHz
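To put a number on the gap, here is a small sketch that turns the two sets of clocks above into percentages. The helper name `pct` is made up for illustration; the values are copied from the log.

```shell
#!/bin/sh
# pct CURRENT MAX: print CURRENT as a percentage of MAX, one decimal place.
# "pct" is a hypothetical helper, not part of any NVIDIA tool.
pct() {
    awk -v cur="$1" -v max="$2" 'BEGIN { printf "%.1f\n", 100 * cur / max }'
}

# Clock values (MHz) taken from the nvidia-smi log above.
echo "Graphics/SM: $(pct 1957 2130) % of max"
echo "Memory:      $(pct 7300 7501) % of max"
echo "Video:       $(pct 1717 1950) % of max"
```

So under full load the SM clock sits at roughly 92 % of its maximum and the memory clock at roughly 97 %.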
Most notably I tried:
X :0 &
export DISPLAY=:0
nvidia-settings -a "[gpu:0]/GpuPowerMizerMode=1"
But to no avail.
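To check whether a setting like this changes anything, the performance state and clocks can be queried on their own instead of reading the full log. This is only a sketch: `summarize_state` is a hypothetical helper, and it assumes that `nvidia-smi --format=csv,noheader,nounits` separates fields with a comma followed by a space.

```shell
#!/bin/sh
# Fields to query; these names are listed by `nvidia-smi --help-query-gpu`.
FIELDS="pstate,clocks.current.sm,clocks.max.sm,clocks.current.memory,clocks.max.memory"

# summarize_state: read one CSV line per GPU from stdin and print a compact
# summary. Hypothetical helper; assumes ", " as the field separator.
summarize_state() {
    awk -F', ' '{ printf "pstate=%s sm=%s/%s MHz mem=%s/%s MHz\n", $1, $2, $3, $4, $5 }'
}

# Only call nvidia-smi if it is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu="$FIELDS" --format=csv,noheader,nounits | summarize_state
fi
```

Running this before and after the nvidia-settings call, while the training job is active, would show directly whether the P-state or the clocks moved.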
I use this card for TensorFlow model training. The system is a 12-core Ryzen on a Gigabyte B550 board with Resizable BAR enabled and PCIe Gen 4, which the log shows is being used just fine. The power supply is 750 W.
According to NVIDIA, the P-states mean the following:
P0/P1 - Maximum 3D performance
P2/P3 - Balanced 3D performance-power
P8 - Basic HD video playback
P10 - DVD playback
P12 - Minimum idle power consumption
So what am I missing here?