CPU utilization ratios rarely tell the whole story. Some tools report fractions of a single CPU, others fractions of all system CPUs, and the total says nothing about what the CPU/memory load is actually doing: it could be useful work, or it could be system overhead.
A better test quantifies the amount of work done by an application while pushing the system hard, and profiles the system while that happens. Stress-test microbenchmarks create load on demand, even if their "bogus" operations are synthetic and a little artificial.
For example, on Linux:
perf record -a -F 999 -- stress-ng --metrics --cpu 1 --timeout 1m
This creates one minute of CPU load while sampling what is on every CPU at roughly millisecond resolution (999 Hz). The "bogo ops" reported by stress-ng quantify the work done.
Repeat this test multiple times, in a VM and on a bare-metal OS install. Remember the scientific method: keep as many variables the same as possible. Identical hardware, the same OS distro and patch level, the same workload parameters, and the same profiling parameters.
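A minimal sketch of scripting the repetition, so each run keeps its profile and its stress-ng metrics together (the run count, the environment label, and the file names are illustrative choices, not required by the tools):

host=baremetal        # set to "vm" when running the same loop inside the guest
for i in 1 2 3; do
    # each iteration records its own profile and captures the stress-ng metrics summary
    perf record -a -F 999 -o ${host}-run${i}.perf.data -- \
        stress-ng --metrics --cpu 1 --timeout 1m > ${host}-run${i}.log 2>&1
done
grep -i 'bogo ops' ${host}-run*.log    # compare work done across runs and environments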
Review what is on CPU with perf report.
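For example, --stdio prints the report as text rather than opening the interactive browser, and -i selects one of the saved profiles (the file name here is one of the illustrative names from the sketch above):

perf report --stdio -i baremetal-run1.perf.data | head -30    # highest-overhead entries first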
Bucketing these into kernel, drivers, and the application requires installing debug symbols, and some research into what the various functions do. Hypervisor overhead will not appear in the guest profile, but it can be inferred from any throughput difference against the bare-metal results.
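One rough way to do that first-pass bucketing is to sort the report by DSO, which groups samples into the kernel ([kernel.kallsyms]), kernel modules including drivers (shown in square brackets by module name), and the application binaries and libraries; a sketch against one of the saved profiles:

perf report --stdio --sort dso -i baremetal-run1.perf.data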
Synthetic testing will be very different from real workloads. My stress-ng example does zero I/O, which never happens in practice. Do similar profiling and analysis, but with the workload the host is actually supposed to be running. Hint: when profiling the entire system without needing to start another program, sleep can serve as the timer:
perf record -a -- sleep 60
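A sketch of the same idea against the real workload, assuming it is already running: sample every CPU for the 60-second window, then summarize by process and DSO to see which components consume the CPU (the 99 Hz rate and the file name are arbitrary choices, the lower rate keeps profiling overhead down):

perf record -a -F 99 -o production.perf.data -- sleep 60
perf report --stdio --sort comm,dso -i production.perf.data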