I've enabled transparent huge pages (THP) for a process that uses jemalloc for memory allocation, with the following steps:
1. Setting the transparent huge page state to "madvise":
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled;
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag;
2. Setting jemalloc to always use THP:
export MALLOC_CONF="thp:always,metadata_thp:always,dirty_decay_ms:-1";
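For completeness, both settings can be confirmed like this (a quick sketch; <pid> is a placeholder and the output lines show the expected form rather than captured output):
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
# tr '\0' '\n' < /proc/<pid>/environ | grep MALLOC_CONF
MALLOC_CONF=thp:always,metadata_thp:always,dirty_decay_ms:-1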
Since the program only uses jemalloc to allocate memory, I expected the total resident memory (RSS) to be roughly equal to the amount of memory backed by huge pages. But they differ a lot, as the aggregated "AnonHugePages" and "Rss" items (values in kB) show below:
# cat /proc/<pid>/smaps |awk 'NF==3 {dict[$1]+=$2} END{for(key in dict) print key" "dict[key]}'
Locked: 4
Shared_Clean: 18732
MMUPageSize: 8776
KernelPageSize: 8776
Pss: 150242778
Swap: 0
ShmemPmdMapped: 0
Shared_Dirty: 0
Size: 258068324
Private_Hugetlb: 0
Private_Dirty: 150234008
LazyFree: 0
Private_Clean: 124
Referenced: 147993656
VmFlags: 0
AnonHugePages: 76193792
Rss: 150252864
SwapPss: 0
Shared_Hugetlb: 0
Anonymous: 150232456
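Since the numbers above are sums over all mappings, the same smaps file can also be broken down per VMA to see which mappings are resident but not backed by huge pages (a rough sketch; <pid> is a placeholder, values are in kB, largest offenders first):
# awk '/^[0-9a-f]+-[0-9a-f]+ /{vma=$1}
       $1=="Rss:"{rss[vma]=$2}
       $1=="AnonHugePages:"{thp[vma]=$2}
       END{for(v in rss) if (rss[v] > thp[v]) printf "%10d %10d %s\n", rss[v], thp[v], v}' \
       /proc/<pid>/smaps | sort -rn | head
The first column is Rss, the second AnonHugePages; mappings with a large Rss but a small AnonHugePages value are where the gap comes from.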
I know that a normal (4 KiB page) allocation occurs when no huge page is available in the operating system, and each such fallback increments the "thp_fault_fallback" counter in "/proc/vmstat". But that counter is small, as the snippet below shows, so not many fallbacks to non-huge-page allocation have happened:
# grep thp_fault_fallback /proc/vmstat
thp_fault_fallback 2982
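The other THP counters in /proc/vmstat (splits, khugepaged collapses, and so on) might also be relevant; they can all be dumped at once (exact counter names vary slightly between kernel versions):
# grep '^thp_' /proc/vmstat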
So why is there such a large gap between RSS and AnonHugePages? Any clues or advice would be appreciated.