High I/O wait and Context Switches - how to debug

Update:

sar -W -B -d output added to bottom of post.

I have reproduced this on boot. I have uwsgi starting 9 instances of a Django app, along with gunicorn for websockets and dramatiq for queue handling. It swaps like crazy for the first 17 minutes after a reboot. What makes no sense to me is that I have 55 GB of free memory (I'm watching it right now), yet it keeps piling stuff into swap.
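To see which processes are actually sitting in swap while free memory looks plentiful, here is a minimal sketch that reads the `VmSwap` field from `/proc/<pid>/status`. It assumes a Linux kernel that exposes `VmSwap`; the function names (`parse_vmswap_kb`, `parse_name`, `swap_by_process`) are made up for illustration.

```python
import os
import re


def parse_vmswap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or 0 if absent."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text, or '?' if absent."""
    match = re.search(r"^Name:\s+(.*)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else "?"


def swap_by_process(proc_root="/proc"):
    """Return [(swap_kb, pid, name)] for processes using swap, largest first."""
    if not os.path.isdir(proc_root):
        return []
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited or is not readable
        swap_kb = parse_vmswap_kb(text)
        if swap_kb > 0:
            results.append((swap_kb, int(entry), parse_name(text)))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Show the 15 largest swap users.
    for swap_kb, pid, name in swap_by_process()[:15]:
        print(f"{swap_kb:>10} kB  pid={pid:<7} {name}")
```

If the uwsgi, gunicorn, or dramatiq workers dominate this list while the host still reports tens of GB free, that would suggest something other than host-wide memory pressure is pushing them out.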


Multiple global teams work on this server, so I'm trying to help them track down a strange situation. At some point I/O wait jumps way up and context switches quadruple, and the instance starts crawling.

8-core, 64 GB machine. Below is a snapshot of sar from around the time things go haywire. Any advice would be helpful.

12:00:02 AM    proc/s   cswch/s
...
04:20:01 AM      9.57   2240.70
04:30:01 AM      9.19   2205.21
04:40:01 AM     17.95   3654.56
04:50:01 AM     13.25   8211.17
05:00:01 AM     11.25  12730.44
05:10:01 AM     23.55  13373.36
05:20:01 AM      9.71  12946.54
05:30:01 AM      9.40  12910.49
05:40:01 AM      9.65  12756.83
05:50:01 AM      9.74  12240.25
06:00:01 AM      9.27  12499.49


                CPU     %user     %nice   %system   %iowait    %steal     %idle
04:20:01 AM     all      0.82      0.08      0.36      0.01      0.00     98.74
04:30:01 AM     all      0.74      0.08      0.34      0.01      0.00     98.82
04:40:01 AM     all      2.10      4.78      0.97      0.02      0.00     92.14
04:50:01 AM     all      5.75      1.34      0.89     14.13      0.00     77.89
05:00:01 AM     all      3.75      0.78      0.90     22.36      0.00     72.21
05:10:01 AM     all      1.34      0.08      1.06     16.47      0.00     81.06
05:20:01 AM     all      1.86      0.09      0.76     19.68      0.00     77.61
05:30:01 AM     all      1.59      0.09      0.75     29.20      0.00     68.36
05:40:01 AM     all      1.43      0.09      0.80     20.73      0.00     76.94
05:50:01 AM     all      1.50      0.08      0.72     18.37      0.00     79.33
06:00:01 AM     all      6.00      0.08      0.78     22.84      0.00     70.30


            kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
04:20:01 AM  40681728  23895344     37.00   1092056   8106304  18478612     23.63
04:30:01 AM  40624048  23953024     37.09   1092176   8111296  18494628     23.65
04:40:01 AM  39070316  25506756     39.50   1092452   8105988  19712452     25.20
04:50:01 AM  39020276  25556796     39.58   1092604   8109400  22042632     28.18
05:00:01 AM  42089076  22487996     34.82   1092724   6242232  21033836     26.89
05:10:01 AM  41149488  23427584     36.28   1092840   6244432  21054208     26.92
05:20:01 AM  40928964  23648108     36.62   1092956   6246076  20986812     26.83
05:30:01 AM  40650108  23926964     37.05   1093052   6260380  21890736     27.99
05:40:01 AM  40709564  23867508     36.96   1093124   6261400  21152980     27.05
05:50:01 AM  40510896  24066176     37.27   1093276   6262116  21860396     27.95
06:00:01 AM  40343000  24234072     37.53   1093364   6263852  23247668     29.73

11:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
11:10:01 AM  dev259-0     58.23    691.59    296.31     16.97      0.42      7.45      0.24      1.39
11:20:01 AM  dev259-0     25.78    592.32    104.80     27.05      0.02      0.67      0.04      0.11
11:30:01 AM  dev259-0      2.01      5.95     89.83     47.75      0.00      0.22      0.02      0.00
11:40:01 AM  dev259-0     86.93   1810.66     85.48     21.81      1.22     14.27      0.28      2.44
11:50:01 AM  dev259-0   2992.53  18668.89  16743.57     11.83     18.64      6.54      0.32     94.69
12:00:01 PM  dev259-0    873.25   6378.61   6783.46     15.07      4.04      4.91      0.30     25.95
12:10:01 PM  dev259-0   1960.41  15044.32  13876.68     14.75     13.52      7.20      0.31     60.85
12:20:01 PM  dev259-0   2861.73  16101.24  15620.46     11.08     17.03      6.26      0.32     91.00
12:30:01 PM  dev259-0   2328.61  15041.01  12711.45     11.92     14.54      6.55      0.32     73.74
12:40:01 PM  dev259-0     97.34    708.75    364.76     11.03      0.77      8.12      0.24      2.33
12:50:01 PM  dev259-0   2090.85  11775.92  11438.38     11.10     12.47      6.27      0.32     66.22
Average:     dev259-0    437.34   3139.50   2685.83     13.32      2.74      6.57      0.30     13.03
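
To put the busiest `sar -d` row (11:50:01 AM) into more familiar units, here is a small worked conversion. It relies on sar reporting `rd_sec/s`, `wr_sec/s`, and `avgrq-sz` in 512-byte sectors; the helper names are made up for illustration.

```python
SECTOR_BYTES = 512  # sar -d counts rd_sec/s, wr_sec/s and avgrq-sz in 512-byte sectors


def sectors_to_mb_per_s(sectors_per_s):
    """Convert a sectors-per-second rate to megabytes per second (10^6 bytes)."""
    return sectors_per_s * SECTOR_BYTES / 1_000_000


def avg_request_kib(avgrq_sz_sectors):
    """Convert an average request size in sectors to KiB."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


# Values copied from the 11:50:01 AM row for dev259-0.
read_mb = sectors_to_mb_per_s(18668.89)
write_mb = sectors_to_mb_per_s(16743.57)
request_kib = avg_request_kib(11.83)

print(f"read  {read_mb:.2f} MB/s")
print(f"write {write_mb:.2f} MB/s")
print(f"avg request {request_kib:.2f} KiB")
```

That works out to roughly 9.6 MB/s read and 8.6 MB/s written at about 6 KiB per request while the device shows 94.69 %util. Many small requests at modest throughput would be consistent with paging traffic rather than bulk file I/O, although this alone does not prove it.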

shodanshok:
Please show the output of `sar -W -B -d` for the affected time range.

Tim Nelson:
Thanks. Updated post.

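Found it! I had custom code in /etc/init.d/cgroup that configured the cgroup depending on memory changes. An AWS security update wiped out my change. The mad swapping came from an unintended cgroup memory limit: the processes hit that limit and were pushed into swap even though the machine itself had plenty of free RAM.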
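
For anyone hitting the same thing, here is a minimal sketch for checking whether a cgroup's memory limit is lower than physical RAM. It assumes the standard limit files (`memory.limit_in_bytes` on cgroup v1, `memory.max` on cgroup v2); the cgroup directory you pass in depends on your own setup, and the function names are made up for illustration.

```python
import os
import sys


def parse_meminfo_total_bytes(meminfo_text):
    """Return MemTotal from /proc/meminfo text, converted from kB to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def parse_cgroup_limit(raw):
    """Return the limit in bytes, or None when cgroup v2 reports 'max' (unlimited)."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def limit_is_restrictive(limit_bytes, mem_total_bytes):
    """True when the cgroup limit is set and smaller than physical RAM."""
    return limit_bytes is not None and limit_bytes < mem_total_bytes


def read_limit(cgroup_dir):
    """Read the memory limit from a cgroup directory (v2 file first, then v1)."""
    for name in ("memory.max", "memory.limit_in_bytes"):
        path = os.path.join(cgroup_dir, name)
        if os.path.exists(path):
            with open(path) as fh:
                return parse_cgroup_limit(fh.read())
    return None  # no limit file found in this directory


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_cgroup.py <cgroup directory> (path is setup-specific)
    with open("/proc/meminfo") as fh:
        total = parse_meminfo_total_bytes(fh.read())
    limit = read_limit(sys.argv[1])
    if limit_is_restrictive(limit, total):
        print(f"cgroup limit {limit} bytes is below MemTotal {total} bytes")
    else:
        print("no restrictive memory limit found in this cgroup")
```

On cgroup v1 an unset limit shows up as a very large number rather than `max`, so it compares as larger than MemTotal and is not flagged.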
