About a week ago, seemingly out of the blue, logged-in sessions started to throw content encoding errors. The site still works fine for anonymous users.
There's nothing in error_log, clearing caches and sessions didn't change anything, and throttling the page loading speed shows it actually manages to draw the admin (and CiviCRM) toolbar and part of the page before giving up. The running PHP of course supports zlib, and compression has been enabled for years. The dbs are acting normally.
The dev tools network tab appears to be useless for determining which request's response was bad. I've downloaded a few database snapshots and diffed them, but nothing stands out (though they're huge, so it's possible I missed something). I've made a minor version bump to Drupal (to 8.9.20) to ensure the code is ok; no change. This thread had some helpful suggestions, but nothing brought me closer to finding the cause. One thing I haven't tried yet is to reimport older db backups to see if they work, to rule out a data issue. It's a heavy hammer, so I'm interested in:
How would you approach debugging this, pinpointing the source?
Edit: some more information: it's not a browser issue, since the browser didn't change and both Firefox and Chromium hit the same wall. It happens on all pages, whether admin or frontend. When I say that the network tab is useless for identifying the failing request, it's because not all requests are listed, or, if it's the main HTML one, the request headers are not shown (even with persistent logs enabled). And compression clearly works, since some CSS files get through successfully before everything breaks.
I tried logging in via curl and comparing good and bad pages, but there was nothing obviously wrong in the requests upon diffing them. Just for posterity, the commands were:
curl -v -L $(drush uli --uid 99) --cookie-jar cookie.txt
curl --cookie cookie.txt -v --trace-ascii - good-url > good
curl --cookie cookie.txt -v --trace-ascii - bad-url > bad
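One thing worth noting about the commands above: curl does not send an Accept-Encoding header unless asked to, so the server may have answered both requests uncompressed, which could explain why the diff showed nothing. A minimal sketch of a follow-up check, assuming the site uses gzip: request the compressed response, save the raw body undecoded, and test whether it is a complete gzip stream. The helper name `check_gzip_body` and the output file names are made up for illustration; `bad-url` and `cookie.txt` are the same placeholders as above.

```shell
# Hypothetical helper: report whether a saved response body is a complete,
# well-formed gzip stream. Prints "valid" or "corrupt".
check_gzip_body() {
  # gzip -t only tests integrity; it writes no decompressed output.
  if gzip -t "$1" 2>/dev/null; then
    echo "valid"
  else
    echo "corrupt"
  fi
}

# Usage sketch (not run here; needs the logged-in cookie jar from above).
# Without --compressed, curl stores the body exactly as the server sent it:
#   curl --cookie cookie.txt -s -H 'Accept-Encoding: gzip' \
#        -D bad.headers -o bad.body bad-url
#   check_gzip_body bad.body
#   head -c 64 bad.body | od -c   # look at the first bytes of the body
```

If the body is reported as corrupt while `bad.headers` says `Content-Encoding: gzip`, looking at the first bytes may show whether the stream is truncated or whether something uncompressed was written ahead of the gzip data (a gzip stream starts with the bytes 037 213 in `od -c` output). This only narrows down the symptom; it does not by itself say which module produced the stray output.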