Since I started using OpenWrt on my router, something weird has been happening.
I usually have 4 devices (2 phones and 2 laptops) connected to the Wi-Fi AP/router every day, but one of the laptops (an XPS 13 9365) has started to suddenly get "disconnected". I've quoted the word because, in theory, I'm still connected, but the network connection simply stops working.
It's weird because some days the issue doesn't show up at all, while other days are a real nightmare, with the connection dropping every couple of minutes. And it only happens with the XPS 13. Other devices work like a charm, even when I have ~10 devices connected at once.
This is what I get right after noticing that the network has stopped working:
$ sudo iw dev "wlp60s0" link
Connected to **:**:**:**:**:** (on wlp60s0)
SSID: my_ap
freq: 2447
RX: 15583826 bytes (14173 packets)
TX: 1550845 bytes (6382 packets)
signal: -40 dBm
rx bitrate: 144.4 MBit/s MCS 15 short GI
tx bitrate: 144.4 MBit/s MCS 15 short GI
bss flags: short-preamble short-slot-time
dtim period: 2
beacon int: 100
And I still have an IP address etc.:
$ ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: enx00e04c6810ec: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether **:**:**:**:**:** brd ff:ff:ff:ff:ff:ff
5: wlp60s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether **:**:**:**:**:** brd ff:ff:ff:ff:ff:ff
inet 10.0.0.11/24 brd 10.0.0.255 scope global dynamic wlp60s0
valid_lft 43060sec preferred_lft 43060sec
inet6 fe80::fa63:3fff:fe2f:837/64 scope link
valid_lft forever preferred_lft forever
So, from the above, you can see I'm still connected to the AP and have a valid IP. But no matter which host I try to ping, I get 100% packet loss. Other kinds of connections (ssh, the browser, etc.) don't work either. See:
$ ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
^C
--- 10.0.0.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1011ms
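The state described above (associated, holding a DHCP lease, yet unable to reach even the gateway) can be told apart from a real disconnection with a small check. This is a minimal sketch, assuming iw and ping are installed and that 10.0.0.1 is the gateway, as in the output above; the script name check_link.sh and the three state labels are my own invention, not anything iw itself reports.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Wi-Fi link is ok, stalled or disassociated.

# classify_link LINK_OUTPUT PING_STATUS
#   LINK_OUTPUT: text printed by `iw dev <iface> link`
#   PING_STATUS: exit status of a ping to the gateway (0 means a reply came back)
classify_link() {
    local link_output="$1" ping_status="$2"
    if ! printf '%s\n' "$link_output" | grep -q '^Connected to'; then
        echo "disassociated"   # iw does not report an association at all
    elif [ "$ping_status" -eq 0 ]; then
        echo "ok"              # associated and the gateway answers
    else
        echo "stalled"         # associated, but no traffic gets through
    fi
}

# Probe the real interface only when called as: ./check_link.sh IFACE GATEWAY
if [ "$#" -ge 2 ]; then
    iface="$1"
    gateway="$2"
    link_output="$(iw dev "$iface" link)"
    ping -c 2 -W 2 "$gateway" >/dev/null 2>&1
    ping_status=$?
    echo "$(date '+%F %T') $(classify_link "$link_output" "$ping_status")"
fi
```

Running it periodically (for example with watch -n 10 ./check_link.sh wlp60s0 10.0.0.1) would give timestamps for when the link stalls, which could then be compared against the kernel and NetworkManager logs.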
I also tried to check for any system messages. No luck:
$ dmesg
$
Note: I ran sudo dmesg -c right after boot, while the network was still usable, to clear the kernel log and make any new messages easier to identify.
I'm running Ubuntu 20.04.3:
$ cat /etc/issue
Ubuntu 20.04.3 LTS \n \l
My wireless device:
$ lspci | grep -i network
3c:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
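Because only this laptop (and thus this card) is affected, it may help to record exactly which driver and firmware versions it runs; the 8265 is driven by iwlwifi, and ethtool -i reports both for an interface. A minimal sketch, assuming ethtool is installed; the script name driver_info.sh and the driver_field helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the driver and firmware versions of a network interface.

# driver_field ETHTOOL_OUTPUT FIELD
#   Prints the value of FIELD from `ethtool -i <iface>` output, whose lines look
#   like "driver: iwlwifi" or "firmware-version: <version string>".
driver_field() {
    printf '%s\n' "$1" | awk -v f="$2" '
        index($0, f ": ") == 1 { print substr($0, length(f) + 3); exit }'
}

# Query the real interface only when called as: ./driver_info.sh IFACE
if [ "$#" -ge 1 ]; then
    info="$(ethtool -i "$1")"
    echo "driver:   $(driver_field "$info" driver)"
    echo "firmware: $(driver_field "$info" firmware-version)"
fi
```

Matching on the full "FIELD: " prefix at the start of the line keeps a query for version from accidentally matching the firmware-version line.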
As a temporary workaround, I wrote a script that stops NetworkManager and reconnects from the command line. Something like this:
iface="wlp60s0"
essid="my_ap"
tmpfile="/tmp/wpa.conf"
pass="my_pass"
# Stop NetworkManager so it doesn't interfere with the manual setup
sudo systemctl stop NetworkManager.service
# Delete and recreate the wireless interface (phy0 is the card's PHY)
sudo iw dev "$iface" del
sudo iw phy phy0 interface add "$iface" type managed
sudo ip link set "$iface" up
# Write a wpa_supplicant config for the network and associate in the background (-B)
sudo wpa_passphrase "$essid" "$pass" > "$tmpfile"
sudo wpa_supplicant -i"$iface" -c"$tmpfile" -B
# Request an IP address via DHCP
sudo dhclient -v "$iface"
This makes life a bit easier, but of course it's just a temporary, rudimentary workaround and far from ideal. It also doesn't help much, as I keep losing the connection from time to time anyway, exactly as when I use NetworkManager. It's just quicker than waiting for NetworkManager to restart...
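Instead of noticing the drop and running the script by hand, the workaround could be triggered automatically. A minimal sketch of a watchdog loop, assuming the commands above are saved as a hypothetical ./reconnect.sh and that 10.0.0.1 is the gateway; the failure threshold (3) and the check interval (10 seconds) are arbitrary defaults, not tested values.

```shell
#!/bin/sh
# Hypothetical watchdog: rerun the reconnect workaround after repeated ping failures.

# next_fail_count PREVIOUS PING_STATUS
#   Prints the number of consecutive failures after this probe.
next_fail_count() {
    if [ "$2" -eq 0 ]; then
        echo 0                # a reply resets the streak
    else
        echo $(( $1 + 1 ))    # one more consecutive failure
    fi
}

# watchdog GATEWAY [THRESHOLD] [INTERVAL_SECONDS]
watchdog() {
    local gateway="$1" threshold="${2:-3}" interval="${3:-10}" fails=0 status
    while true; do
        ping -c 1 -W 2 "$gateway" >/dev/null 2>&1
        status=$?
        fails="$(next_fail_count "$fails" "$status")"
        if [ "$fails" -ge "$threshold" ]; then
            echo "$(date '+%F %T') no reply from $gateway, reconnecting" >&2
            ./reconnect.sh    # hypothetical: the workaround script above
            fails=0
        fi
        sleep "$interval"
    done
}

# Start the loop only when a gateway is given, e.g. ./watchdog.sh 10.0.0.1
if [ "$#" -ge 1 ]; then
    watchdog "$@"
fi
```

This only papers over the problem in the same way the manual script does, but it would leave a timestamped line on stderr for every drop.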
What I've tried so far
- Disabling Wi-Fi power saving with sudo iw dev wlp60s0 set power_save off.
- Disabling Wi-Fi power saving via NetworkManager by editing /etc/NetworkManager/conf.d/default-wifi-powersave-on.conf, changing wifi.powersave = 3 to wifi.powersave = 2 and then restarting. (source: https://unix.stackexchange.com/a/315400/108418)
- Changing the Wi-Fi security mode on the router (WPA -> WEP or others). (source: 20.04 can't connect to 5Ghz wifi after update)
- Changing the Wi-Fi mode from "N" to "Legacy". This one seemed to solve the problem, but maybe only because I didn't use it for long enough. Besides, the drop in network speed makes this option impractical.
- Enabling NetworkManager debug mode and trying to identify possible issues.
None of the above worked.
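For reference, the two power-save changes from the list can be applied and checked from a script. A minimal sketch, assuming Ubuntu's stock drop-in file, which contains a [connection] section with wifi.powersave = 3 (in NetworkManager, 2 disables Wi-Fi power saving and 3 enables it); the script name and function names are hypothetical, and iw dev <iface> get power_save is used to read the setting back.

```shell
#!/bin/sh
# Hypothetical helpers for the two power-save tweaks listed above.

# write_nm_powersave FILE VALUE
#   Writes a NetworkManager drop-in; VALUE 2 disables Wi-Fi power saving, 3 enables it.
write_nm_powersave() {
    printf '[connection]\nwifi.powersave = %s\n' "$2" > "$1"
}

# power_save_state IW_OUTPUT
#   Prints "on" or "off" from the "Power save: ..." line of `iw dev <iface> get power_save`.
power_save_state() {
    printf '%s\n' "$1" | awk -F': ' '/^Power save:/ { print $2; exit }'
}

# Apply both changes only when called (as root) with: ./powersave_off.sh IFACE
if [ "$#" -ge 1 ]; then
    iface="$1"
    write_nm_powersave /etc/NetworkManager/conf.d/default-wifi-powersave-on.conf 2
    systemctl restart NetworkManager.service
    sleep 5   # give NetworkManager a moment to bring the connection back up
    iw dev "$iface" set power_save off
    echo "power save on $iface: $(power_save_state "$(iw dev "$iface" get power_save)")"
fi
```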
Other links I've visited
I went through these, but either the symptoms were not exactly the same or the proposed solutions didn't work for me:
https://www.reddit.com/r/linuxquestions/comments/ausg6k/arch_wifi_stays_connected_but_theres_no_internet/ehc3oph/
https://blog.stigok.com/2017/03/26/wifi-loses-connectivity-periodically-wpasupplicant-reason-4.html
So I'm posting all this here in the hope that someone has already been through this and can shed some light...
Thank you very much!
Update #1
I've found a way to reproduce the issue. Every time I visit this page and browse the photos (to make the browser load many photos at once, in parallel), the connection drops.
https://www.facebook.com/terraadentropelomundo/photos/
I wonder whether the wireless driver has trouble handling many connections at once.
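To test the many-connections theory without relying on a particular web page, a similar burst of parallel requests can be generated with curl. A minimal sketch, assuming curl, seq and xargs are available; the URL is a placeholder for any file you are free to download repeatedly, and the defaults of 200 requests with 30 in parallel are arbitrary.

```shell
#!/bin/sh
# Hypothetical load test: fire many parallel downloads, then count failed requests.

# count_failures: reads one HTTP status code per line on stdin and prints how many
# were not 2xx (curl reports 000 when no response arrived at all).
count_failures() {
    awk '$1 !~ /^2[0-9][0-9]$/ { n++ } END { print n + 0 }'
}

# burst URL TOTAL PARALLEL: download URL TOTAL times, PARALLEL at a time.
burst() {
    local url="$1" total="$2" parallel="$3"
    seq "$total" \
        | xargs -P "$parallel" -I{} \
            curl -s -o /dev/null --max-time 20 -w '%{http_code}\n' "$url" \
        | count_failures
}

# Run only when a URL is given, e.g. ./burst.sh https://example.com/large-image.jpg 200 30
if [ "$#" -ge 1 ]; then
    burst "$1" "${2:-200}" "${3:-30}"
fi
```

If the driver theory is right, the gateway should stop answering pings during or shortly after such a burst, without needing the browser at all.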
Update #2
While browsing other forums in the hope of finding a solution, I came across this:
It seems to have become better when I changed "Beacon Interval" from the default 100 ms to 50 on my AP. So far no disconnects in three days.
EDIT: Can confirm, the problem appears to be fixed after this change.
(source: https://bugs.archlinux.org/task/58457#comment185619)
This makes sense, considering I started facing this problem after switching my AP to OpenWrt. So there is likely something odd with the Intel driver/firmware, but changing the beacon interval on my AP seems to solve the issue. I'll test it for a few more days to see whether the issue is really gone.
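For anyone wanting to try the same change: on OpenWrt the beacon interval is the beacon_int option of the wifi-device section in /etc/config/wireless, expressed in time units of 1.024 ms. A minimal sketch, assuming the radio is named radio0 (uci show wireless lists the actual names); the router part needs OpenWrt's uci and wifi commands, while the client part just reads the "beacon int:" line from iw dev wlp60s0 link, which showed 100 above. The script name and function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for trying a shorter beacon interval.

# On the OpenWrt router: set_beacon_interval RADIO INTERVAL
#   INTERVAL is in time units of 1.024 ms (the usual default is 100).
set_beacon_interval() {
    uci set "wireless.$1.beacon_int=$2"
    uci commit wireless
    wifi reload
}

# On the laptop: beacon_interval_from_link LINK_OUTPUT
#   Prints the value of the "beacon int:" line from `iw dev <iface> link`.
beacon_interval_from_link() {
    printf '%s\n' "$1" | awk -F': ' '/beacon int:/ { print $2; exit }'
}

# Dispatch only when called as "./beacon.sh router [RADIO] [INTERVAL]" or
# "./beacon.sh client [IFACE]"; with no arguments nothing runs.
case "${1:-}" in
    router) set_beacon_interval "${2:-radio0}" "${3:-50}" ;;
    client) beacon_interval_from_link "$(iw dev "${2:-wlp60s0}" link)" ;;
esac
```

Once the laptop reassociates, the client check should report the new value, which confirms the AP actually applied it.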
Update #3
Didn't work. Even with a 50 ms beacon interval in OpenWrt, I'm still getting disconnected from time to time, without any messages showing up in dmesg.
...