High load on Ubiquiti Nanostation XM - maybe related to "workingset_refault" (vmstat)
Username: Lars
Origin: https://bugs.openwrt.org/index.php?do=details&task_id=1544
Our local wireless community uses a lot of Ubiquiti devices.
They all worked well with Chaos Calmer.
With LEDE 17.01 we started to see load issues with Nanostation M5 XM devices (the older Nanostation model, with only 32 MB of RAM). We have not noticed the issue with any other device so far.
After a few hours of uptime, the routers start to develop a persistent high load (>8) and usually “recover” only after a reboot. “wifi up/down” does not seem to affect the issue.
The problem almost never occurs on devices using only a single Ethernet port. Devices using both Ethernet ports suffer greatly (problems usually start within 24 hours). I could therefore imagine that issue #296 is related (just a wild guess).
Traffic on the wireless interface seems to increase the likelihood of the problem (maybe CPU utilization in general does).
“top” and other tools do not show any process that could cause the high load.
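Linux also counts tasks in uninterruptible sleep (state “D”, typically blocked on I/O or inside a driver) in the load average, even though they use no CPU time, so a high load with an idle-looking “top” can come from such tasks. Below is a minimal POSIX shell sketch for listing them. The function name list_dstate is hypothetical, it has not been tested on a Nanostation, and it only checks top-level PIDs (threads under /proc/<pid>/task are skipped).

```shell
# list_dstate [ROOT]: print "PID STATE COMM" for every task whose state field
# in ROOT/<pid>/stat is "D" (uninterruptible sleep). ROOT defaults to /proc.
list_dstate() {
    root=${1:-/proc}
    for stat in "$root"/[0-9]*/stat; do
        # Skip tasks that exited between the glob and the read.
        line=$(cat "$stat" 2>/dev/null) || continue
        pid=${line%% *}
        # The command name is wrapped in parentheses and may contain spaces,
        # so the state is the first field after the last ") ".
        rest=${line##*") "}
        state=${rest%% *}
        comm=${line#*"("}
        comm=${comm%")"*}
        [ "$state" = "D" ] && echo "$pid $state $comm"
    done
    return 0
}
```

Running `list_dstate` a few times while the load is high would show whether the same kernel threads or processes are repeatedly stuck in state D.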
The only unusual metric that appears to be connected to the high-load situation is “workingset_refault” (see /proc/vmstat).
See the following output:
root@AP-1-96:~# while sleep 10; do grep workingset_ /proc/vmstat; done
workingset_refault 1304983
workingset_activate 392198
workingset_nodereclaim 10330
workingset_refault 1308585
workingset_activate 393391
workingset_nodereclaim 10352
workingset_refault 1308671
workingset_activate 393412
workingset_nodereclaim 10352
workingset_refault 1310284
workingset_activate 393940
workingset_nodereclaim 10374
workingset_refault 1317360
workingset_activate 396226
workingset_nodereclaim 10454
workingset_refault 1317465
workingset_activate 396251
workingset_nodereclaim 10454
workingset_refault 1317540
workingset_activate 396292
workingset_nodereclaim 10454
workingset_refault 1324449
workingset_activate 398402
workingset_nodereclaim 10508
workingset_refault 1328418
workingset_activate 399908
workingset_nodereclaim 10536
workingset_refault 1328796
workingset_activate 400114
workingset_nodereclaim 10536
workingset_refault 1329186
workingset_activate 400213
workingset_nodereclaim 10546
workingset_refault 1333889
workingset_activate 401528
workingset_nodereclaim 10594
The output above shows about 13k new “workingset_refault” events within 60 seconds. The “workingset_refault” value stays at zero on routers with the same kernel that do not show this problem, so I suspect that it is related to the high load.
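Instead of subtracting the cumulative counters by hand, a small POSIX shell sketch can print the increase of “workingset_refault” per interval. It assumes BusyBox ash and awk as shipped with LEDE/OpenWrt and a counter named exactly workingset_refault as in the output above (newer kernels split it into workingset_refault_anon and workingset_refault_file). The function names refault_deltas and sample_refaults are hypothetical.

```shell
# refault_deltas: read "name value" lines (for example a saved log of the
# while/grep loop above) on stdin and print how much workingset_refault
# grew between consecutive samples. Other counters are ignored.
refault_deltas() {
    awk '$1 == "workingset_refault" {
        if (seen) print $2 - prev
        prev = $2
        seen = 1
    }'
}

# sample_refaults [INTERVAL]: poll /proc/vmstat every INTERVAL seconds
# (default 10) and print a timestamp plus the increase since the previous
# sample. Runs until interrupted with Ctrl-C.
sample_refaults() {
    interval=${1:-10}
    prev=$(awk '$1 == "workingset_refault" { print $2 }' /proc/vmstat)
    while sleep "$interval"; do
        cur=$(awk '$1 == "workingset_refault" { print $2 }' /proc/vmstat)
        echo "$(date +%T) workingset_refault +$((cur - prev))"
        prev=$cur
    done
}
```

Piping the capture above through refault_deltas gives per-interval increases of 3602, 86, 1613, 7076, 105, 75, 6909, 3969, 378, 390 and 4703, which makes the bursty pattern easier to correlate with load or traffic spikes.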
I am now running out of ideas for how to investigate this issue. Could someone give me a hint about what to try next?
For reference: we are also discussing this issue in the bug tracker of our local wireless community (https://dev.opennet-initiative.de/ticket/187, German only). That discussion may be a bit hard to follow, as we were chasing several potential causes of the problem. Sadly, each of our theories fell apart without giving any hint about the root cause.