mt7621: kernel errors - rcu_sched detected stalls on CPUs/tasks - again
Username: Kristian Evensen
Origin: https://bugs.openwrt.org/index.php?do=details&task_id=1170
After the work done for issue FS#804, the rcu_sched error seemed to be gone. However, I am now starting to see it again. In my experience, the error usually happens when there are large amounts of traffic and I perform a network operation (such as restarting networking) at the same time. My most reliable way of reproducing the error is as follows:
- Use iperf to flood a router with small packets. Other ways of stressing the CPU also work; for example, I triggered the error when I added very aggressive logging to the firewall.
- While the router is being flooded, restart networking (I am logged in to the router via UART).
- After a few network restarts, the error is triggered and the following is written to syslog periodically:
[ 2251.870000] INFO: rcu_bh detected stalls on CPUs/tasks:
[ 2251.870000] 2-...: (1 GPs behind) idle=ae1/140000000000001/0 softirq=212487/217796 fqs=4380
[ 2251.870000] (detected by 1, t=6002 jiffies, g=-146, c=-147, q=4)
[ 2251.870000] Task dump for CPU 2:
[ 2251.870000] openvpn-mover.s R running 0 2598 1 0x08100004
[ 2251.870000] Stack : 8fa69998 800ebe38 00000000 8fa69998 57512e2b 000001fd 00000000 80035454
[ 2251.870000] 00000000 800edbd4 8fa69998 804b0000 00000000 00000000 00000004 00000000
[ 2251.870000] 00000000 8ea17850 8efc7ec0 800376d4 00000000 00000000 778b8930 00000012
[ 2251.870000] 00000000 004077cd 778d4000 00000000 778d55e8 778d6f7c 00000000 8002b280
[ 2251.870000] ffbffeff ffffffff 00617772 706d742f 00000000 00000000 00000001 800379dc
[ 2251.870000] ...
[ 2251.870000] Call Trace:
[ 2251.870000] [<8000bc88>] __schedule+0x574/0x758
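The stall report above has a fixed shape: a header naming the RCU flavor (here `rcu_bh`), one line per stalled CPU, and a `detected by` line giving the elapsed time in jiffies. When collecting many of these from syslog, a small parser can make it easier to see which CPUs stall and for how long. The following is a minimal Python sketch, assuming the 4.x-era format shown above; `parse_stall` and `StallReport` are hypothetical names, and the conversion to seconds depends on the kernel's HZ setting, which this report does not state.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Strips the "[ 2251.870000] " timestamp prefix and any following whitespace.
_TS = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Header line, e.g. "INFO: rcu_bh detected stalls on CPUs/tasks:".
_FLAVOR = re.compile(r"INFO: (rcu_\w+) detected stalls")
# Per-CPU line, e.g. "2-...: (1 GPs behind) idle=...". Assumes the 4.x
# format, where the parenthesised part reads "GPs behind" or "ticks this GP".
_CPU = re.compile(r"^(\d+)-\S*: \((\d+) (?:GPs behind|ticks this GP)\)")
# Detector line, e.g. "(detected by 1, t=6002 jiffies, ...)".
_DETECT = re.compile(r"\(detected by (\d+), t=(\d+) jiffies")


@dataclass
class StallReport:
    flavor: str
    detected_by: int
    jiffies: int
    stalled_cpus: List[int] = field(default_factory=list)

    def seconds(self, hz: int) -> float:
        # Jiffies to seconds; HZ is a kernel build option (assumed, not known here).
        return self.jiffies / hz


def parse_stall(lines: Iterable[str]) -> Optional[StallReport]:
    """Parse the lines of a single stall report; return None if incomplete."""
    flavor: Optional[str] = None
    detected_by: Optional[int] = None
    jiffies: Optional[int] = None
    cpus: List[int] = []
    for raw in lines:
        line = _TS.sub("", raw.strip())
        if m := _FLAVOR.search(line):
            flavor = m.group(1)
        elif m := _CPU.match(line):
            cpus.append(int(m.group(1)))
        elif m := _DETECT.search(line):
            detected_by, jiffies = int(m.group(1)), int(m.group(2))
    if flavor is None or detected_by is None or jiffies is None:
        return None
    return StallReport(flavor, detected_by, jiffies, cpus)
```

For example, feeding it the log lines above yields flavor `rcu_bh`, stalled CPU 2, detected by CPU 1, and 6002 jiffies; with HZ=100 that would correspond to roughly 60 seconds. The sketch expects one report at a time, so a full `logread` dump would need to be split at each header line first.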
I can also sometimes trigger the issue simply by issuing the reboot command while the CPU is stressed. I have not applied any traffic shaping to my interface, and I see the error with both kernel 4.4 and 4.9 (i.e., LEDE 17.01 and master). I am not sure how to proceed with debugging this.