MT Router rebooting by watchdog timer
So I have an RB750UP serving up a wifi hotspot with 5 UBNT APs, which has between 30 and 60 users online at any given point.
Lately I've been seeing this error.
jan/01/1970 18:00:10 system,error,critical router was rebooted without proper shutdown by watchdog timer
Last uptime was around 3 days, and now I just saw it again. It is hard to diagnose the times, as the router doesn't get time from the NTP server yet when it reports the error to the log.
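Since the log stamp is useless until NTP syncs, one workaround is to back-calculate the reboot time from the uptime you see when you log in. A minimal Python sketch, assuming the uptime is copied by hand in a form like `3d04:12:09` (that string format and the function names `parse_uptime` and `estimate_boot_time` are assumptions for illustration, not anything RouterOS provides):

```python
import re
from datetime import datetime, timedelta

# Assumed uptime format: optional weeks/days, then HH:MM:SS, e.g. "1w3d04:12:09".
_UPTIME_RE = re.compile(r"^(?:(\d+)w)?(?:(\d+)d)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_uptime(text):
    """Turn an uptime string into a timedelta (hypothetical helper)."""
    m = _UPTIME_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised uptime string: {text!r}")
    weeks, days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(weeks=weeks, days=days, hours=hours,
                     minutes=minutes, seconds=seconds)


def estimate_boot_time(now, uptime_text):
    """Boot time is simply the current wall-clock time minus the uptime."""
    return now - parse_uptime(uptime_text)


if __name__ == "__main__":
    # Example: logged in just now and the router reports 34 minutes of uptime.
    print(estimate_boot_time(datetime.now(), "00:34:00"))
```

Noting these estimates over a few reboots should show whether the resets line up with a time of day or a load pattern.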
Should watchdog timer be enabled by default? Mine is on, but there is no address to watch. That just means if the watchdog thinks the router locks up it will reboot it?
I'm almost assuming this router can't handle the job and I need to upgrade it, but I'm just curious as to what may be causing this.
(Edit: Not sure how this ended up as a sticky)
Could be a kernel panic. What version are you running?
If you are using the watchdog, have it generate an output file and send it to your email so you can see what's going on.
That's what I assumed, maybe I accidentally turned it on.
Just logged in now and, yet again, uptime is around 34 minutes; it had another reboot by the system watchdog.
Turned it off, we shall see how long it lasts now. Before this I was almost 2 months uptime before I had to power cycle it for other reasons.
reply to TheHox
watchdog-timer (yes | no; Default: yes) Whether to reboot if system is unresponsive for a minute.
I also turn on graphing to see what CPU, memory and other things are doing. I also monitor bandwidth usage per port to give me status.
Yeah, I have those enabled.
Watching the CPU usage in real time, it spikes more than the graphing shows. I was downloading some large files yesterday, which explains the large spikes in bandwidth and CPU, but besides that, it doesn't really go that high.
The watchdog timers on RB products are all hardware watchdog timers: the CPU can be hard locked and the watchdog timer will still reboot the board. Either the RB software has to keep poking the watchdog timer, located in a PLD, before it expires or it will reboot the CPU, or the PLD is doing the querying; I'm not really sure which.
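To make the "keep poking it or get rebooted" idea concrete, here is a toy Python model of the first variant described above. It is only a simulation under assumed names (`Watchdog`, `kick`, `tick`, `first_reboot`) and says nothing about how the RouterBOARD hardware is actually wired; the 60 second timeout is taken from the "unresponsive for a minute" wording of the watchdog-timer setting.

```python
class Watchdog:
    """Toy watchdog: demands a reboot unless kicked before the timeout expires."""

    def __init__(self, timeout_s, now=0):
        self.timeout_s = timeout_s
        self.deadline = now + timeout_s

    def kick(self, now):
        """Called by healthy software; pushes the deadline forward."""
        self.deadline = now + self.timeout_s

    def tick(self, now):
        """Called by the independent timer; True means 'reboot the board'."""
        return now >= self.deadline


def first_reboot(hang_at, duration, timeout_s=60):
    """Software kicks once a second until it hard-locks at hang_at.

    Returns the second at which the watchdog fires, or None if it never does.
    """
    wd = Watchdog(timeout_s)
    for t in range(duration):
        if t < hang_at:      # software still alive, keeps poking the timer
            wd.kick(t)
        if wd.tick(t):       # the timer runs regardless of the CPU state
            return t
    return None


if __name__ == "__main__":
    # Software locks up at t=100 s; last kick was at t=99 s.
    print(first_reboot(hang_at=100, duration=200))
```

The point of the model is that the reboot decision lives outside the code that hangs, which is why a hard-locked CPU still gets reset.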
So, what could be the cause?
Hard to diagnose. Check settings like SNMP; on RouterBoard OS it seems quite flaky and can drive the CPU to 100% for no good reason. Off-site logging can help. Are you using the "UP" part of the board to power the UBNT access points, or the included injector?
Turning it off will likely stop your problem, but one of the things I love about routerboards is the hardware watchdog, which has saved me a few late-night power cycles.
You can use any public NTP server to set the time on an RB.
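If you want to check from a PC that a public server answers before pointing the router at it, a bare-bones SNTP query looks roughly like this. It is a sketch only: `pool.ntp.org` is just one public pool picked as an example, and the helper names (`build_request`, `parse_transmit_time`, `query`) are made up for illustration.

```python
import socket
import struct

NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01


def build_request():
    """48-byte client request: first byte 0x1B = LI 0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as whole Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds = struct.unpack("!I", packet[40:44])[0]  # integer part of the timestamp
    return seconds - NTP_TO_UNIX


def query(server="pool.ntp.org", timeout=2.0):
    """Send one request over UDP port 123 and return the server time (Unix seconds)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_request(), (server, 123))
        data, _ = s.recvfrom(512)
    return parse_transmit_time(data)


if __name__ == "__main__":
    import time
    print(time.ctime(query()))
```

This ignores the fractional part of the timestamp and network delay, which is fine for a "does it answer and is the date sane" check.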
reply to TheHox
I have a dozen of these.
Verify that your Ethernet cables are good. I had one with a short and when I tried to POE over it the 750 would reboot.
Check that RouterBOOT is current for the version of the OS you're running. Version 5+: click System > Routerboard and check that Current Firmware and Upgrade Firmware are the same. If not, upgrade and reboot. Version Routerboard>Settings
Check that the power supply is strong enough if you are using the PoE output. Maybe think about swapping the power supply with another one.
And the last option, which has actually worked for me a few times: reset to defaults and program it again.
NOTE that if you turn off the watchdog timer, whatever is causing the router to halt and reboot will probably not stop happening, and you will have disabled the ability of the hardware to reboot itself, which means you will have to power cycle it manually.
reply to TheHox
Thanks for the info.
I turned it off for the hell of it, and it ran about 3 days just fine until last night. I RDP'd into the network and was working on some other things when I suddenly got disconnected. The router froze up and, lo and behold, I had the watchdog disabled. So I went for a drive, reset the router, and turned the watchdog back on.
I did upgrade the routerboard firmware. I'm guessing it's something with the programming. I just picked up an RB1200 to put in its place, and I'll reprogram that from scratch when I install it this weekend.
I have another question now but I'll post a separate topic.