Author Topic: BRX Hardware watchdog timeout  (Read 559 times)

Kristjan

  • Sr. Member
  • ****
  • Posts: 67
    • Idnadartaekni ehf
BRX Hardware watchdog timeout
« on: August 27, 2025, 09:03:43 AM »
Seen several reports about it on the forum but didn't find a solution...

I have a few BRX in the field (different sites) that periodically reboot following a hardware watchdog timeout. One of them reboots every 90 days and another every 50 days. Most of the time it doesn't hurt the respective operation, since the PLCs come back in RUN mode. But I have also seen PLCs that reboot and go to PROGRAM mode. I hadn't noticed these timeouts until recently, but they go way back in time.

The BRX I am working on now has OS 2.9.4. Reboots every 50 days. Attached Event Log and some relevant data.

I thought maybe it had something to do with Modbus communication:
It is throwing a $DriverError (ST143) due to errors on a Modbus line on POM module BX-P-SER (MRX polling interval 10 sec). Comparing the timeouts with data logs, I am unable to link the driver error to the timeout. I have another Modbus line on the native RS485 port with all communication OK (MRX polling interval 5 sec). I am also polling three external machines on Modbus TCP, all comm OK (MRX polling interval 5 sec).

The BRX is communicating on OPC with a SCADA system on a virtual server. Could this communication cause timeout issues?

Any tips on how to pinpoint the cause of the timeouts?

BobO

  • Host Moderator
  • Hero Member
  • *****
  • Posts: 6116
  • Yes Pinky, Do-more will control the world!
Re: BRX Hardware watchdog timeout
« Reply #1 on: August 27, 2025, 02:00:32 PM »
Ohhh...that's interesting. 49.7 days is the overflow of a 32-bit millisecond timer. Your timestamps are about an hour longer than that, but it seems too regular and too close to 49.7 to not be related to the rollover.
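
The rollover figure can be checked with a little arithmetic. This is a minimal sketch, assuming the timer is a free-running unsigned 32-bit count of milliseconds; the function names are illustrative and not part of Do-more.

```python
# Assumption: the system tick is an unsigned 32-bit count of milliseconds.
TICK_MODULUS = 2**32                 # number of distinct counter values
MS_PER_DAY = 24 * 60 * 60 * 1000     # 86,400,000 ms


def rollover_period_days() -> float:
    """Days until a free-running 32-bit millisecond counter wraps to zero."""
    return TICK_MODULUS / MS_PER_DAY


def days_with_offset(extra_hours: float) -> float:
    """Rollover period plus an extra delay, expressed in days."""
    return rollover_period_days() + extra_hours / 24


print(round(rollover_period_days(), 2))  # 49.71
print(round(days_with_offset(1.0), 2))   # 49.75
```

A reboot that recurs roughly one hour after the wrap would therefore show up about 49.75 days apart in the event log.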

The program mode thing is due to the reboot protection feature. After the PLC has rebooted 10 times, it drops to program mode. If there was a driver failure or programming error that caused the PLC to reboot constantly, staying in program mode gives you a chance to fix it remotely. You can reset/prevent it by clearing $WatchdogReboots from code.
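
The reboot protection described above can be pictured with a small model. This is only an illustrative Python sketch, not Do-more firmware or ladder code: the class, its attribute `watchdog_reboots` (standing in for $WatchdogReboots), and the exact "10 or more" comparison are assumptions made for the example.

```python
# Illustrative model only; the threshold test (>= 10) is an assumption.
REBOOT_LIMIT = 10


class RebootProtection:
    """Toy model of a reboot counter that forces PROGRAM mode at a limit."""

    def __init__(self) -> None:
        self.watchdog_reboots = 0    # stands in for $WatchdogReboots
        self.mode = "RUN"

    def on_watchdog_reboot(self) -> str:
        """Count one watchdog reboot and return the mode the PLC comes up in."""
        self.watchdog_reboots += 1
        if self.watchdog_reboots >= REBOOT_LIMIT:
            self.mode = "PROGRAM"
        else:
            self.mode = "RUN"
        return self.mode

    def clear_counter(self) -> None:
        """Equivalent of clearing the counter from code before the limit."""
        self.watchdog_reboots = 0


plc = RebootProtection()
modes = [plc.on_watchdog_reboot() for _ in range(10)]
print(modes[8], modes[9])  # RUN PROGRAM
```

In this model, calling `clear_counter()` at any point before the tenth reboot keeps the PLC returning to RUN.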

Modbus comms do use a millisecond timer. It absolutely could be related.
"It has recently come to our attention that users spend 95% of their time using 5% of the available features. That might be relevant." -BobO

Kristjan

  • Sr. Member
  • ****
  • Posts: 67
    • Idnadartaekni ehf
Re: BRX Hardware watchdog timeout
« Reply #2 on: August 28, 2025, 06:47:58 AM »
This is very helpful. Now I have a handle on the situation with rebooting in PROGRAM mode. I'll reset DST385 before it reaches 10.

About the Modbus comms timer: The problematic Modbus line returns data for multiple days and then drops out for multiple days; this is not periodic. What is the Modbus comms timer doing? What would cause it to overflow? Can I view the contents of the timer?

About the hardware watchdog: How do I figure out what causes the CPU to not "pet" the watchdog?

BobO

  • Host Moderator
  • Hero Member
  • *****
  • Posts: 6116
  • Yes Pinky, Do-more will control the world!
Re: BRX Hardware watchdog timeout
« Reply #3 on: August 28, 2025, 10:35:58 AM »
I have no idea what's failing...yet. Working now to build the test cases to hopefully catch it. The interesting point (to me) is that the failure interval aligns almost perfectly with the system millisecond timer. That isn't a Modbus thing, that's system wide, but I suspect the lockup is due to some communication code (possibly the TCP/IP stack). This ~50 day clue is a big help for identifying the issue.

The CPU doesn't pet the watchdog because it's stuck somewhere in a loop that shouldn't take much time, but does. There is nothing you can do about it. We have to find and fix the bug.
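
The actual bug has not been identified in this thread. Purely as a general illustration of how a counter rollover can leave a wait loop stuck, the sketch below compares a naive deadline test with a wrap-safe elapsed-time test, assuming an unsigned 32-bit millisecond counter. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned millisecond counter


def naive_expired(now: int, start: int, timeout_ms: int) -> bool:
    """Deadline computed without wrapping; breaks when the counter rolls over."""
    return now >= start + timeout_ms


def wrap_safe_expired(now: int, start: int, timeout_ms: int) -> bool:
    """Elapsed time via modular subtraction; valid across one rollover."""
    return ((now - start) & MASK32) >= timeout_ms


start = 0xFFFFFF00               # timer started 256 ms before the rollover
now = (start + 1000) & MASK32    # 1000 ms later the counter has wrapped to 744

print(naive_expired(now, start, 500))      # False
print(wrap_safe_expired(now, start, 500))  # True
```

A loop waiting on the naive test would keep waiting even though the 500 ms timeout passed long ago, which is the kind of stall a hardware watchdog exists to catch.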

As for monitoring the system timer, yes you can. It's the value that's reported by TICKms() in the MATH box. The lockup appears to be happening about an hour after it rolls over from 0xFFFFFFFF to zero. The rollover interval is 49.7 days, but your failure is happening at about 49.8 days.
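
Given a reading of that timer, the time left until the next rollover is easy to estimate offline. This is a rough sketch, assuming the value is an unsigned 32-bit millisecond count copied by hand from the PLC; the helper names are hypothetical and nothing here talks to the controller.

```python
TICK_MODULUS = 2**32  # assumption: 32-bit unsigned millisecond counter


def ms_until_rollover(tick_ms: int) -> int:
    """Milliseconds remaining before the counter wraps from 0xFFFFFFFF to 0."""
    return TICK_MODULUS - (tick_ms % TICK_MODULUS)


def format_duration(ms: int) -> str:
    """Render a millisecond count as days, hours, minutes and whole seconds."""
    seconds = ms // 1000
    days, seconds = divmod(seconds, 86_400)
    hours, seconds = divmod(seconds, 3_600)
    minutes, seconds = divmod(seconds, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s"


print(ms_until_rollover(0xFFFFFFFF))        # 1
print(format_duration(ms_until_rollover(0)))  # 49d 17h 2m 47s
```

Logging the reading alongside each watchdog event would show directly whether the reboots keep landing about an hour past the wrap.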

If you are ok with it, it might be good to get your program. You can send it to support at hosteng.com.

Kristjan

  • Sr. Member
  • ****
  • Posts: 67
    • Idnadartaekni ehf
Re: BRX Hardware watchdog timeout
« Reply #4 on: August 28, 2025, 11:46:47 AM »
Thank you. Program sent to support.