Author Topic: MQTTPUB Gets Stuck  (Read 14018 times)

Bolt

  • Hero Member
  • *****
  • Posts: 595
MQTTPUB Gets Stuck
« on: October 27, 2022, 01:42:41 AM »
I have a BRX 2.9.3 with a 2-3ms scan time (8ms max), running a program that has two different MQTTPUB instructions publishing to the same server on different topics. Every once in a while, about once a month or so, both will stop publishing. I think what's happening is that they trigger at (nearly) the same time and lock each other out. One instruction will JMP to the Error Stage, while the other will just sit there, not officially erroring out. Then both instructions are unable to publish: one is stuck, and the other keeps erroring. I then disable the stage with the stuck MQTTPUB, and the other immediately goes back to publishing queued data successfully. I then re-enable the offending stage, and it starts firing away its queued data.

What's causing this, and what do I need to do to address this? Interlock the two instructions? Add a Timer to kill the offending Stage? Rewrite a bunch and get them to publish all through the same instruction? Other ideas?
« Last Edit: October 27, 2022, 01:49:40 AM by Bolt »

BobO

  • Host Moderator
  • Hero Member
  • *****
  • Posts: 6154
  • Yes Pinky, Do-more will control the world!
Re: MQTTPUB Gets Stuck
« Reply #1 on: October 27, 2022, 09:42:10 AM »
Best answer would be for us to fix it. The easiest way to do that would be a simple test program that fails readily, but barring that, turning on the full debug status dump of the client and capturing it in DMLogger while failing might give me a clue. If you set DST61 to 2, it should dump the max detail.

The entire MQTT system is a bit of an odd duck in that the driver aggregates all of the publish and subscribe states from instructions that can be constantly changing, and transparently manages that mess. That process was surprisingly nuanced and took a bit of effort. Stuck states definitely happened early in the development, so I'm not super surprised that there may be a gap somewhere, but this is the first issue I've heard in years.
"It has recently come to our attention that users spend 95% of their time using 5% of the available features. That might be relevant." -BobO

Bolt

Re: MQTTPUB Gets Stuck
« Reply #2 on: October 27, 2022, 10:04:43 PM »
I've never been able to get DmLogger data to dump over a VPN connection. Is that expected, or is there a workaround?

If I get bored I may write a simple version of the program to MQTTPUB at an overlapping interval and see if I can get regular crashes. Will see.

BobO

Re: MQTTPUB Gets Stuck
« Reply #3 on: October 28, 2022, 09:54:36 AM »
It uses broadcasts, and VPNs generally don't pass them. The last 8 messages are in ERR/LastERR0-7 and/or MSG/LastMSG0-7, so a decent amount of info can be gleaned from looking at them.

But if you had a small program that easily duped it, obviously that would be better. I expect the fix is pretty simple once we know where the hole is. The fact that it can be jogged/restarted is good news and suggests a fairly superficial problem.

Bolt

Re: MQTTPUB Gets Stuck
« Reply #4 on: February 09, 2023, 11:13:22 AM »
So I still experience this problem from time to time (every 3-4 weeks). Each time, I make some program changes to try to alleviate it, such as staggering the two MQTTPUB instructions so they don't try to access the same host at the same time, adding a short time delay between them to stay within data writing limits, sending an EMAIL when the buffer fills up, etc.
The problem now goes like this:
  • The two MQTTPUB instructions "hang up" for some reason. They set the Error bit, and the PLC keeps queuing the data; once the queue is full, it puts it on the SD card, etc. The bulk of the data is multiple sensor readings every minute.
  • The error EMAIL never sends.
  • I (or the customer) notice that the data is not being published to the cloud.
  • I connect to the device via StrideLinx VPN and log into the PLC.
  • The two MQTTPUB sequences are in "waiting" mode, waiting for their next scheduled retry, every 30 minutes.
  • The EMAIL sequence is sitting waiting for its next retry to come up; it has been retrying at 30 minute intervals for days.
    • I have gotten into the habit of doing a DNSLOOKUP (8.8.8.8 and 208.67.220.220) and PINGing the IP address before sending. It has been showing a DNSLOOKUP error since the start of the hang up.
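
The pre-check described in the last item (resolve the name first, and only attempt the send if that works) can be sketched in Python as below. This is only an illustration of the idea, not the PLC's DNSLOOKUP instruction: `resolve_or_empty` and the host name `broker.example.com` are hypothetical, and the `resolver` parameter exists just so the lookup can be swapped out for testing.

```python
import socket


def resolve_or_empty(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique IP addresses for hostname, or [] if DNS fails.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so a
    fake lookup can be injected (an assumption made for this sketch).
    """
    try:
        infos = resolver(hostname, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical broker name; replace with the real host.
    addresses = resolve_or_empty("broker.example.com")
    if addresses:
        print("DNS ok:", addresses)
    else:
        print("DNS lookup failed; skip the publish and retry later")
```

A failed lookup here would point at name resolution (the DNS server settings or the path to them) rather than at the broker or the publish itself.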

Now the odd thing is that if I leave everything alone after logging in, the EMAIL will succeed on its next attempt, and the MQTTPUBs start rolling as their next retry time comes up.

So, I'm led to believe it's a DNS type issue? I'm not sure if it's the action of connecting to the VPN router (or connecting to the PLC) that lets the DNS request through.
The VPN router is set to obtain both IP address and DNS servers automatically from customer's router.

Should I set the BRX's DNS at the VPN router's IP address? Both in the DNSLOOKUP instruction and the Ethernet setup?

I will add that the BRX was set to a subnet mask of 255.255.255.255, which I have now changed to 255.255.255.0, but I don't think that affects anything, as a test EMAIL will go through regardless of subnet mask and VPN connection status. Also, Modbus TCP Scanner seemed to have no issues connecting to clients with the overly restrictive mask.

franji1

  • Bit Weenie
  • Host Moderator
  • Hero Member
  • *****
  • Posts: 3806
    • Host Engineering
Re: MQTTPUB Gets Stuck
« Reply #5 on: February 09, 2023, 11:47:01 AM »
I am not a VPN guru, not even a VPN novice, but I would think that the VPN client needs to configure DNS responsibilities that take into account the VPN and its possible configuration settings.

Is the VPN passing through DNS requests?  Is this secure?  I would think this could be a HUGE security hole (the whole purpose of VPN???)

I would think that a VPN router provides DNS service itself, so the PLC's DNS server in that situation would be the VPN router?

Bolt

Re: MQTTPUB Gets Stuck
« Reply #6 on: February 09, 2023, 12:01:31 PM »
So the VPN router only serves two purposes here, to keep (my) controls equipment isolated from customer's network, and to allow me to connect to controls equipment remotely.
The VPN router is set to "allow internet access" for the LAN connected devices, so it's not really forwarding any requests through any specific tunnels, etc.

But, like I said, it works until it doesn't. There seems to be some sort of DNS hangup along the way. I'm assuming the MQTTPUB instruction has a DNS lookup function built into it? It's been a little while since I've looked at any MQTT traffic in DmLogger.

BobO

Re: MQTTPUB Gets Stuck
« Reply #7 on: February 09, 2023, 01:20:32 PM »
Some of this sounds a little confusing. You say it's hanging up, but also said it sets the Error bit. Can you please elaborate?

A subnet mask of 255.255.255.255 may or may not cause issues, but it's improper and untested. In practical terms it means that nothing is on the local network, including the gateway it uses to get off network. Not sure what that will do, but it's nothing good. That something worked at all could be masking bigger issues.
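
To make the point about the gateway concrete, here is a small Python sketch using the standard `ipaddress` module. The addresses are made-up examples (not the ones from this installation) and `gateway_is_on_link` is a hypothetical helper name; it only checks whether the gateway falls inside the subnet implied by the device's IP and mask.

```python
import ipaddress


def gateway_is_on_link(device_ip, mask, gateway_ip):
    """Return True if gateway_ip lies inside the subnet defined by device_ip/mask."""
    # strict=False lets us pass a host address rather than a network address.
    network = ipaddress.ip_network(f"{device_ip}/{mask}", strict=False)
    return ipaddress.ip_address(gateway_ip) in network


if __name__ == "__main__":
    # Example addresses only (assumptions for illustration).
    for mask in ("255.255.255.255", "255.255.255.0"):
        on_link = gateway_is_on_link("192.168.1.50", mask, "192.168.1.1")
        print(mask, "-> gateway on local subnet:", on_link)
```

With the all-ones mask the subnet contains only the device itself, so the gateway is never "local"; with 255.255.255.0 it is.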

Bolt

Re: MQTTPUB Gets Stuck
« Reply #8 on: February 09, 2023, 05:55:05 PM »
Ah, after re-reading my initial post from 3 months ago I can see where you are getting confused here.

By now with my re-worked logic, "hanging up" just means not going through. I don't believe it to be sticking the MQTTPUB instructions. They set the Error bit, a retry timestamp is scheduled, and they don't try again until then. But they will retry over and over again for days until I log in, and they start rolling through, EMAIL and all.

I thought a 255.255.255.255 mask forced all local traffic attempts through the router, not direct via the switch portion of the device. But it's a moot point. I don't know why it was even set to that; I guess it was erroneously set up at commissioning, and it has since been set to 255.255.255.0.

BobO

Re: MQTTPUB Gets Stuck
« Reply #9 on: February 09, 2023, 07:22:24 PM »
Well, that is the net effect (a pun!), but generally the router (gateway) is on the same subnet as a device, which can't happen with 255.255.255.255, so I don't know what bad comes from that.

With that said, I speak IP only from having written stuff that uses it, I am by no means an IT guy or expert. If that is a standard known thing, please disregard what I said.

Bolt

Re: MQTTPUB Gets Stuck
« Reply #10 on: October 17, 2023, 03:23:34 PM »
So today, this device (and another one, at another site) stopped publishing the MQTT data. I logged into the BRX, and found it was failing the MQTTPUB instruction near instantly, with a lower word response of 27, which the help file does not mention. I checked a few things, and it would ping the MQTT server successfully, etc. Finally, I toggled the CPU into PROG mode, and then back to RUN, and it instantly started to publish the queued data.

I can't yet log in to the PLC at the second site; it seems the site internet is slow. However, yet another PLC at that same site is not having issues publishing, so I'm not sure what to think of it all.

What does response code 27 mean? What else can I try?
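
For anyone reading the same value, the "lower word" mentioned above can be pulled out of a combined value with a mask. This is a generic Python sketch that assumes the Extended Error Information is a 32-bit value made of two 16-bit words; `split_words` is a hypothetical helper, and the meaning of the upper word is not covered here.

```python
def split_words(value):
    """Split a 32-bit value into (upper_word, lower_word), each 16 bits wide."""
    value &= 0xFFFFFFFF          # treat the input as an unsigned 32-bit value
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return upper, lower


if __name__ == "__main__":
    # 0x0001001B is an arbitrary example whose lower word is 0x1B (27 decimal).
    upper, lower = split_words(0x0001001B)
    print("upper word:", upper, "lower word:", lower)
```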

BobO

Re: MQTTPUB Gets Stuck
« Reply #11 on: October 17, 2023, 03:53:47 PM »
That is an illegal operation system error. In the case of MQTT it means that the driver is in a state it shouldn't be, namely, it should be idle but isn't. My comment to this result says "This really shouldn't happen unless software is broken." Indeed.

Some background...

In the interest of increasing the utility of our MQTT implementation, it allows for multiple instructions to add and remove subscriptions on the fly. The driver handles all of this as a background operation; the instructions just feed requests to the driver. Since it is managing a complex state that could involve many instructions coming and going, there is a bit of a dance. This failure is the result of the driver having some registered work, even though there is no instruction actively involved. With run mode updates potentially adding or removing instructions, tasks/programs/stages terminating instructions, as well as the normal enabling and disabling of instructions, the process is more nuanced than I'm happy with, but we didn't want to cripple the feature to make it easier to implement. We try to make a hard thing look easy, but you obviously exposed a hole in the state management.

I would love to know how to duplicate it. It wouldn't be difficult to fix if I were able to dupe it.

There are a bunch of diagnostics that can be turned on for MQTT, although I'm not sure they would help if it can't be duped quickly. And if it can, we'll just fix it.

Bolt

Re: MQTTPUB Gets Stuck
« Reply #12 on: October 17, 2023, 04:23:30 PM »
A quick background on my setups: there are 2 MQTTPUB instructions (one topic each) in the code, and no MQTTSUBs. They are triggered when their queue is not empty. One queues data every minute, while the other queues data every 10-15 minutes as the processes dictate. I have some logic to prevent them from firing on the same second, but only for the initial fire. If the queues pile up with data, they fire at will, and all bets are off.

Both of the systems were running on a fairly fresh program restart from when I updated some SMTP server info a week or 2 ago, so not much has changed on the MQTT instructions lately as far as run time edits go.

I do not know how to easily replicate the occurrence.

BobO

Re: MQTTPUB Gets Stuck
« Reply #13 on: October 17, 2023, 05:12:22 PM »
So sparse and async, from 2 different MQTTPUBs? Keep alive is set to the default 30 seconds? That suggests that most of the time the connection will drop between events, but when the events line up, it could stay up. None of that should be a problem, but it does add wrinkles.

You might consider increasing the keep alive time to keep the session open. That would at least reduce the complexity a bit and might work around the issue. Sessions coming and going always adds a fun twist.
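
As a rough illustration of why a longer keep alive reduces session churn, here is a toy Python model. It assumes, per the description above, that the session is dropped once more than the keep alive time passes with no publish; it is not the driver's actual logic, and `count_sessions` is a hypothetical helper.

```python
def count_sessions(publish_times, idle_timeout):
    """Count how many sessions get opened for the given publish times (seconds).

    A new session is counted whenever the gap since the previous publish
    exceeds idle_timeout (toy model, assumption for illustration only).
    """
    sessions = 0
    last = None
    for t in sorted(publish_times):
        if last is None or t - last > idle_timeout:
            sessions += 1
        last = t
    return sessions


if __name__ == "__main__":
    # One publish per minute for ten minutes.
    times = [60 * i for i in range(10)]
    for timeout in (30, 90):
        print(f"idle timeout {timeout}s -> {count_sessions(times, timeout)} session(s)")
```

In this model, once-a-minute publishes with a 30 second timeout open a new session every time, while any timeout longer than the publish interval keeps a single session up.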

What broker are you talking to?

Bolt

Re: MQTTPUB Gets Stuck
« Reply #14 on: April 29, 2026, 06:50:40 PM »
Quote from: BobO on October 17, 2023, 03:53:47 PM
That is an illegal operation system error. In the case of MQTT it means that the driver is in a state it shouldn't be, namely, it should be idle but isn't. My comment to this result says "This really shouldn't happen unless software is broken." Indeed.

This happened again this week (the lower word of MQTTPUB's Extended Error Information is returning 27) on a completely different (newer) system. It's running V2.11.1, and it stopped sending data a few days ago. I can't reboot the device just yet, though I suspect that will fix it, and I can't find anything else that will jog the instruction back to working.