Host Engineering Forum
General Category => ECOMs and ECOM100s => Topic started by: rswolff on May 07, 2011, 04:36:16 PM
-
I have a network with 10 D0-06s and H0-Ecom100s. Also have 4 Cmore touchscreens on the network. One Wonderware station. Nothing else. No printers. No other PCs. Cmore scans are set to 250 ms or more. 128 bytes of data are read by units 7-10 from units 1-6 every 2-3 seconds.
Randomly the Ecoms will stop communicating with the PLCs. Pretty much isolated to units 1-6. Never affects any of the Cmores or the Wonderware station. As soon as the Ecom stops communicating with its processor we get communication alarms at the other stations. On some occasions erroneous data was also passed before the communication stopped.
Do not normally have access to the processors as they are located in enclosures mounted 15-20 ft high on a wall (which is why we installed an Ethernet network), so I need a lift to open the enclosure doors. Do have access to the circuit breakers. While they are not communicating, the Ecom can still be pinged, so it's not dead and the network is still functional.
Once the communication stops, the only thing I can do is cycle power to restore communication.
I need to rectify this or I'll need to rip out the h/w and install something that works correctly.
-
Sorry you're having trouble. Some questions/suggestions:
(1) Are the Cmores communicating to stations 1-6?
(2) Are stations 7-10 using peer-to-peer configs or broadcasting?
(3) Are you using Ethernet switches or are there any wireless switches/routers?
(4) Is the WW station talking to stations 1-6? And are they using Modbus TCP?
(5) Do the H0-ECOM100s have latest firmware?
(6) Do you have RXs in stations 1-6?
(7) You said erroneous data was received... have you verified that with an Ethernet sniffer (e.g. Wireshark)?
(8) Please look at the FAQs on our website; there is one written about noise (ECOM FAQ0045; http://www.hosteng.com/FAQFiles/ECOM.htm#FAQ0045). Make sure you follow the suggestions there.
Hope this helps!
-
I have used H0/H2-ECOMs on dozens of different jobs. I have never had an issue where they lock up until being power cycled. I tend to think that it is either a noise issue or a power supply issue.
ECOMs are definitely reliable when used properly.
It would be better if we could power cycle things one unit at a time and narrow it down to a single unit.
-
(1) There are no Cmores talking to units 1-6; however, there is a single OP420 on a serial port, one for each unit.
(2) Units 7-10 are peer to peer. The Ecoms are set up to use the P-to-P addressing so as not to require a broadcast packet.
(3) Everything is wired with the exception of unit #8, which is connected to a switch using a wireless radio setup. There is a single switch located with each Ecom.
(4) The Wonderware station normally does not communicate directly with any of the stations except #7, which is located in the same enclosure and tied to the same switch. We are using KEPServer OPC and not Modbus. Unit #7 polls units 1-6 directly and the WW application reads from it. Only for a few settings does the WW station send a direct message to any of the units. This is usually not done during normal operation.
(5) Nope. I checked and I'm running everything from .49 to .221 firmware. I just downloaded .229 from your site and should be updating in the next few days.
(6) Yes. Stations 1-6 are using ECRX reads from each other. So station #1 reads 2 bytes from stations 2-6, station #2 reads 2 bytes from stations 1 and 3-6, etc.
(7) I used Wireshark, but at that time I was not having any problems. Erroneous data appears to be caused at the point of the communication fault, but it's hard to determine. As far as I can tell, I sometimes have wrong data from the last transmission before the unit stopped talking. I can implement some buffering but I'd prefer not to have to. I would have expected the module to buffer the data and not pass it along if the communication packet did not complete correctly.
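The buffering mentioned above could look roughly like the following. This is only a sketch in Python of the idea (in the field it would be ladder logic), and it assumes the read reports some completion/error indication; the names `ValidatedBuffer`, `offer`, and `completed_ok` are made up for illustration and are not part of any ECOM or DirectSOFT API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidatedBuffer:
    """Holds the last known-good block read from a remote station."""
    expected_len: int
    good: Optional[bytes] = None   # last block that passed the checks
    rejected: int = 0              # count of reads that were thrown away

    def offer(self, data: bytes, completed_ok: bool) -> bool:
        """Commit data only if the read finished cleanly and has the right size.

        completed_ok is an assumed flag standing in for whatever
        success/error indication the read instruction provides.
        """
        if not completed_ok or len(data) != self.expected_len:
            self.rejected += 1
            return False
        self.good = bytes(data)
        return True


buf = ValidatedBuffer(expected_len=2)
buf.offer(b"\x01\x02", completed_ok=True)    # accepted, becomes the good copy
buf.offer(b"\xff\xff", completed_ok=False)   # rejected, good copy is unchanged
```

The application then only ever uses `buf.good`, so a read that fails at the moment of the communication fault never overwrites the previous valid data.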
-
After reading the FAQs I have the following. Cabling is high quality (whatever that actually means). Switches are industrial quality. Cabling is as far as can be routed without being in separate enclosures. There are no motors, fans, pumps, or cycling (on/off) high power equipment. IT IS A MANUFACTURING PLANT so there is other equipment in use in the plant on different power feeds. None of that wiring or equipment is anywhere near the Ecoms. The highest voltage being seen is for some beacons and horns at 120 VAC. These are not active in normal operation.
-
I have also used Ecoms for other projects without much problem. However, there was either a small amount of Peer-To-Peer or none, and that seems to be where the problem is. The AutomationDirect PLCs are also problematic as they will also lock themselves out when they have some sort of a communication problem. No workaround for this other than cycling power. Also would cause me to really really think twice before I used either in another networked project. Nice enough for a small machine, not robust enough for any real type of data transfer. Be nice if they told me ahead of these problems. :o
-
Be nice if they told me ahead of these problems. :o
You seem convinced that we are aware of problems that we choose not to fix, and should be disclosing some kind of technology dirty laundry. Come on, do you honestly believe that? Seriously? All we have is our good name and customer perceptions, and we work very hard to protect them both. I cannot control Koyo, but I certainly can and do control Host. I don't know what your experience has been with other vendors, but leaving customers hanging has not been our way...ever...in our 19 years in the business.
As far as I am aware, and I did ask Greg about this, we have not had any direct dealing with you...which means...you have never reported a problem directly to us and made any effort to work through it. Please give us a chance before passing judgement, you might be pleasantly surprised. Unlike vendors like AB, we actually listen and make a concerted effort to fix the issue.
It is always surprising to me when a customer reports a bug that upon further study was determined to have been there since the initial release, but it has happened on many occasions...unreported for a decade or more. What that means is that sometimes certain pathways in the code don't get hit because nobody else has done it that way before. It happens. An interesting point in your message was the use of the serial port in addition to the Ethernet. That may be significant.
The firmware engineer that does the ECOMs is as fine a developer as I know. If it is us, we'll fix it. If it is Koyo, we'll still try to work around it...we've done it dozens of times before. If we can't fix it here, we'll lean on ADC and Koyo on your behalf to get a remedy. We will get it fixed. The first thing we need to do is to figure out how to duplicate the problem. Greg will be on it with a vengeance, won't you Greg? ;)
-
Not my intention at all. However, it should be noted that I am also communicating with tech support at AD, and they have already indicated a problem on their side, not necessarily related to the ECOM itself. This is a problem (I guess it depends on your perspective) they knew about, but it's either undocumented or hidden away on page 692 of some manual.
I'm simply implying that it would be reasonable to indicate some of the shortcomings so that System Designers don't get hit in the face with the discrepancies. The ECOM to PLC link appears to be prone to deciding something is amiss and then determining it should never again communicate or try to. Tough on me when I now have 10 units in the field that appeared to work ok, but now I find can't communicate for any length of time. Causing me not just tons of problems, but a definite monetary impact to boot. I could have perhaps made a more intelligent decision of what h/w to implement if the information was available. Not Host Engineering's fault. I expect it should have come from AD.
Had I known at the time that Host Engineering had a forum and had looked, I probably would have changed my h/w.
-
I cannot comment on what ADC Tech Support does or does not know, because I simply do not know. I do know that there is nothing in the ECOM that causes it to decide to stop talking to the CPU as a normal thing. If we stop submitting requests, we are broken and need to be fixed. If the CPU stops processing requests, it is broken and needs to be fixed. It isn't complicated.
If ADC techs are suggesting that this is normal and expected, they are wrong. Under no circumstance should the PLC or module stop talking. Period.
If this can be duplicated, it can be fixed. If I have to push it through the executive channels to get Koyo and ADC on board, I am willing and able to do so.
-
No, the ADC techs are not implying that directly. There is a problem if you're logged into the processor and under some conditions if you don't log out but disconnect, the processor will no longer communicate. I have no idea if this is part of the problem with the Ecoms. I do know that I have a bunch of them that will randomly stop communicating to the processor, but are still on the network. As there are no status bits, words, or anything else to indicate what the problem was, is, or might be, it's rather difficult to diagnose or correct. I can understand faulted communications packets, incomplete transfers, etc. I can't fathom any system in this kind of environment that would be developed to simply have no timeouts, resets, etc. so the system can recover, or attempt to. This apparently is a one shot deal. It's either ok, or cycle power. Not exactly what I can tell my customer since it's not a once in a year thing but happens at least every day or two. And I can test every switch, cable and connector, and place filters on every incoming power line. Still doesn't tell me why the Ecoms can't communicate.
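In the absence of a built-in status word, one common way to tell "module still on the network" apart from "PLC data no longer updating" is a heartbeat counter. The sketch below is in Python and assumes each PLC increments a counter in a register that is read along with its other data; that counter, the class `HeartbeatMonitor`, and the `ping_ok` flag are all assumptions for illustration, not features of the ECOM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Tracks when a remote heartbeat counter last changed."""
    stale_after_s: float
    last_value: Optional[int] = None
    last_change: Optional[float] = None

    def update(self, value: int, now: float) -> None:
        """Record a freshly read counter value; note the time if it changed."""
        if value != self.last_value:
            self.last_value = value
            self.last_change = now

    def classify(self, ping_ok: bool, now: float) -> str:
        """Combine network reachability with data freshness."""
        if not ping_ok:
            return "network or module unreachable"
        if self.last_change is None or now - self.last_change > self.stale_after_s:
            return "module reachable, PLC data stale"
        return "ok"


mon = HeartbeatMonitor(stale_after_s=5.0)
mon.update(41, now=0.0)    # counter seen for the first time
mon.update(41, now=10.0)   # counter has not moved in 10 s
state = mon.classify(ping_ok=True, now=10.0)
```

Here `state` comes out as the "module reachable, PLC data stale" case, which matches the symptom described: the Ecom answers a ping but the data behind it has stopped moving.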
-
Ok, let's walk through this and see if we can shine some light...
There is a problem if you're logged into the processor and under some conditions if you don't log out but disconnect, the processor will no longer communicate. I have no idea if this is part of the problem with the Ecoms.
What do you mean "logged into" and "don't log out but disconnect"? The protocols that the ECOM uses with a DirectLogic PLC are generally connectionless. There is no login or logout, unless you are talking about entering a password. Modbus/TCP has TCP connections, which are actual end-to-end connections, and if you drop the connection without going through the proper TCP disconnect sequence, the connection will stay open until it times out...and TCP timeouts can be fairly long. They do time out eventually though, and we do have inactivity timeouts where eventually we dump connections that aren't being used so they can be reused. But please help me understand.
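The inactivity timeout described above can be pictured with a small sketch. This is a generic Python illustration of the technique, not the ECOM firmware; the class `ConnectionTable`, its method names, and the 30-second timeout are hypothetical.

```python
from typing import Dict, List


class ConnectionTable:
    """Tracks peers and drops the ones that have been idle too long."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self._last_seen: Dict[str, float] = {}

    def touch(self, peer: str, now: float) -> None:
        """Record traffic from a peer, opening a slot if needed."""
        self._last_seen[peer] = now

    def reap(self, now: float) -> List[str]:
        """Remove and return peers idle longer than the timeout."""
        stale = [p for p, t in self._last_seen.items()
                 if now - t > self.idle_timeout_s]
        for p in stale:
            del self._last_seen[p]
        return stale

    def active(self) -> List[str]:
        """Peers that currently hold a slot."""
        return sorted(self._last_seen)


table = ConnectionTable(idle_timeout_s=30.0)
table.touch("10.0.0.5", now=0.0)    # client that later vanishes without closing
table.touch("10.0.0.6", now=20.0)   # client that is still talking
dropped = table.reap(now=35.0)      # periodic housekeeping pass
```

After the housekeeping pass only the peer that went silent is dropped, so its slot can be reused while the active peer is untouched.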
I do know that I have a bunch of them that will randomly stop communicating to the processor, but are still on the network. As there are no status bits, words, or anything else to indicate what the problem was, is, or might be, it's rather difficult to diagnose or correct.
We have on many occasions built custom firmware with diagnostic code built in. We can also learn a great deal from Wireshark traces. If you are serious about solving this, then please work with us.
I can understand faulted communications packets, incomplete transfers, etc. I can't fathom any system in this kind of environment that would be developed to simply have no timeouts, resets, etc. so the system can recover, or attempt to. This apparently is a one shot deal. It's either ok, or cycle power.
Again, you seem to be assuming...wrongly...that the system is built without fail-safes. I assure you that is not true in the case of the ECOM. If the PLC itself is going on walkabout...and it may be...we need to address that with Koyo and ADC, and we will. We first have to duplicate the problem, document the issue, and then raise Cain. We will need you to work with us on that though.
Not exactly what I can tell my customer since it's not a once in a year thing but happens at least every day or two. And I can test every switch, cable and connector, and place filters on every incoming power line. Still doesn't tell me why the Ecoms can't communicate.
Not trying to pick or maintain a fight, but you seem pretty stuck about the fact that the ECOM isn't communicating, rather than focusing on helping us change that. It may be the ECOM, it may be the PLC, and both can be fixed. There may also be a workaround, once we understand what is happening.
You have a choice: 1) Rip it all out and return it, or 2) Help us help you. I would prefer #2, but that has to start with you moving on from 'I can't see why it doesn't work' to 'What do I need to do to help you help me?' I am sorry if ADC Tech Support was not able to help you. We can, but haven't heard a peep about this until yesterday. Please give me the courtesy of at least a few business days to help you...when I actually have some staff in the building.
-
I'm simply asking for some help in troubleshooting and correcting the problem. If I've appeared to be doing anything else I apologize. There are a lot of people looking over my shoulder requesting an answer to the problem.
-
There is a problem if you're logged into the processor and under some conditions if you don't log out but disconnect, the processor will no longer communicate.
rswolff, I believe you are referring to the "Unable to lock processor" error. This particular issue is not related to your H0-ECOM100 issue. We don't know about that one, but since it is unrelated...
Based on your installation description, it doesn't sound like a noise issue. But, regarding your communication matrix, I have done some research on your problem (based on the database at Automation Direct) and your posts here and I think I gather the following summary. Please tell me if this is correct as there are some things I'm not clear about that I'll indicate with (?).
SUMMARY:
- PLCs 1-6 read 2 bytes from each other every (?) seconds using peer-to-peer ECRXs.
- PLCs 7-10 read 128 bytes from PLCs 1-6 every 2-3 seconds using peer-to-peer ECRXs.
- WonderWare station reads (?) bytes from PLC7 every (?) seconds using KepDIRECT DL-driver.
- WonderWare station reads (?) bytes from PLCs 1-6 only incidentally using KepDIRECT DL-driver.
- Cmores 1-4 read (?) bytes from PLCs (?) every (?) seconds using native DL-driver.
- OP420s 1-6 read (?) bytes from PLCs 1-6 respectively every (?) seconds using PLC serial port (K-seq or DirectNET).
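To get a feel for how many read requests each of PLCs 1-6 has to service, the two ECRX lines of the summary can be tallied as below. This is a rough Python sketch: the 1.0 s period for the PLC 1-6 mutual reads is a placeholder for the unknown (?) value, 2.5 s is taken as the midpoint of "2-3 seconds", and the WonderWare, Cmore, and OP420 traffic is left out because its size and rate are not yet known.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (reader stations, target stations, bytes per read, period in seconds)
Flow = Tuple[Sequence[int], Sequence[int], int, float]

PEER_PERIOD_S = 1.0  # ASSUMED placeholder; the real value is one of the (?) items

FLOWS: List[Flow] = [
    (range(1, 7), range(1, 7), 2, PEER_PERIOD_S),  # PLCs 1-6 read 2 bytes from each other
    (range(7, 11), range(1, 7), 128, 2.5),         # PLCs 7-10 read 128 bytes from PLCs 1-6
]


def incoming_load(flows: List[Flow]) -> Dict[int, Tuple[float, float]]:
    """Return {target: (requests per second, bytes per second)}."""
    reqs: Dict[int, float] = defaultdict(float)
    byts: Dict[int, float] = defaultdict(float)
    for readers, targets, nbytes, period_s in flows:
        for reader in readers:
            for target in targets:
                if reader == target:
                    continue  # a station does not read from itself
                reqs[target] += 1.0 / period_s
                byts[target] += nbytes / period_s
    return {t: (reqs[t], byts[t]) for t in sorted(reqs)}


load = incoming_load(FLOWS)
for station, (rps, bps) in load.items():
    print(f"PLC {station}: {rps:.1f} requests/s, {bps:.1f} bytes/s")
```

With these placeholder numbers each of PLCs 1-6 sees the same load, since every one of them is read by five peers plus the four stations 7-10; the real figures depend on the answers to the (?) items.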
QUESTIONS:
(1) Can you fill out the (?) in the above summary and make any corrections?
(2) Can you tell me what the following H0-ECOM100 Advanced Settings are set to for each of the 10 H0-ECOM100s?
- RX/WX Settings: ACK Timeout
- RX/WX Settings: Resp. Timeout
- RX/WX Settings: Retries
(3) Does this timeout issue happen just at random (over a period of time) or exclusively when you do, say, an online edit?
P.S. I tried calling your phone number and just got a FAX. I tried to speak FAX, but I couldn't understand it. ;D
-
I think a simple block diagram of the components showing the data flow would be helpful to clarify the layout.
-
[crickets]
Maybe the plant blew up.
-
Not sure why you got the fax. Perhaps someone turned off the message unit and turned the fax to answer mode.
You can reach me more easily on my cell at <deleted>.
I concur....I don't believe it's a noise issue either....I'm pretty isolated from the rest of the plant....and it doesn't appear that I have nodes dropping in and out, or any lost or fragmented packets.
I can almost always force the Ecom to stop talking to the processor by simply making an online edit over the network. Once I accept, the change will always go through, but I'll get an error from the programming s/w (I neglected to write it down and won't be able to check until Monday) and that's all she wrote. The PLC actually goes back to run but will no longer communicate to the Ecom. The Ecom is still network ready (i.e. I can still ping and change its settings). I can't easily get to the enclosures with the h/w. The six in question are mounted 15-20 ft up a wall, and there are usually skids or equipment blocking access.
It almost appears that during the download process the Ecom is still accepting the read requests from the other nodes. As soon as the processor goes back online either the Ecom or the processor does something that causes the problem. I believe it's related to the stacked messages. Would this cause some sort of a timeout or some other problem if the processor was paused for 3-4 seconds (possibly longer) during a program load?
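Whether a 3-4 second pause outlasts a reader's retries can be estimated with a very simple model. The Python sketch below assumes each attempt waits up to the response timeout and is then retried a fixed number of times; that model, the function names, and the 250 ms / 1 retry figures are assumptions for illustration only, since the actual Advanced Settings values have not been posted and the module's real retry behavior may differ.

```python
def retry_budget_s(resp_timeout_ms: int, retries: int) -> float:
    """Longest time a reader keeps trying under the assumed model:
    one initial attempt plus `retries` more, each waiting one timeout."""
    return (retries + 1) * resp_timeout_ms / 1000.0


def survives_pause(pause_s: float, resp_timeout_ms: int, retries: int) -> bool:
    """True if the target resumes answering before the reader gives up."""
    return pause_s < retry_budget_s(resp_timeout_ms, retries)


# Hypothetical settings, NOT the values from the installation.
budget = retry_budget_s(resp_timeout_ms=250, retries=1)
ok_during_edit = survives_pause(pause_s=4.0, resp_timeout_ms=250, retries=1)
```

Under these made-up settings the budget is half a second, so every read issued during a 4 second pause would fail and report an error on the reading side. That alone would explain communication alarms at the other stations, but not a link that never recovers afterwards.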
I'll pull out the h/w settings and post tomorrow.
-
I'll see if I can post a network layout diagram.....I have one in Autocad
Not sure why Darth Ladder believes the plant blew up....it was simply disassembled atom by atom, converted to dark matter and then 'saddled' to an energy beam and transported several hundred light years from its original location where it was converted back to its original matter form and reassembled. Since the new location was identical to the original, none of the occupants in the plant is any the wiser. But since space apparently is expanding we should be able to visit them shortly. :o
-
Don't forget that this will now put you in a new time zone and remember to adjust the PLC clocks.
-
Don't forget that this will now put you in a new time zone and remember to adjust the PLC clocks.
No. He just needs to move to MX at its earliest availability and use the NETTIME instruction and Time Sync functions to manage it. Of course that means we will need to go ahead and add IPv6 support, since I doubt that we will be able to manage his Intergalactic network with IPv4 addressing. Crap...that's gonna be another hit to the schedule. ;)
OTOH, he didn't seem too cheery about using Host products in the future, so I guess MX is off the table. Tis a shame. While Koyo ties my hands in many cases, I might actually be able to make the issues go away were this an MX problem.
-
One thought.
Also just for clarification, are you getting a CPU LED fault on the 06's? Just one or all?
If you are, then my guess (and it is only a guess) is that you just might have some noise getting into the communications either in the serial port or the ECOM, and since you are doing a data write, it's probably stomping on someplace in memory that is faulting out the CPU. Take another look at the shielding on the serial port and the network cables. If you are using shielded Ethernet cable assemblies, look at the grounding of the switches. The ground on the ECOM is isolated (I think). However, if all the switches are grounded then the cables between switches would be grounded at both ends. It could also be a faulty switch. Just because it's new, it doesn't necessarily mean that it's above suspicion. Also, what protocol are you using on the serial port? I would disable all the ones that are not used.
-
Of course that means we will need to go ahead and add IPv6 support, since I doubt that we will be able to manage his Intergalactic network with IPv4 addressing.
That's a long wire. Transmission rate may have to be reduced so far that it reduces bandwidth so low that by the time you get the time stamp, it'll be out of date, plus it'll be totally invalid in format as we'll have moved from the Gregorian to the Obamian calendar.