MQ controller DeltaV V14.3.1 loses communications with an entire 8-Wide carrier

Ran into a weird one this morning that I'd never seen until we upgraded our software to V14.3.1 and our controllers to type MQ. One of my controllers lost communications with an intermediate 8-Wide carrier housing eight 120 VAC isolated input cards. In diagnostics, all eight cards were showing as "CFG Present but no card". My first instinct was to change out the carrier, which I did, but that didn't rectify the problem. After checking a bunch of other stuff to no avail, I decided to download the entire controller, which still didn't rectify the problem. I finally decided to pull the controller off the bus and put it back in. This worked, but I'm thinking that the download should have taken care of the problem or, for that matter, that it should never have occurred. Has anyone else run into this?

20 Replies

  • In reply to Andre Dicaire:

    Hey Andre. This particular controller cabinet is actually in a very clean environment. We did have the issue occur again about a week and a half after the original failure. The same bank of DI cards was affected and, once again, a controller reboot/reseat cured the problem. I've changed out the affected carrier with a new one, as well as the adjacent carrier which plugs directly into it. I have a spare 1-wide carrier and cable assembly, which I'll try next time the plant is down. I've given the cable connections a quick check and things appear to be OK on that front. I also can't rule out a "Wanky" controller, as this one is the newer hardware style MQ which replaced the MD controller (that had been there for years) back in October 2020 during our upgrade from V11.3 to V14.3.1. I'm still puzzled as to why only one carrier/bank of cards is affected.
  • In reply to harnettw:

    If you have not, please open a call with the GSC. They provide support 24/7. This forum does not.

    This issue is clearly presenting as a bad Bank Select Line corresponding to this carrier. There are 8 bank select lines that run from the controller down through all carriers, extenders and cables. An issue anywhere along the bus will present only on the corresponding Bank or carrier. Similarly with the Slot select lines. When the controller selects a Bank and a Slot, only one card sees both go active and that one card then communicates.
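    To make the select mechanism concrete, here is a minimal Python sketch of it. This is a toy model only, not how the controller firmware works. It assumes 8 bank select lines and 8 slot select lines, and the names `responding_cards` and `scan` are made up for the illustration. An open bank select line makes every card on that one carrier go silent while all other carriers keep answering.

```python
# Toy model of bank/slot card selection; an illustration, not Emerson's implementation.
BANKS = 8  # bank select lines, one per carrier position (from the post)
SLOTS = 8  # assumption: one slot select line per slot of an 8-wide carrier


def responding_cards(selected_bank, selected_slot, open_bank_lines=frozenset()):
    """Return the (bank, slot) positions that see both of their select lines active."""
    responders = []
    for bank in range(BANKS):
        for slot in range(SLOTS):
            # A bank line with an open circuit never reaches the cards on that carrier.
            bank_active = bank == selected_bank and bank not in open_bank_lines
            slot_active = slot == selected_slot
            if bank_active and slot_active:
                responders.append((bank, slot))
    return responders


def scan(open_bank_lines=frozenset()):
    """Poll every position once and return the positions that never answer."""
    missing = []
    for bank in range(BANKS):
        for slot in range(SLOTS):
            if not responding_cards(bank, slot, open_bank_lines):
                missing.append((bank, slot))
    return missing


print(scan())                                # healthy bus: no missing cards
print(scan(open_bank_lines=frozenset({3})))  # every slot of bank 3 is missing
```

    In the healthy case each selection reaches exactly one card. With bank line 3 open, the missing list is exactly the eight slots of that bank, which matches the symptom described in the original post.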

    Which carrier is affected? One action is to disconnect the right extender from the affected carrier. If the problem persists, we know the issue is not caused by downstream components. If this is the 2nd or 3rd of 8 carriers, that will narrow the scope of possible causes.

    Since the select lines form a parallel bus, a fault anywhere on any select line will affect either all cards on a carrier or all cards in a specific slot on all carriers. It could be an issue with one of the controllers, a 2-wide carrier, any carrier or any extender cable/1-wide. It could be a fault in one of the 8 cards on that carrier (all other cards in the system are not connected to the affected Select line).

    You've stated that reseating the controller resolves the issue. That tells me the issue is likely with the controller or its 2-wide carrier. By mechanically interacting with the hardware, reseating the controller might be pushing a broken solder connection back into contact, which eventually breaks again. This could be in the carrier or the controller. Or the 2-wide to 8-wide connectors are not fully seated.

    If this is caused by one of the controllers, and the controller is shorting the select line, it could be either the standby or the active. If the controller has an issue that opens the select line, it will be the only one to have the issue and it will only occur when that controller is active. Note that the standby would report an issue when it fails to poll these cards during its diagnostic pass.
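    The short-versus-open reasoning for a redundant pair can be written down as a small lookup. This is only a sketch of the logic in this post, and the function names and the string labels are hypothetical.

```python
def affected_controllers(fault, faulty):
    """Which controllers of a redundant pair lose the bank.

    fault:  "short" (select line shorted) or "open" (select line open inside a controller)
    faulty: "active" or "standby", the controller that contains the fault
    """
    if faulty not in ("active", "standby"):
        raise ValueError("faulty must be 'active' or 'standby'")
    if fault == "short":
        # The select lines are a shared parallel bus, so a short affects both controllers.
        return {"active", "standby"}
    if fault == "open":
        # An open inside one controller only stops that controller from driving the line.
        return {faulty}
    raise ValueError("fault must be 'short' or 'open'")


def switchover_resolves(fault, faulty):
    """True when switching to the partner controller would restore the cards."""
    return fault == "open" and faulty == "active"


print(affected_controllers("short", "standby"))
print(affected_controllers("open", "active"))
print(switchover_resolves("open", "active"))
```

    On a simplex node there is no partner to compare against, so this table only helps when a standby is installed.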

    I would focus on the controllers and 2-wide carriers next. The GSC should be involved going forward.

    Andre Dicaire

  • In reply to Andre Dicaire:

    Andre - We have experienced this issue on both S-series and M-series hardware in the last couple of years without ever finding a definitive root cause. Are there any specific system events, e.g. REDIO error messages, that may help to predict future loss-of-communication issues?
  • In reply to Andre Dicaire:

    Thanks Andre. I've been at site here for two weeks and haven't seen the issue reoccur. I have a spare 2-Wide carrier here in my office and a spare controller in stock, which I'll pull and keep here in the office in case we run across it again. The carrier is the fourth of six carriers and the controller is a simplex configuration. The carriers are in close proximity as they are all in the same cabinet. I also have a connecting cable in the warehouse that I'll keep here in the office as well. The issue has occurred twice and both times it has affected the same carrier. I do remember installing that controller last October and having some difficulty getting the screw in. It seemed like it had stripped, so I took one from a spare and managed to tighten it in. I'll escalate it to the GSC at the next occurrence, if there is one.
  • In reply to gxmontes:

    gxmontes: First, I hope you logged a call with the GSC. Sometimes issues can be difficult to trace to a root cause, but when all such incidents are logged, a greater pattern or commonality may emerge that leads to identification of a common issue.

    As for predicting future loss, I don't believe so. The issue we are discussing here is a fault on a Select line, which would affect one bank of IO or one slot on all banks. The controller selects the Bank and Slot lines to progressively poll each card. And on a redundant node, the Standby is allowed to select one card per IO scan so that it can verify that it can communicate with each card.

    Understanding how the select lines work is important to formulating a troubleshooting strategy. Here is my take:
    1. Did the issue occur immediately following a physical change to the Bus, such as adding a carrier or reseating or replacing a card/Controller? If so, remove the affected component to isolate it.
    2. Check all connectors on carriers, extenders and cables to make sure there are no loose connections and that carriers have not slid apart on DIN Rails. Ensure DIN Rail stops are secure to prevent this.
    3. Determine if the issue is a Slot Select Line or Bank Select Line issue, i.e. all affected cards are in the same slot of their carrier, or are on the same Carrier. Technically, any one of the affected cards shares either the Slot or the Bank Select lines. Removing each affected card in turn can quickly determine if a card fault is shorting a line to Ground. A card cannot cause the line to open circuit. For Bank issues, disconnecting any downstream carriers can quickly confirm or eliminate them as the cause.
    4. The controllers connect to and drive all the select lines. If a controller is unable to set one of the lines due to some internal fault, this would show up in diagnostics for that controller, but not for both. The Standby would show the card as present and switching over would resolve the issue, requiring the affected controller be replaced. Reseating the controller might resolve the issue, though depending on the root cause, it could return.
    Remember the 3 R's: Reboot, Reseat, Replace.
    5. The "Tin Whiskers" Issue. Visually inspect the Extenders and install the recommended connector caps as discussed in a KBA. You likely cannot see these tendrils with the naked eye, but you may want to have a connector inspected under a microscope to confirm whether the phenomenon is occurring on your installed hardware. If so, work with the GSC to get the installation updated to the latest recommendations.
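    The bank-versus-slot check in step 3 can be sketched in a few lines of Python. This is just an illustration of the reasoning, assuming you have written down the affected cards as (carrier, slot) pairs from diagnostics. The function name and the returned labels are made up for the example.

```python
def classify_fault(affected):
    """Classify a list of (carrier, slot) pairs for cards that stopped communicating."""
    if not affected:
        return "none"
    if len(affected) == 1:
        return "single card"
    carriers = {carrier for carrier, _ in affected}
    slots = {slot for _, slot in affected}
    if len(carriers) == 1:
        # Every affected card sits on one carrier: suspect that carrier's bank select line.
        return "bank select"
    if len(slots) == 1:
        # Same slot position on several carriers: suspect that slot select line.
        return "slot select"
    return "mixed"


# The case from this thread: all eight cards on the fourth carrier.
print(classify_fault([(4, slot) for slot in range(1, 9)]))  # bank select
```

    A "mixed" result would suggest more than one fault, or something other than a select line, and is worth passing on to the GSC as is.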

    The LocalBus architecture is very robust, with no moving parts and solid connection points. Left alone, it is extremely unlikely for a select line fault to occur due to hardware failure. Most failure modes that exist would result in open circuits that would only affect one card; that is, if a card fails on its select lines, it is not likely a fault to Ground but an open circuit affecting only that card. The Controller is the only active component that connects to all 16 select lines. If a controller faults a line to Ground, both controllers would be affected, and the issue would clear up when the faulty controller is removed/replaced. If it is an open fault in the controller, the standby would still show the cards as present and a switchover would resolve the issue.

    There is no way to predict this kind of failure, as it is a hardware fault that is detected by a loss of communication. When a card is selected, the controller waits for that card to communicate on the bus. If no card responds because the card fails to see its slot and bank lines selected, the card cannot tell the controller what its problem is. Each Card has robust diagnostics to prevent it from ever connecting to the communication bus out of turn, but there is nothing a card can do when it is never "selected".
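    A tiny sketch of why there is nothing to predict from. It assumes, purely for illustration, that a selected card answers with some status string. The `poll` function and its reply strings are hypothetical and are not DeltaV diagnostics messages.

```python
from typing import Optional


def poll(card_selected: bool, card_healthy: bool) -> Optional[str]:
    """Return the card's reply, or None when nothing answers on the bus."""
    if not card_selected:
        # The card never sees its bank and slot lines go active, so it stays
        # silent. It cannot report anything, not even its own health.
        return None
    return "OK" if card_healthy else "card reports a fault"


print(poll(card_selected=True, card_healthy=True))    # OK
print(poll(card_selected=True, card_healthy=False))   # card reports a fault
print(poll(card_selected=False, card_healthy=True))   # None
```

    The last case is the one in this thread: a perfectly healthy card that is never selected looks, from the controller's side, exactly like an empty slot.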

    So again, report such issues to the GSC. Include which cards and which carrier are affected, the hardware and software versions, and the physical layout of the IO bus, as well as what changed, and when, before the issue occurred. If there is a pattern, it will only be seen when all occurrences are reported.

    Andre Dicaire