Redundancy Alarm on SIS controllers

I have a redundancy alarm, "Redundancy: Not Communicating - Secondary Network", occurring several times over the course of a few minutes, at infrequent and inconsistent intervals, on two separate SIS SZ controllers. Each time, the alarm clears 10 seconds later. I have changed out the network port to no avail. The diagnostic monitor indicates redundancy is good for each controller. Is there anything else I should look at? I suspect this is being caused by heavy network traffic. Is it possible to put a delay on the controller alarm so that it only alarms under a sustained alarm condition? Thank you for any assistance.
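To clarify what I mean by a sustained condition: conceptually I'm after something like an on-delay filter, sketched below. This is only an illustration of the idea; I am not aware of such a feature for DeltaV hardware alarms, and the hold-off time is just a placeholder.

```python
# Conceptual on-delay ("sustained condition") filter: only report the alarm
# once the raw condition has been continuously active for a hold-off period.
# Illustration only -- not a DeltaV hardware-alarm feature; HOLD_OFF_S is a
# placeholder value.
import time

HOLD_OFF_S = 30.0       # assumed hold-off window
_active_since = None    # when the raw condition last became active

def filtered_alarm(raw_condition_active: bool) -> bool:
    """Return True only after the condition has persisted for HOLD_OFF_S."""
    global _active_since
    now = time.monotonic()
    if raw_condition_active:
        if _active_since is None:
            _active_since = now
        return (now - _active_since) >= HOLD_OFF_S
    _active_since = None
    return False
```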

6 Replies

  • It is not easy to give a good answer without knowing more about your SIS and DCS network layout, so please forgive me if some of my assumptions are not correct.

    1. I am not aware that you can set up a delay on hardware alarms, especially for SIS hardware. You can only disable the hardware alarms for the SIS controllers (but I do not think you want to do that).

    2. I am assuming that you have looked in your event log for any other network errors around the time you get the "Redundancy" error and found no other alarms (such as the primary not communicating on the secondary network). In that case, is it possible that there is a problem with the SZ controller carrier? Or with the redundant controllers?
    Maybe try changing both the primary and redundant Ethernet ports.

    There is one more possibility:
    Depending on your secondary network topology, there might not be anything wrong with the controller hardware, but instead with a component upstream of the controller. Maybe a switch port is the problem, or the network port of a previous controller?

    I have to admit I am grasping at straws, since I have no idea what your network layout looks like.
  • If you go to Hardware Alarms under the controller, it's possible to Enable/Disable and Shelve the alarm, but you can't add a delay.
    Based on your description, it's definitely related to the network, not to controller redundancy. When the network is properly set up, high traffic should not be an issue, but without knowing the architecture and which switches are used, it's hard to tell. You can do the normal troubleshooting: replace cables, re-crimp connectors, and check the fiber-optic fusion splices if applicable. If none of that works, you should open a ticket with the GSC.
  • Having the same 10-second issue on a redundant SZ controller, I would like to know if you have found any solution, or at least the origin of the failure.
    brg Stefan

    Stefan Müllner
    Senior Plant Engineer - Automation & Systems

    METADYNEA AUSTRIA GmbH

  • In reply to Stefan Muellner:

    I'd agree with Tadeu that the issue relates to the secondary network.

    Normally, DeltaV traffic uses the Primary network. If all nodes communicating with the SZ controller have a healthy primary network connection to it, then all traffic will flow on the Primary. The Secondary sends and receives diagnostic traffic that is used to indicate the health of that network, so network load would not be an issue here.

    What does "Redundancy: Not Communicating - Secondary Network" mean? At face value, it seems to say that the controller is receiving no network traffic from the secondary network.
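    To picture the general idea (my own rough sketch, not DeltaV internals): the controller expects to see periodic diagnostic traffic on the secondary link, and flags the link as not communicating when nothing arrives within some detection window. The timeout value below is only a placeholder.

    ```python
    # Rough sketch of a heartbeat watchdog on a secondary link (generic idea,
    # not DeltaV's actual implementation). If no diagnostic frame has been seen
    # within the timeout, declare the link "not communicating".
    import time

    TIMEOUT_S = 10.0                    # assumed detection window
    _last_frame_at = time.monotonic()   # updated whenever a diagnostic frame arrives

    def on_diagnostic_frame_received() -> None:
        """Call this each time a diagnostic frame is seen on the secondary port."""
        global _last_frame_at
        _last_frame_at = time.monotonic()

    def secondary_not_communicating() -> bool:
        """True when no diagnostic traffic has arrived within TIMEOUT_S."""
        return (time.monotonic() - _last_frame_at) > TIMEOUT_S
    ```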

    If you were to disconnect the Secondary network, would you see the same message, except that it stays on until the network cable is reconnected? Note that the RJ45 jack LEDs will stop flashing with the cable disconnected. Do you observe this intermittently? I don't have access to my system hardware to test this for you. Note that when you remove the secondary cable, you should see a bad connection show up on all other controllers/nodes that communicate with the SZ. Since those nodes have connections to other nodes, and they are still communicating with the SZ over the Primary, they don't have any fault of their own to report, but you should see the number of good/bad connections change, and the connection list on those controllers will show the SZ with a bad secondary network.

    If we can reproduce the network diagnostic by disconnecting the network, then we have narrowed the source of the issue to the switch port, the cable, or the network communication module, but we have not eliminated the carrier, and possibly the Active SZ itself. You replaced the communication module with another and the problem remained, so we have eliminated the module itself.

    1. Check the switch to see if it has any indication of an issue. Unlock the ports and, if possible, connect the SZ secondary network cable to a different port. If possible, connect another node to the original port and see if the issue appears on that node.

    2. If you have a cable tester, check the cable. Or if you have another cable you could use temporarily, check to see if the issue is resolved with a new cable.

    3. If we can perform a switchover of the controller and the issue goes away, then it could be the current Active SZ controller's network port. I would switch back to the original, and if the issue comes back, we have confirmed the controller. Note that switching over resets the controller as it reboots, and this might also resolve the issue on its own, though I doubt that very much.
    If the issue persists, it is not the controller itself. So that would be an action to consider.

    4. You've already replaced the Communication module on the SZ. It was not that.

    5. At this point, we will have eliminated all components used in communication between the controller and the broader network, except for the SZ carrier itself. The communication module plugs into the carrier, which routes both the Active and Standby controllers' secondary ports to that module. Though it is highly unlikely that this part of the path is compromised, if it is, we would need to take an outage to replace the carrier. So I leave this for last; if nothing else resolves it, we end up here.

    When troubleshooting the network connection, we need to troubleshoot back to a point where the SZ path shares a switch with other devices that are not reporting a loss of network connection. The issue could be happening anywhere along the path, such as a media converter, an intermediary switch, or its uplink. Evaluate the topology to isolate the points where the SZ is the only device that relies on that path, then eliminate those as possible sources of the issue.

    I suggest you open a call with the GSC as well. I would ask them to provide details on exactly what the diagnostic message is telling us and what the possible causes would be, taking into account the messages from all nodes that communicate with the SZ. They will likely want to review the Event Chronicle records for the system.

    Andre Dicaire

  • In reply to Andre Dicaire:

    We did replicate the alarm exactly by disconnecting the secondary cable. The cable did check out good.

    We also found that about 90% of the time, the SIS alarms came in after a download to one of the heavily loaded non-SIS controllers. There are two separate SIS controllers, connected to separate switches, that both generate this alarm simultaneously. The suspicion is the loading on the non-SIS controllers, as they are sometimes at a 2 on the performance index, which is already pretty loaded. We are looking at ways to alleviate this.
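    A rough way to quantify that kind of correlation from an event record export might look like the sketch below; the file name, column names, timestamp format, and window size are assumptions for illustration only, not the actual export layout.

    ```python
    # Hedged sketch: count how many "Not Communicating - Secondary Network" alarms
    # occur within a short window after a controller download, using an exported
    # event log. The CSV name, columns ("Time", "Description"), timestamp format,
    # and window size are illustrative assumptions only.
    import csv
    from datetime import datetime, timedelta

    WINDOW = timedelta(minutes=5)     # assumed "shortly after a download" window
    FMT = "%Y-%m-%d %H:%M:%S"         # assumed timestamp format in the export

    downloads, alarms = [], []
    with open("event_export.csv", newline="") as f:
        for row in csv.DictReader(f):
            t = datetime.strptime(row["Time"], FMT)
            if "Download" in row["Description"]:
                downloads.append(t)
            elif "Not Communicating - Secondary Network" in row["Description"]:
                alarms.append(t)

    hits = sum(
        any(timedelta(0) <= (a - d) <= WINDOW for d in downloads)
        for a in alarms
    )
    print(f"{hits} of {len(alarms)} alarms occurred within {WINDOW} of a download")
    ```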
  • In reply to Andre Dicaire:

    Andre, thank you very much for your detailed explanation!
    The bulk of the points were checked, a few components exchanged already - without any improvements. I will complete the "list" and open a call with GSC.

    brg Stefan

    Stefan Müllner
    Senior Plant Engineer - Automation & Systems

    METADYNEA AUSTRIA GmbH