M-Series I/O - wait for obsolescence, run to fail, or replace preemptively?

To all the users out there with M-series I/O from the last century, what has been your philosophy / choices regarding replacement? Since our system dates to a time before any I/O card redundancy was available, how concerned should we be about cards that have been under power for 18 years? We have had (knock on wood) scarcely any outright hardware failures, but as we plan, it would be interesting to hear others' views on hardware longevity . . .

- Do you get early signs of trouble (random errors) or do cards just give it up completely and suddenly one day?

- Any tests to reveal "weak" components?

- Failure modes? Are there particular components (e.g. capacitors) on cards that are destined to be the "first to go"?

I recently cycled power on a few dozen "Series 2" H1 cards as a result of flashing them to SW revision 4.9; since then (about a year ago) we've had a couple go offline - each restored by reseating the card.

8 Replies

  • This is a very important topic for lifecycle planning. Although DeltaV IO and controller products are built for a 20-year operational life, the impact of environment, especially temperature, can change the failure rate within that life span. The product design, MTBF calculations and even real-world failure rates may not take into account the specific environmental conditions at a particular plant. Anecdotal events like the one described above might not have anything to do with the reliability of the parts, but they can certainly impact one's confidence. Without knowing the root cause of those two events - only that operation was restored by reseating the card - we don't know whether it was a software issue, a degraded connection that was "refreshed" by reseating, a power glitch that hung up a component, or a hardware issue in the card.

    Theoretically, the probability of failure remains constant throughout the operational life, but is expected to rise, though gradually, once that time passes. So if your hardware has been running for 18 years, you should be preparing your maintenance budget to handle the purchase of replacement hardware over the next few years. There is no need to panic or to replace everything at once, but if you run past the 20 years, you should not be surprised to see an increase in failure rates.
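
    To put rough numbers on that constant-failure-rate assumption, here is a minimal sketch of the exponential reliability model it implies. The MTBF and fleet size below are made-up placeholders, not Emerson figures - substitute your own product data:

        import math

        MTBF_HOURS = 2_000_000        # assumed per-card MTBF (placeholder, not an Emerson figure)
        HOURS_PER_YEAR = 8760
        FLEET_SIZE = 200              # assumed number of cards in the system

        lam = 1.0 / MTBF_HOURS        # constant failure rate, in failures per hour

        # Probability a single card survives 18 years at a constant failure rate
        p_survive_18y = math.exp(-lam * 18 * HOURS_PER_YEAR)

        # Expected failures per year across the fleet while the rate stays constant
        expected_per_year = FLEET_SIZE * lam * HOURS_PER_YEAR

        print(f"P(single card survives 18 y) = {p_survive_18y:.3f}")
        print(f"Expected fleet failures/year = {expected_per_year:.2f}")

    The point is simply that a low, steady failure count is consistent with the model until the design life is exceeded, after which the rate is expected to creep up.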

    - Do you get early signs...? There are no documented early signs. Since a failure can be seen 1 day, 1 year, or 10 years after installation, a single failure is not significant, especially if other failures have occurred over the life of the system. If you see an increase in a specific card type, like the H1 card, that would be more significant than, say, two failures on two different card types. Ultimately, it is a matter of confidence, and of understanding whether the failure might have been influenced by outside power disruptions or transients.

    - Any tests to reveal "weak" components? No. Could there be "weakened" components? Yes. Electrical transients may hit an electronic component and damage a circuit without failing it. Subsequent similar transients will tend to hit the same location, and eventually it may fail. Unless you are detecting such transients, there is no way to know if any part is compromised. I would think that if you took a new card and an old, used card and subjected both to a series of tests until one failed, you would still only be evaluating those specific components, and the result would not represent the population.

    - Failure modes? By definition, the MTTF of a component like an IO card is based on the individual parts, and the "weaker" parts are more likely to fail first. However, the individual MTBFs of these parts are massively larger than the MTBF of the assembly, and the parts do not fail in order of their individual MTTF.
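
    As a rough illustration of why the assembly MTBF is so much lower than the individual part MTBFs, here is a minimal sketch; the part count and per-part MTBF are hypothetical, not taken from any card's reliability data. For parts in series, the failure rates add:

        # Hypothetical example: 100 parts in series, each with a 1e8-hour MTBF
        part_mtbf_hours = [1e8] * 100

        # Failure rates add for a series system, so the assembly MTBF is the
        # reciprocal of the summed part failure rates.
        assembly_mtbf = 1.0 / sum(1.0 / m for m in part_mtbf_hours)

        print(f"Assembly MTBF = {assembly_mtbf:,.0f} hours")  # 1,000,000 hours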

    The only measure of probability is the behavior of a broad population under similar operating conditions. I would expect the past failure rate of parts at your site to be the most accurate indication of what to expect over the life of the product. And if the operating conditions have not changed, you should expect the same rate of failures.
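
    In practice, that comes down to a simple rate estimate from your own maintenance records. A minimal sketch, with assumed numbers standing in for your actual site history:

        # Assumed site history: 3 card failures across 200 cards over 18 years
        failures = 3
        cards = 200
        years_in_service = 18

        card_years = cards * years_in_service
        rate_per_card_year = failures / card_years

        # Projected failures over the next 5 years if conditions stay the same
        projected = rate_per_card_year * cards * 5

        print(f"Observed rate = {rate_per_card_year:.5f} failures per card-year")
        print(f"Next 5 years  ~ {projected:.1f} expected failures (size spares accordingly)")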

    Will you be operating for 10 or 20 more years, or is the plant nearing end of life as well? You might want to evaluate your spares inventory and purchase some additional cards for replacement of critical IO. Is an outage planned? Can you replace other cards while operating? Your system life plan should be looking to answer these questions and to prepare the appropriate replacement strategy.

    The replacement cost of the IO cards does not include licensing, so you are not repurchasing the entire system when rejuvenating the hardware after 20 years. Also, the current M-series hardware is backward compatible with older versions of software, just as the older hardware still works with newer releases. The M-series form factor may be 20 years old, but inside the plastic, the current cards are all new electronics. You can simply replace the cards without having to modify any configuration. The new features of the cards will become available with new versions of DeltaV. You can talk with your Emerson Sales or LBP office for more details, but M-series hardware is not going away.

    There is one thing to know: the Series 2 DI 24 VDC Dry Contact 8-channel card is electrically different from the Series 1 DI 24 VDC Dry Contact card, as shown in BOL. It is a drop-in replacement, but only if the IO has been wired per the supported documentation (i.e., with a pair of wires to the contact for each channel). The Series 1 card is a low-side sense card, while the Series 2 is implemented as a high-side sense card with line fault detection features.

    If you have discrete IO wired such that power is taken directly to the field contacts and a single wire is returned to the control system DI channel, you can consider replacing the Series 1 DI 24 VDC Dry Contact card with the Series 2 DI 24 VDC Isolated card. You must rewire the terminal block with the field signal wires connected to the odd-numbered terminals, and then collect the even-numbered terminals and connect them to signal common. This will also require you to reconfigure the IO card, which means reassigning the DSTs back to their appropriate channels. The result is a low-side sense circuit using the Isolated DI channel.

    Andre Dicaire

  • In reply to Andre Dicaire:

    Andre, I hope I retire before you move on. Thanks for sharing - it's great to have someone on the forum with such astute insight and advice.

    And I hope your successor is studying hard . . .
  • In reply to Andre Dicaire:

    Hello,

    Recently we have had two MD/MDPlus controllers fail, both with under 10 years of operational service. One was in service for 6 years and the other for 7 years. Both controllers are housed in panels located in data centre type conditions. The power supplies are fed from a UPS.

    Is there any experience on the forum where controller products have failed well short of the expected 20-year operational life?

    Happy New Year to all.

    Regards
    M
  • In reply to MichaelP:

    What was the nature of the failure? Did they just stop? Did they fail over to a redundant partner?

    Are the power supplies of the same vintage?

    When we replaced power supplies during a recent outage, we noticed the "new" (dual voltage - 12 and 24 V) controller power supply would not "green light" - we found that the bulk 12 V power coming from the old VE2014 F-R power supplies was only around 11 V as seen at the terminals of the controller power supply. The older power supplies were happy (?) to run on the reduced voltage, I guess.

    Somewhere back around 2003/04 we managed to get the controller loading on some 4-year-old "M5+" controllers up above 90%. One day they locked up in the midst of a partial download. Aside from that instance, where we had clearly exceeded the F-R recommendations, we haven't seen any issues. The M5s were replaced with MDs, and the only redundancy failovers have been when we force them during upgrades. That's through versions 6, 7, 9, 10, 11 and 13.
  • In reply to John Rezabek:

    The reason is unknown at present. We are looking into this Emerson.
    Standalone controller.
    Power supplies would be of the same age.

    Good tip on the voltage level; we will investigate this aspect.
  • In reply to MichaelP:

    *with Emerson (typo)
  • In reply to MichaelP:

    If it was due to voltage level, the Controller should still function if installed in a test system.

    The DeltaV system power supply converts the incoming 24 VDC or 12 VDC bulk supply voltage to a 5 VDC level for the controller. A dip in voltage is detected by the system power supply and will shut down the controller, storing key power-fail data for restart. So a dip in voltage will not damage the controller, and it will result in a log entry in the telnet logs of the controller.

    Also, since the 5 VDC is separate from the 12 VDC that feeds the IO cards, it is possible for the cards to continue to see power while the controller does not, if the power supply is the root of the issue.

    So replacing the power supply and/or checking the bulk voltage would be a good start. There is no way to check the 5 volt output to the controller other than by replacing parts.

    Andre Dicaire

  • In reply to Andre Dicaire:

    To bring some feedback.
    It might ring a bell or jolt an idea with you...

    The controller that crashed was set to recover using ColdRestart. This function did not work.
    The site power supply (24 VDC) to the controller is backed up through two independent systems.
    We have since data-logged the 24 VDC voltage to the controller power supply and, while it is after the event, it shows good health.
    Power cycle tests on the controller in a test environment cannot repeat the crash. Each time, the controller recovers using ColdRestart.
    Emerson are none the wiser and suspect an anomaly with the 24 VDC power to the controller power supply.

    Below are the telnet events:
    - Event 4 is the last full download (to put the following events in timeline context).
    - Event 5 is the time of the crash*.
    - Event 6 is the crash* state the controller remained in until engineer intervention.
    - Event 7 is engineer intervention (manual download via DV Explorer).

    *crash = no configuration in the controller.


    Event entry 4 of 128 (max is 128)
    Jan 12, 2017 09:39:25 RtProgLog, Severity = RtWarning
    Role = Active
    Task = UBKO
    File = RtProxyClient.cpp
    Line = 586
    Message = No unsol data has been received for 99780000 seconds.
    Forcing resync on module "PU
    [P]revious [N]ext [E]xit : n

    Event entry 5 of 128 (max is 128)
    Dec 27, 2017 23:24:09 RtProgLog, Severity = RtWarning
    Role = No Commitment
    Task = ROOT
    File = /Start.cpp
    Line = 462
    Message = Controller Bootup Starting.
    status = 00000003
    [P]revious [N]ext [E]xit : n

    Event entry 6 of 128 (max is 128)
    Dec 27, 2017 23:24:09 RtProgLog, Severity = RtWarning
    Role = No Commitment
    Task = UPMG
    File = uptarget.cpp
    Line = 228
    Message = Commission Data: V=0x434f4d35 PE=1 PA=10.4.0.150 SUB=255.254.0.0
    status = 000
    [P]revious [N]ext [E]xit : n

    Event entry 7 of 128 (max is 128)
    Dec 28, 2017 00:08:31 RtProgLog, Severity = RtWarning
    Role = Active
    Task = DNLD
    File = RtDevice.cpp
    Line = 9492
    Message = Processing DEVICE DownloadRequest Success
    status = 00000000
    [P]revious [N]ext [E]xit : n


    Regards
    M.