With 300 MB event chronicles turning over every 10-14 days, I had a look and I'm seeing several hundred (perhaps 200-500 or more) instances of "IO Input Failure" followed by "Error Cleared". They are all "Events", all "4-INFO", and are not spawning any "Bad Quality" alarms in DeltaV Operate. Various blocks are indicated as experiencing the failure: AI blocks, ALM blocks, ISEL, occasionally a PID. Some are associated with bus I/O (Fieldbus and Modbus) but not all . . .
Anyone else seeing this? I have not yet deployed the latest 14.3.1 controller hotfix, but I saw nothing in the list of issues addressed that looked like mine. Most of these modules have been around for a long time, some since at least v4 (ca. 2000). This is the first time I have seen this volume of I/O input failures on an ongoing basis.
Could someone have changed the execution rate of the module, by any chance?
In reply to Youssef.El-Bahtimy:
Note that most of the time system owners focus only on alarm quantification as part of their situational awareness plan, but as you point out, chattering I/O and other non-annunciating events should still be evaluated to uncover underlying performance issues. Typically, my clients offload the A&E chronicles to the Batch Historian, aggregating the short-term chronicle SQL data into the longer-term batch history SQL database, or even into PI and Aspen IP21 historians for the longest-term querying. I hit those systems with a counting query on a regular basis to get faster identification of these sorts of noise floods that aren't annunciating (until they fill up the storage).
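To sketch what I mean by a counting query, here is the shape of it against an in-memory SQLite stand-in. The real chronicle / Batch Historian schema differs, so the table and column names below (journal, event_time, event_type, module, description) are hypothetical placeholders:

```python
import sqlite3

# In-memory stand-in for the chronicle database; the real DeltaV /
# Batch Historian schema and names will differ.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE journal (
    event_time TEXT, event_type TEXT, module TEXT, description TEXT)""")

sample = [
    ("2024-01-01 00:00:01", "EVENT", "AI-101", "IO Input Failure"),
    ("2024-01-01 00:00:03", "EVENT", "AI-101", "Error Cleared"),
    ("2024-01-01 00:00:07", "EVENT", "PID-7",  "IO Input Failure"),
    ("2024-01-01 00:00:09", "EVENT", "AI-101", "IO Input Failure"),
]
conn.executemany("INSERT INTO journal VALUES (?,?,?,?)", sample)

# Count events per module/description to surface the noisiest sources.
rows = conn.execute("""
    SELECT module, description, COUNT(*) AS n
    FROM journal
    GROUP BY module, description
    ORDER BY n DESC
""").fetchall()
for module, desc, n in rows:
    print(f"{module:8s} {desc:20s} {n}")
```

Run on a schedule, the top rows of a query like this point straight at the chattering sources before they fill the storage.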
In reply to Lun.Raznik:
In reply to Mark Bendele:
Well this is a huge clue.
I am assuming that at one point your system was running DeltaV 11.3.x? The default IO scan rate for CHARMs was fixed (if I remember correctly) at 50 ms in 11.3.x. This was changed to 250 ms in v13.3.1 (or maybe even before that release). So if you happen to have modules running at, say, 100 ms, you will be getting IO Failures in newer versions.
Can you please check how the CIOCs are configured?
Bummer :).
Maybe we can try this: let us focus on one module, and you pick which one.
The change in CIOC update rates occurred in 11.3.1. The original 50 ms scan rate caused issues with CPU loading on SDPlus controllers, and offering a slower update rate on the CIOC allowed the SDPlus to handle a reasonable number of CIOCs to accommodate 750 I/O. The CIOC update rate has been configurable since 11.3.1, with a default of 250 ms.
Luz is correct in suggesting that you focus on one module and investigate it more completely. Best to fully understand one source of the errors before we try to suggest a fix.
900 IO events over two weeks does not seem to account for the consumption of a 300 MB data set. On my system I have a 100 MB Ejournal active database, currently showing 81 MB used and 19 MB free. A look in PHV shows slightly more than 29,000 total records, which indicates each record takes about 2.5 KB. A 300 MB data set should therefore accommodate well over 100,000 events. It seems there might be more to the story than just I/O Input errors.
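A quick back-of-envelope sketch of that arithmetic, for anyone who wants to run it against their own numbers. The per-record size is just used space divided by record count, so the overhead on your system will vary:

```python
# Figures from my system: 81 MB used, ~29,000 records in PHV.
used_mb = 81
records = 29_000
bytes_per_record = used_mb * 1024 * 1024 / records
print(f"{bytes_per_record / 1024:.1f} KB per record")

# Implied capacity of a 300 MB data set at that record size.
dataset_mb = 300
capacity = dataset_mb * 1024 * 1024 / bytes_per_record
print(f"~{capacity:,.0f} events fit in a {dataset_mb} MB data set")
```

Either way you round it, the capacity comes out well over 100,000 events, so 900 I/O events alone are nowhere near filling the data set.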
At this point, I would ask that you post a few examples of the Ejournal events, just to clarify whether we are dealing with an IO Input or IO Output error, or with an Input Transfer Error or Output Transfer Error. On my system, an IO Input error indicates IOF and General IO Error in the description fields. I tried posting a screen capture, but it is not legible in this post.
The IO Input error on my system is caused by an open loop condition on the AI CHARM, so it is latched and not chattering.
Luz mentioned that the module execution rate can cause IO errors and that the module must not execute faster than the IO updates. Specifically, the module must not write to an Output channel faster than the IO subsystem can process the output to that channel. Module execution rates affect Output channels only in this way. A module can read an input channel faster than the input is updating, and this does not cause an Input error.
It is certainly good practice to have modules write at a slower rate than the IO subsystem can process the signals. Note that if the output value has not changed, an output error will not be registered even if the module executes faster than the IO update rate. This can account for sporadic Output errors: the issue is typically seen with analog values more so than with discrete values, which change infrequently.
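To illustrate that behavior with a toy model (this is only an illustration with assumed rates of 100 ms module execution and 250 ms IO updates, not DeltaV internals): a changed output value can be overwritten before the IO subsystem picks it up, while an unchanged value never trips the condition.

```python
MODULE_MS, IO_MS = 100, 250   # assumed rates for illustration only

def lost_writes(values):
    """values[i] is written at t = i * MODULE_MS; count changed values
    overwritten before the next IO transfer (at multiples of IO_MS)."""
    lost = 0
    pending = None            # changed value awaiting transfer
    transferred = None        # last value the IO subsystem picked up
    next_io = IO_MS
    for i, value in enumerate(values):
        t = i * MODULE_MS
        while next_io <= t:   # IO subsystem transfers the pending value
            if pending is not None:
                transferred = pending
                pending = None
            next_io += IO_MS
        current = pending if pending is not None else transferred
        if value != current:          # only a *changed* write matters
            if pending is not None:   # previous change never transferred
                lost += 1
            pending = value
    return lost

# A ramping analog value loses writes; a steady value loses none.
print(lost_writes(list(range(10))), lost_writes([5] * 10))
```

The ramp (an analog value changing every execution) racks up lost writes, while the steady signal produces none, which matches why the errors show up sporadically and mostly on analog channels.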
You can also use Diagnostic Explorer to see the current state of MERROR and MSTATUS for all modules in a controller. You can sort on the IOINerr and IOOUTerr columns to find all modules with input or output errors. Select the Assigned Modules container, and in the right-hand pane you can see all the individual bits of the MERROR and MSTATUS words. This is a good place to view the overall health of your control modules and any online issues or abnormal conditions. Each row represents one module, which is identified in the first column.
Note: on our database, Disparity Detect had been enabled on one transmitter, and this was toggling an alarm that chattered every 4 to 6 seconds. It was generating 29,000 events in one day, filling my Event Journal in about 30 hours. I disabled this feature until I can determine the root cause. It is important to keep the Event Journal from being inundated with chattering events, so that you can both maintain a longer time period of current data and avoid performance issues when retrieving an hour's or a day's worth of data. Having 100 times more events means each query must process that much more data. A clean Ejournal helps PHV Echarts and Event views perform better.
If you only have 900 such events in your 300 MB archive, the archive must not be very full. Use Event Chronicle Administrator to view the start time of the archive, and view the Properties of the archive to see the Used and Free space. Is the archive filling in a couple of weeks, or has it simply been a couple of weeks since this data set became active? View all events in a PHV event view and determine how many events are in your data set, then compare this to the used space. This will help determine how much time this data set will collect before switching to a new active data set.
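A rough way to estimate the rollover period from those numbers (the figures below are hypothetical; substitute the used/free space from Event Chronicle Administrator, the PHV event count, and your measured event arrival rate):

```python
# Hypothetical inputs -- read the real values off your own system.
used_bytes     = 81 * 1024 * 1024   # from archive Properties
free_bytes     = 19 * 1024 * 1024   # from archive Properties
event_count    = 29_000             # from a PHV "all events" view
events_per_day = 1_000              # measure your actual arrival rate

bytes_per_event = used_bytes / event_count
days_left = free_bytes / bytes_per_event / events_per_day
print(f"~{bytes_per_event:.0f} B/event, ~{days_left:.1f} days to rollover")
```

If the estimated days-to-rollover is far shorter than your archive retention target, something (like a chattering source) is inflating the event rate.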
We still need to determine the cause of the IO IN/OUT errors and address them. But at least you'll know what impact these events have on your data set rollover period.
Andre Dicaire
In reply to Andre Dicaire: