Dear All,
We are facing a watchdog failure problem at one of our customer sites. As soon as the watchdog timer fails, the entire batch goes to hold, which stops production. The system uses dynamic referencing to issue different commands. Controller free time and free memory look good when the issue happens. If anyone has come across this problem, or has found a fix or a workaround, please share. Thank you very much.
As the general community does not have access to the GSC call tracking system, I would suggest you post the details of your issue here.
If the problem pertains to a customer that you are the service provider for (based on the title of the posting), please make sure not to post any sensitive information.
In reply to Youssef.El-Bahtimy:
I agree with Youssef.
But in general terms, a watchdog failure at the node level can be caused by several things.
Off the top of my head:
- Loading on the node (for the controller, is it within the specified limits?). For example:
Controller free time minimum: 20%
Controller free memory minimum
MD - 1.4 MB
SD Plus/MD Plus - 4.8 MB
SX/MX - 9.6 MB
- For workstations: check that you have enough CPU idle time and free memory, and that disk I/O load is acceptable
- At the network level: is the traffic clean? Is the network equipment in good shape?
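As a rough illustration, the controller loading limits quoted in the list above could be checked like this. This is only a sketch in Python: the function name `check_controller_loading` and its parameters are made up for the example, and the free time and free memory readings are assumed to be taken by hand from diagnostics, not fetched through any real DeltaV API.

```python
# Minimum free memory per controller type, in MB, as quoted in the list above.
MIN_FREE_MEMORY_MB = {
    "MD": 1.4,
    "SD Plus": 4.8,
    "MD Plus": 4.8,
    "SX": 9.6,
    "MX": 9.6,
}

# Minimum controller free time, in percent, as quoted in the list above.
MIN_FREE_TIME_PERCENT = 20.0


def check_controller_loading(controller_type, free_time_percent, free_memory_mb):
    """Return a list of warnings; an empty list means both limits are met.

    Hypothetical helper: the readings are supplied by the caller.
    """
    warnings = []
    if free_time_percent < MIN_FREE_TIME_PERCENT:
        warnings.append(
            f"free time {free_time_percent}% is below {MIN_FREE_TIME_PERCENT}%"
        )
    min_memory = MIN_FREE_MEMORY_MB[controller_type]
    if free_memory_mb < min_memory:
        warnings.append(
            f"free memory {free_memory_mb} MB is below {min_memory} MB"
        )
    return warnings


if __name__ == "__main__":
    # Example readings (invented values, for illustration only).
    for problem in check_controller_loading("MX", 15.0, 5.0):
        print(problem)
```

An empty result only says the node is within the quoted limits at the moment of the reading; it does not rule out short spikes between readings.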
At the control level, it is possible that improvements/fixes have exposed issues in your configuration, or there might be a regression in functionality.
In reply to Ashish P:
Sorry for the delay. I just want to add some information on this problem:
The issue we are facing is a different one. THIS IS NOT THE SERIAL WATCHDOG; this is the batch watchdog.
The phase is loaded to the controller. If it loses connection with the batch executive while running, then after a certain time the batch fails with a reason such as 'Phase Logic Failure: PLM Watchdog Failed 1' or 'Device connection Error'.
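To make the behaviour described above concrete, here is a minimal sketch of a heartbeat-style watchdog in Python. It is a generic model, not DeltaV code: the class name `PhaseWatchdog`, the 30-second timeout, and the status strings are all assumptions chosen for the example, and the real PLM watchdog may work differently in detail.

```python
class PhaseWatchdog:
    """Hypothetical model of a phase-side watchdog (not DeltaV code).

    The phase expects a periodic heartbeat from the batch executive.
    If none arrives within `timeout_s` seconds, the watchdog reports a failure.
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = None  # time of the most recent heartbeat, if any

    def heartbeat(self, now):
        """Record that the batch executive was heard from at time `now`."""
        self.last_heartbeat = now

    def status(self, now):
        """Return 'WAITING', 'OK' or 'FAILED' for the time `now` (seconds)."""
        if self.last_heartbeat is None:
            return "WAITING"
        if now - self.last_heartbeat > self.timeout_s:
            return "FAILED"
        return "OK"


if __name__ == "__main__":
    watchdog = PhaseWatchdog(timeout_s=30.0)  # timeout value is an assumption
    watchdog.heartbeat(now=0.0)
    print(watchdog.status(now=10.0))  # heartbeat is recent
    print(watchdog.status(now=45.0))  # connection lost for longer than timeout
```

In this model a healthy controller can still report a watchdog failure, because the check depends only on how long ago the last heartbeat arrived.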
Has anyone observed this kind of situation before? Any details or information would be really helpful.
Thank you for your time, gentlemen!
You mentioned that you use a lot of dynamic referencing. I have had experience with modules with dynamic references embedded in phases causing problems, because the module execution would start before the references were bound.
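One generic way to picture the guard against that problem is sketched below in Python. It is purely illustrative: `run_module_scan`, the reference names, and the use of `None` to mean "not yet bound" are all invented for the example and do not correspond to any DeltaV function or parameter.

```python
def run_module_scan(references, logic):
    """Run `logic` only when every dynamic reference is bound.

    `references` maps a reference name to its resolved target, or to None
    while the reference is still unbound (an assumption of this sketch).
    Returns ("HOLD", [unbound names]) or ("RAN", result of logic).
    """
    unbound = sorted(name for name, target in references.items() if target is None)
    if unbound:
        # Skip the logic this scan instead of acting on unresolved references.
        return ("HOLD", unbound)
    return ("RAN", logic(references))


if __name__ == "__main__":
    # Hypothetical reference names and targets, for illustration only.
    refs = {"valve": None, "pump": "PMP-101"}
    print(run_module_scan(refs, lambda r: f"commanding {r['valve']}"))

    refs["valve"] = "XV-201"
    print(run_module_scan(refs, lambda r: f"commanding {r['valve']}"))
```

The point of the guard is that the module logic never issues a command through a reference that has not resolved yet.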
Perhaps take a look at network traffic during phase loading to see whether utilization spikes due to excessive dynamic reference binding, enough to interfere with the batch executive watchdog function.
I would look into whether you can modify the watchdog timeout. For soft phases I believe you can, but I don't think that applies to controller phases.
Dear Youssef,
You mentioned that modules with dynamic references embedded in phases can cause problems because module execution starts before the references are bound. However, this problem is observed only in one particular phase; not all phases cause the PLM watchdog failure.
Does controller memory fragmentation have any impact on watchdog failure?
The client considered a hardware upgrade to solve this problem. The client will now replace all the existing controllers with MX controllers.