
DeltaV using Data Compression for history collection

How does History Collection work together with Data Compression? For example, what value is a new sample compared against to decide whether or not to collect it, and is it the sampling time that decides how often that comparison is done, or something else?

3 Replies

  • Here's an old paper on data compression. The first-order predictor (FOP) looks most like what I remember from old PI (OSIsoft) training, and might be close to what DeltaV uses.

    Your sampling rate sets how often the historian grabs a snapshot. Compression determines whether that snapshot is stored or tossed out. The idea is that you can sample frequently but consume less memory (hard drive storage) by only storing points that deviate by more than your "compression" (deviation) setting.

    In my opinion, compression is less important since hard drives are so cheap and large. YMMV.

    There is also some information on compression in BOL.

  • In reply to John Rezabek:

    John, good information.

    Some additional notes. The compression algorithm holds onto the last stored value as the basis for evaluating each subsequent sample. It also keeps the last sample. With each new sample, there are three data points to evaluate. The compression Deadband allows the algorithm to form a parallelogram and determine whether a value is within the compression range or represents a significant change.

    The idea is that since a transmitter value has a stated accuracy range, collecting history more accurately than the signal can represent does not actually provide you with more actionable data. More data points do not help if they all represent more or less the same value.

    In addition to the Deadband compression, you also have the max time between stored values. On a flat line, you would have no storage for hours or days at a time. If there is an abnormal shutdown of the computer, the buffered last values in the compression algorithm can be lost, and you have no data. The fact you have no data tells you it did not change since the last stored value, but that's not reassuring. So the max time forces a value into storage, even if it has not changed.
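    To make that mechanism concrete, here is a rough Python sketch of a swinging-door style Deadband check combined with a max-time rule. This is only my illustration of the behavior described above, not DeltaV's actual code; the class name and the exact geometry of the check are assumptions.

    ```python
    class DeadbandCompressor:
        """Illustrative swinging-door style compression with a max-time rule.
        A sketch of the behavior described above, NOT DeltaV's implementation."""

        def __init__(self, deadband, max_time):
            self.deadband = deadband   # compression deviation, engineering units
            self.max_time = max_time   # seconds; force a store at least this often
            self.archived = None       # (t, v) last value written to history
            self.held = None           # (t, v) last sample received, not yet stored
            self.slope_hi = None       # lowest upper-door slope seen so far
            self.slope_lo = None       # highest lower-door slope seen so far

        def add(self, t, v):
            """Feed one snapshot (timestamps strictly increasing);
            return the list of points to write to history."""
            if self.archived is None:          # always store the very first sample
                self.archived = (t, v)
                return [(t, v)]

            t0, v0 = self.archived
            # "Doors" from the archived point to the new sample +/- the Deadband.
            up = (v + self.deadband - v0) / (t - t0)
            dn = (v - self.deadband - v0) / (t - t0)
            self.slope_hi = up if self.slope_hi is None else min(self.slope_hi, up)
            self.slope_lo = dn if self.slope_lo is None else max(self.slope_lo, dn)

            stored = []
            # Doors crossing means the held sample no longer fits inside the
            # parallelogram; also force a store once max_time has elapsed.
            if self.slope_lo > self.slope_hi or (t - t0) >= self.max_time:
                point = self.held if self.held is not None else (t, v)
                stored.append(point)
                self.archived = point
                self.slope_hi = self.slope_lo = None
                # Restart the doors from the new archived point toward this sample.
                t0, v0 = self.archived
                if t > t0:
                    self.slope_hi = (v + self.deadband - v0) / (t - t0)
                    self.slope_lo = (v - self.deadband - v0) / (t - t0)

            self.held = (t, v)
            return stored
    ```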

    If you store all the data all the time, you may not care about disk space, but it takes time to process the stored data on retrieval. Having 100 times more data to represent a line of points rather than one made of segments means it will take more time to view the data when it is called up, at least initially. Processors are faster too, but they are not improving by orders of magnitude like they used to, and multi-core processors don't necessarily make a single task faster.

    So compression helps. Consider having a Deadband of 0 and a max time of 15 minutes to 1 hour. Flat, straight lines get compressed while modulating lines get more data points, and nothing compresses beyond the max time. A little compression goes a long way toward more optimized data storage, which helps in retrieval and presentation.
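    As a quick, hypothetical illustration of that setting (my own numbers, not a benchmark), a zero Deadband with a 15-minute max time stores only a handful of points per hour for a flat PV, while a modulating PV keeps a point for every change:

    ```python
    import math

    def store_points(samples, max_time):
        """Zero-deadband exception storage: keep a sample only when the value
        changes or when max_time seconds have passed since the last stored point."""
        stored = []
        for t, v in samples:
            if not stored or v != stored[-1][1] or (t - stored[-1][0]) >= max_time:
                stored.append((t, v))
        return stored

    one_hour = range(3600)                                        # 1-second snapshots
    flat = [(t, 50.0) for t in one_hour]                          # a flat PV
    wave = [(t, 50.0 + 5 * math.sin(t / 60)) for t in one_hour]   # a modulating PV

    print(len(store_points(flat, 900)))   # 4 points instead of 3600
    print(len(store_points(wave, 900)))   # 3600 -- every change is kept
    ```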

    Speaking of data presentation, the PHV Charts have an optimization algorithm that takes a large number of data points and represents the line with min/max values over a couple hundred segments. If the data point count gets too high, the pixel resolution on the screen can't show the points anyway. Rather than lose visibility of a short spike, this algorithm ensures that you can see any significant change in the PV. As you zoom in, the line converts back to individual data points once the point count fits in the window. Sampling without compression results in a high point count per trend and forces this optimization to occur in a smaller time window.
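    Roughly, that kind of min/max decimation looks like the sketch below. This is the generic plotting trick, not PHV's actual implementation: bucket the raw points into a fixed number of segments and keep each bucket's minimum and maximum so short spikes stay visible.

    ```python
    def minmax_decimate(points, segments=200):
        """Reduce a long list of (t, v) points to at most two per segment,
        keeping each segment's min and max so short spikes remain visible.
        A generic plotting trick, not PHV's actual implementation."""
        if len(points) <= 2 * segments:
            return points                      # few enough points: show them as-is
        out = []
        per_bucket = len(points) / segments
        for i in range(segments):
            bucket = points[int(i * per_bucket):int((i + 1) * per_bucket)]
            if not bucket:
                continue
            lo = min(bucket, key=lambda p: p[1])
            hi = max(bucket, key=lambda p: p[1])
            out.extend(sorted({lo, hi}))       # min and max, in time order
        return out
    ```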

    The sample rate should be set based on the variable's time constant. Flows are fast, while levels and temperatures can be very slow. The historian has the capacity to handle a few thousand values per second. Oversampling limits the number of tags you can collect before the scanner is overloaded, which will result in missed data. Sampling is the most important setting because it limits how many tags you can collect. For 30K tags, the average sample rate works out to around 10 seconds (3,000 values per second of collection).
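    The arithmetic behind that last figure is simply the tag count divided by the scanner's values-per-second capacity (numbers from above):

    ```python
    tags = 30_000                 # tags being collected
    scan_capacity = 3_000         # values per second the CH scanner can handle
    print(tags / scan_capacity)   # 10.0 -> ~10 s average sample period per tag
    ```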

    Compression helps reduce disk space, which means a given History Data Set can hold more information (a longer time window). The DeltaV Continuous Historian has a 10 GB limit on the automatic data set area. It will export the oldest current data set and delete it to make room for the new active data set. With compression, 10 GB of data can cover a significantly longer time frame than if no compression is used. However, the exported data set can be used to create an Extended Data Set, which you can register and make available. So you are not limited to 10 GB of data on the Continuous Historian, but you do have to perform an admin task to create and register those Extended Data Sets.
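    As a back-of-the-envelope sketch of why that matters (the ~32 bytes per stored value is my assumption, not a published DeltaV figure):

    ```python
    dataset_bytes = 10 * 1024**3   # the 10 GB automatic data set area
    bytes_per_value = 32           # ASSUMED cost per stored value (timestamp, value, status)
    raw_rate = 3_000               # stored values/s if every sample were kept

    def window_days(compression_ratio):
        """Rough time span of the 10 GB area if compression keeps 1/ratio of the samples."""
        stored_per_second = raw_rate / compression_ratio
        return dataset_bytes / (bytes_per_value * stored_per_second) / 86_400

    print(window_days(1))    # ~1.3 days with every sample stored
    print(window_days(10))   # ~13 days if compression keeps 1 sample in 10
    ```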

    The Advanced Historian, which is a PI historian, has a more efficient scanner and can push more data into the data archives per second; v14 BOL states it can handle 10,000 samples per second. It can also be licensed for 60K tags. The Advanced Historian does not have a limit on current data archives and can be set up to continually create new archives, or set to rotate through a set of archive files, clearing and reusing the oldest archive when the current one fills. It does not require you to manually create Extended archives, but you still want to manage your backups.

    The ACH has a similar compression function, and the configuration is the same for both the ACH and the CH.

    I think compression is useful, and the need for it does not completely go away with cheap storage. For discrete or manually changed values, a compression Deadband of 0 allows collection to capture values when they change without filling the archive with unchanging Boolean or integer values. Adjust the sample rate, max time, and compression Deadband to keep data storage manageable without wasting disk space, network bandwidth, or client-side processing, and to keep data responsive when it is used.

    Disk storage costs mean we don't have to be as particular as we used to be, but some compression still helps in retrieval. Also consider whether you end up moving the history data from the local server to the enterprise: wasted storage carries a cost when moving the data up, in the form of latency and/or bandwidth consumption.

    Cheers.

    Andre Dicaire