Simple example

A simple example is included in the examples/simple directory. This example uses data from an Excel file, simple.xlsx, which contains four columns of data (A through D).

  • A = elapsed time in days
  • B = uniform random number between 0 and 1
  • C = sin(10*A)
  • D = C+(B-0.5)/2
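
The column definitions above can be sketched in a few lines of NumPy and pandas. This is only an illustration: it assumes one day of 15-minute timestamps starting on 2015-01-01 and uses a hypothetical random seed, and it produces clean data without any of the anomalies present in simple.xlsx.

```python
import numpy as np
import pandas as pd

# Assumption: one day of data at a 15-minute frequency
index = pd.date_range("2015-01-01 00:00", "2015-01-01 23:45", freq="15min")

# A: elapsed time in days since the first timestamp
elapsed_days = np.asarray((index - index[0]).total_seconds()) / 86400.0

rng = np.random.default_rng(0)  # hypothetical seed, for repeatability only

clean = pd.DataFrame(index=index)
clean["A"] = elapsed_days
clean["B"] = rng.uniform(0.0, 1.0, size=len(index))  # uniform random in [0, 1)
clean["C"] = np.sin(10 * clean["A"])                 # sine of elapsed time
clean["D"] = clean["C"] + (clean["B"] - 0.5) / 2     # sine plus bounded noise
```

Because B lies between 0 and 1, the noise term (B-0.5)/2 stays within ±0.25, so D can leave the range [-1, 1] only where C is close to its extremes.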

The data includes missing timestamps, duplicate timestamps, non-monotonic timestamps, corrupt data, data out of expected range, data that doesn’t change, and data that changes abruptly.

  • Missing timestamp at 5:00
  • Duplicate timestamp at 17:00
  • Non-monotonic timestamp at 19:30
  • Column A has the same value (0.5) from 12:00 until 14:30
  • Column B is below the expected lower bound of 0 at 6:30 and above the expected upper bound of 1 at 15:30
  • Column C has corrupt data (-999) between 7:30 and 9:30
  • Column C does not follow the expected sine function from 13:00 until 16:15. The change is abrupt and gradually corrected.
  • Column D is missing data from 17:45 until 18:15
  • Column D is occasionally below the expected lower bound of -1 around midday (2 timesteps) and above the expected upper bound of 1 in the early morning and late evening (10 timesteps).
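
The three timestamp problems listed above can be illustrated with plain pandas. The sketch below is not how Pecos implements its timestamp check; it builds a hypothetical list of timestamps with the same three defects and finds them with ordinary pandas operations.

```python
import pandas as pd

# Expected timestamps: one day at a 15-minute frequency (assumed)
expected = pd.date_range("2015-01-01 00:00", "2015-01-01 23:45", freq="15min")

# Build a hypothetical raw timestamp list with three injected defects
stamps = list(expected)
stamps.remove(pd.Timestamp("2015-01-01 05:00"))      # missing timestamp
i = stamps.index(pd.Timestamp("2015-01-01 17:00"))
stamps.insert(i, pd.Timestamp("2015-01-01 17:00"))   # duplicate timestamp
j = stamps.index(pd.Timestamp("2015-01-01 19:30"))
stamps[j], stamps[j + 1] = stamps[j + 1], stamps[j]  # 19:30 now follows 19:45

raw = pd.Series(stamps)

# A timestamp is non-monotonic if it is earlier than the one before it
nonmonotonic = raw[raw.diff() < pd.Timedelta(0)]
# A timestamp is a duplicate if the same value appeared earlier
duplicates = raw[raw.duplicated()]
# A timestamp is missing if it is expected but never appears
missing = expected[~expected.isin(raw)]

print(list(nonmonotonic), list(duplicates), list(missing))
```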

The script, simple_example.py (shown below), is used to run quality control analysis using Pecos. The script performs the following steps:

  • Define input for quality control tests, including
    • Expected frequency of the timestamp
    • Time filter to exclude data points early and late in the day
    • Corrupt data values
    • Upper and lower bounds for data range and data increments
    • Sine wave model to compute measurement error
  • Load time series data from an Excel file
  • Run quality control tests
  • Generate an HTML report, test results CSV file, and performance metrics CSV file
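
The time filter in the list above can be pictured as a boolean series over the timestamps. The following sketch uses plain pandas rather than Pecos, and assumes that clock time is expressed as seconds past midnight and that the data has a 15-minute frequency.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2015-01-01 00:00", "2015-01-01 23:45", freq="15min")

# Clock time as seconds past midnight (assumed representation)
seconds = np.asarray(index.hour * 3600 + index.minute * 60 + index.second)
clock_time = pd.Series(seconds, index=index)

# True for timestamps strictly between 3 AM and 9 PM; False elsewhere
time_filter = (clock_time > 3 * 3600) & (clock_time < 21 * 3600)

kept = index[time_filter.values]
print(kept[0], kept[-1], time_filter.sum())
```

With strict inequalities, the 3:00 and 21:00 timestamps themselves are excluded, so the first and last retained timestamps are 03:15 and 20:45.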
"""
In this example, simple time series data is used to demonstrate basic functions
in pecos.  
* Data is loaded from an Excel file which contains four columns of values that
  are expected to follow linear, random, and sine models.
* A translation dictionary is defined to map and group the raw data into
  common names for analysis
* A time filter is established to screen out data before 3 AM and after 9 PM
* The data is loaded into a pecos PerformanceMonitoring object and a series of 
  quality control tests are run, including range tests and increment tests 
* The results are printed to csv and html reports
"""
import pecos
import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np

# Initialize logger
pecos.logger.initialize()

# Create a Pecos PerformanceMonitoring data object
pm = pecos.monitoring.PerformanceMonitoring()

# Populate the object with a dataframe and translation dictionary
system_name = 'Simple'
data_file = 'simple.xlsx'
df = pd.read_excel(data_file)
translation_dictionary = {
    'Linear': ['A'],
    'Random': ['B'],
    'Wave': ['C','D']}
pm.add_dataframe(df, system_name)
pm.add_translation_dictionary(translation_dictionary, system_name)

# Check timestamp
pm.check_timestamp(900)
 
# Generate a time filter
clock_time = pm.get_clock_time()
time_filter = (clock_time > 3*3600) & (clock_time < 21*3600)
pm.add_time_filter(time_filter)

# Check missing
pm.check_missing()
        
# Check corrupt
pm.check_corrupt([-999]) 

# Add composite signals
elapsed_time = pm.get_elapsed_time()
wave_model = np.sin(10*(elapsed_time/86400))
wave_model.columns=['Wave Model']
pm.add_signal('Wave Model', wave_model)
wave_model_abs_error = np.abs(np.subtract(pm.df[pm.trans['Wave']], wave_model))
wave_model_abs_error.columns=['Wave Absolute Error C', 'Wave Absolute Error D']
pm.add_signal('Wave Absolute Error', wave_model_abs_error)

# Check range
pm.check_range([0, 1], 'Random')
pm.check_range([-1, 1], 'Wave')
pm.check_range([None, 0.25], 'Wave Absolute Error')

# Check increment
pm.check_increment([0.0001, None], 'Linear') 
pm.check_increment([0.0001, None], 'Random') 
pm.check_increment([0.0001, 0.6], 'Wave') 
    
# Compute metrics
mask = pm.get_test_results_mask()
QCI = pecos.metrics.qci(mask, pm.tfilter)

# Define output file names and directories
results_directory = 'Results'
if not os.path.exists(results_directory):
    os.makedirs(results_directory)
graphics_file_rootname = os.path.join(results_directory, 'test_results')
custom_graphics_file = os.path.abspath(os.path.join(results_directory, 'custom.png'))
metrics_file = os.path.join(results_directory, system_name + '_metrics.csv')
test_results_file = os.path.join(results_directory, system_name + '_test_results.csv')
report_file =  os.path.join(results_directory, system_name + '.html')

# Generate graphics
test_results_graphics = pecos.graphics.plot_test_results(graphics_file_rootname, pm)
plt.figure(figsize = (7.0,3.5))
ax = plt.gca()
df.plot(ax=ax, ylim=[-1.5,1.5])
plt.savefig(custom_graphics_file, format='png', dpi=500)

# Write metrics, test results, and report files
pecos.io.write_metrics(metrics_file, QCI)
pecos.io.write_test_results(test_results_file, pm.test_results)
pecos.io.write_monitoring_report(report_file, pm, test_results_graphics, [custom_graphics_file], QCI)
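
To make the range and increment tests in the script more concrete, here is a minimal sketch in plain pandas. It is not the Pecos implementation: it assumes that the increment is the absolute difference between consecutive values, and it uses a small hypothetical series with the same bounds the script applies to the Wave signals.

```python
import pandas as pd

index = pd.date_range("2015-01-01 00:00", periods=8, freq="15min")
# Hypothetical signal: a flat stretch followed by an abrupt jump
wave = pd.Series([0.0, 0.1, 0.2, 0.2, 0.2, 1.5, 0.4, 0.5], index=index)

# Range test: flag values outside [lower, upper]
lower, upper = -1, 1
out_of_range = (wave < lower) | (wave > upper)

# Increment test (assumed definition): flag steps outside [min_inc, max_inc]
min_inc, max_inc = 0.0001, 0.6
step = wave.diff().abs()   # first value has no predecessor, so its step is NaN
stagnant = step < min_inc  # data that does not change
abrupt = step > max_inc    # data that changes abruptly

print(wave[out_of_range | stagnant | abrupt])
```

In this sketch a single spike trips the upper increment bound twice, once on the way up and once on the way back down, while the flat stretch trips the lower bound.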
                                 

Results are saved in examples/simple/Results. Results include:

  • HTML report, Simple.html (shown below), includes summary tables and graphics
  • Test results CSV file, Simple_test_results.csv, includes information from the summary tables
  • Performance metric CSV file, Simple_metrics.csv, includes a quality control index based on the analysis.
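
As a rough illustration of the quality control index, the sketch below assumes that the index is the fraction of data points inside the time filter that passed every test. The mask and filter values are hypothetical and the code uses plain pandas, not pecos.metrics.qci.

```python
import pandas as pd

index = pd.date_range("2015-01-01 00:00", periods=4, freq="15min")

# Hypothetical test results mask: True means the data point passed all tests
mask = pd.DataFrame(
    {"A": [True, True, False, True], "B": [True, False, True, True]},
    index=index,
)
# Hypothetical time filter: the first timestamp is excluded from the analysis
tfilter = pd.Series([False, True, True, True], index=index)

# Keep only rows inside the time filter, then take the fraction that passed
in_scope = mask[tfilter.values]
qci = in_scope.values.sum() / in_scope.size
print(round(qci, 4))
```

Under this assumption a value of 1 means every data point in the analysis passed, and lower values indicate a larger share of flagged data.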
Pecos Monitoring Report

Start time: 2015-01-01 00:00:00
End time: 2015-01-01 23:45:00
Test Failures: 17
Notes: 0

Performance Metrics:

Quality Control Index
2015-01-01 0.871227

Test Results:

    System Name  Variable Name          Start Date           End Date             Timesteps  Error Flag
1                                       2015-01-01 19:30:00  2015-01-01 19:30:00  1.0        Nonmonotonic timestamp
2                                       2015-01-01 17:00:00  2015-01-01 17:00:00  1.0        Duplicate timestamp
3                                       2015-01-01 05:00:00  2015-01-01 05:00:00  1.0        Missing timestamp
4                Wave Absolute Error C  2015-01-01 13:00:00  2015-01-01 14:45:00  8.0        Data > upper bound, 0.25
5   Simple       A                      2015-01-01 12:15:00  2015-01-01 14:30:00  10.0       Increment < lower bound, 0.0001
6   Simple       B                      2015-01-01 06:30:00  2015-01-01 06:30:00  1.0        Data < lower bound, 0
7   Simple       B                      2015-01-01 15:30:00  2015-01-01 15:30:00  1.0        Data > upper bound, 1
8   Simple       C                      2015-01-01 07:30:00  2015-01-01 09:30:00  9.0        Corrupt data
9   Simple       C                      2015-01-01 13:00:00  2015-01-01 13:00:00  1.0        Increment > upper bound, 0.6
10  Simple       D                      2015-01-01 17:45:00  2015-01-01 18:15:00  3.0        Missing data
11  Simple       D                      2015-01-01 11:15:00  2015-01-01 11:15:00  1.0        Data < lower bound, -1
12  Simple       D                      2015-01-01 12:45:00  2015-01-01 12:45:00  1.0        Data < lower bound, -1
13  Simple       D                      2015-01-01 03:15:00  2015-01-01 03:30:00  2.0        Data > upper bound, 1
14  Simple       D                      2015-01-01 04:00:00  2015-01-01 04:00:00  1.0        Data > upper bound, 1
15  Simple       D                      2015-01-01 04:30:00  2015-01-01 04:45:00  2.0        Data > upper bound, 1
16  Simple       D                      2015-01-01 18:30:00  2015-01-01 18:45:00  2.0        Data > upper bound, 1
17  Simple       D                      2015-01-01 19:15:00  2015-01-01 19:45:00  3.0        Data > upper bound, 1

Notes:

None


This report was generated by Pecos 0.1.1, 05/06/2016