Simple example

A simple example is included in the examples/simple directory. This example uses data from an Excel file, simple.xlsx, which contains four columns of data (A through D):

A = elapsed time in days
B = uniform random number between 0 and 1
C = sin(10*A)
D = C+(B-0.5)/2
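
The column definitions above can be reproduced with a few lines of standard-library Python. This is only a sketch of clean data, without the faults described below: the 15-minute sampling interval and the 96 samples are inferred from the rest of this example, and the function name `make_columns` and the seed are hypothetical.

```python
import math
import random


def make_columns(n_steps=96, step_seconds=900, seed=0):
    """Build clean columns A-D at a fixed sampling interval (no injected faults)."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    rows = []
    for i in range(n_steps):
        a = i * step_seconds / 86400.0  # A: elapsed time in days
        b = rng.random()                # B: uniform random number in [0, 1)
        c = math.sin(10 * a)            # C: sin(10*A)
        d = c + (b - 0.5) / 2           # D: C plus noise of at most 0.25 in magnitude
        rows.append((a, b, c, d))
    return rows


rows = make_columns()
```

Because D adds noise of up to 0.25 in magnitude to a sine wave, D can leave the range [-1, 1] near the peaks and troughs of C, which is why the range test on D reports failures.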

The data includes missing timestamps, duplicate timestamps, non-monotonic timestamps, corrupt data, data out of expected range, data that doesn’t change, and data that changes abruptly.

Missing timestamp at 5:00
Duplicate timestamp at 17:00
Non-monotonic timestamp at 19:30
Column A has the same value (0.5) from 12:00 until 14:30
Column B is below the expected lower bound of 0 at 6:30 and above the expected upper bound of 1 at 15:30
Column C has corrupt data (-999) between 7:30 and 9:30
Column C does not follow the expected sine function from 13:00 until 16:15. The change is abrupt and gradually corrected.
Column D is missing data from 17:45 until 18:15
Column D is occasionally below the expected lower bound of -1 around midday (2 timesteps) and above the expected upper bound of 1 in the early morning and late evening (10 timesteps).
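
To make the three timestamp problems concrete, the following sketch detects them in a list of timestamps using only the standard library. It is an illustration of the idea, not the Pecos implementation; the function name `find_timestamp_issues` and the way the faulty series is built are assumptions made for this example.

```python
from datetime import datetime, timedelta


def find_timestamp_issues(timestamps, frequency_seconds):
    """Return (nonmonotonic, duplicate, missing) timestamps for a regular series."""
    nonmonotonic, duplicate = [], []
    seen = set()
    latest = None
    for ts in timestamps:
        if ts in seen:
            duplicate.append(ts)       # value already appeared earlier
        elif latest is not None and ts < latest:
            nonmonotonic.append(ts)    # goes backwards in time
        seen.add(ts)
        if latest is None or ts > latest:
            latest = ts

    # Walk the expected regular grid and collect timestamps that never appeared.
    step = timedelta(seconds=frequency_seconds)
    missing = []
    current, end = min(seen), max(seen)
    while current <= end:
        if current not in seen:
            missing.append(current)
        current += step
    return nonmonotonic, duplicate, missing


# Build one day of 15-minute timestamps, then inject the three faults.
start = datetime(2015, 1, 1)
grid = [start + timedelta(minutes=15 * i) for i in range(96)]
series = [t for t in grid if t != datetime(2015, 1, 1, 5, 0)]   # drop 5:00
i = series.index(datetime(2015, 1, 1, 17, 0))
series.insert(i, series[i])                                     # repeat 17:00
j = series.index(datetime(2015, 1, 1, 19, 30))
series[j], series[j + 1] = series[j + 1], series[j]             # 19:30 after 19:45

nonmonotonic, duplicate, missing = find_timestamp_issues(series, 900)
```

With these injected faults, each returned list holds a single timestamp: 19:30, 17:00, and 5:00, respectively.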

The script, simple_example.py (shown below), is used to run quality control analysis using Pecos. The script performs the following steps:

Define input for quality control tests, including
- Expected frequency of the timestamp
- Time filter to exclude data points early and late in the day
- Corrupt data values
- Upper and lower bounds for data range and data increments
- Sine wave model to compute measurement error
Load time series data from an excel file
Run quality control tests
Generate an HTML report, test results CSV file, and performance metrics CSV file
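
The range and increment bounds in the list above can be illustrated with a small standard-library sketch. It assumes that an increment is the absolute difference between consecutive values and that `None` stands for an open bound, as in the bound lists used by the script; the function names `range_violations` and `increment_violations` and the sample values are hypothetical and are not part of Pecos.

```python
def range_violations(values, lower=None, upper=None):
    """Return indices whose value lies outside [lower, upper]; None disables a bound."""
    flagged = []
    for i, v in enumerate(values):
        if (lower is not None and v < lower) or (upper is not None and v > upper):
            flagged.append(i)
    return flagged


def increment_violations(values, lower=None, upper=None):
    """Return indices whose absolute change from the previous value is out of bounds."""
    increments = [abs(b - a) for a, b in zip(values, values[1:])]
    # The increment between samples i and i + 1 is attributed to sample i + 1.
    return [i + 1 for i in range_violations(increments, lower, upper)]


random_like = [0.2, -0.1, 0.7, 1.3]               # leaves [0, 1] twice
linear_like = [0.48, 0.49, 0.5, 0.5, 0.5, 0.51]   # stalls at 0.5
wave_like = [0.1, 0.2, 0.9, 0.95]                 # jumps by 0.7 in one step

out_of_range = range_violations(random_like, 0, 1)
stalled = increment_violations(linear_like, lower=0.0001)
abrupt = increment_violations(wave_like, lower=0.0001, upper=0.6)
```

A lower bound on the increment catches data that does not change, and an upper bound catches data that changes abruptly, which is how a single kind of test covers both cases.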

"""
In this example, simple time series data is used to demonstrate basic functions
in pecos.  
* Data is loaded from an Excel file which contains four columns of values that
  are expected to follow linear, random, and sine models.
* A translation dictionary is defined to map and group the raw data into 
  common names for analysis
* A time filter is established to screen out data outside of 3 AM to 9 PM
* The data is loaded into a pecos PerformanceMonitoring object and a series of 
  quality control tests are run, including range tests and increment tests 
* The results are printed to CSV and HTML reports
"""
import pecos
import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np

# Initialize logger
pecos.logger.initialize()

# Create a Pecos PerformanceMonitoring data object
pm = pecos.monitoring.PerformanceMonitoring()

# Populate the object with a dataframe and translation dictionary
system_name = 'Simple'
data_file = 'simple.xlsx'
df = pd.read_excel(data_file)
translation_dictionary = {
    'Linear': ['A'],
    'Random': ['B'],
    'Wave': ['C','D']}
pm.add_dataframe(df, system_name)
pm.add_translation_dictionary(translation_dictionary, system_name)

# Check timestamp against the expected frequency of 900 seconds (15 minutes)
pm.check_timestamp(900)
 
# Generate a time filter that keeps data between 3 AM and 9 PM (clock time in seconds)
clock_time = pm.get_clock_time()
time_filter = (clock_time > 3*3600) & (clock_time < 21*3600)
pm.add_time_filter(time_filter)

# Check missing
pm.check_missing()
        
# Check corrupt
pm.check_corrupt([-999]) 

# Add composite signals
elapsed_time = pm.get_elapsed_time()
# Elapsed time is divided by 86400 to convert seconds to days, matching C = sin(10*A)
wave_model = np.sin(10*(elapsed_time/86400))
wave_model.columns=['Wave Model']
pm.add_signal('Wave Model', wave_model)
wave_model_abs_error = np.abs(np.subtract(pm.df[pm.trans['Wave']], wave_model))
wave_model_abs_error.columns=['Wave Absolute Error C', 'Wave Absolute Error D']
pm.add_signal('Wave Absolute Error', wave_model_abs_error)

# Check range
pm.check_range([0, 1], 'Random')
pm.check_range([-1, 1], 'Wave')
pm.check_range([None, 0.25], 'Wave Absolute Error')

# Check increment
pm.check_increment([0.0001, None], 'Linear') 
pm.check_increment([0.0001, None], 'Random') 
pm.check_increment([0.0001, 0.6], 'Wave') 
    
# Compute metrics
mask = pm.get_test_results_mask()
QCI = pecos.metrics.qci(mask, pm.tfilter)

# Define output file names and directories
results_directory = 'Results'
if not os.path.exists(results_directory):
    os.makedirs(results_directory)
graphics_file_rootname = os.path.join(results_directory, 'test_results')
custom_graphics_file = os.path.abspath(os.path.join(results_directory, 'custom.png'))
metrics_file = os.path.join(results_directory, system_name + '_metrics.csv')
test_results_file = os.path.join(results_directory, system_name + '_test_results.csv')
report_file =  os.path.join(results_directory, system_name + '.html')

# Generate graphics
test_results_graphics = pecos.graphics.plot_test_results(graphics_file_rootname, pm)
plt.figure(figsize = (7.0,3.5))
ax = plt.gca()
df.plot(ax=ax, ylim=[-1.5,1.5])
plt.savefig(custom_graphics_file, format='png', dpi=500)

# Write metrics, test results, and report files
pecos.io.write_metrics(metrics_file, QCI)
pecos.io.write_test_results(test_results_file, pm.test_results)
pecos.io.write_monitoring_report(report_file, pm, test_results_graphics, [custom_graphics_file], QCI)
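
The time filter and the wave model in the script reduce to simple arithmetic on seconds. The sketch below restates them with the standard library only, to show which timesteps survive the filter and how the absolute error is compared with the 0.25 bound. The helper names `in_time_filter`, `wave_model`, and `abs_error` are hypothetical, and the measured value is made up for illustration.

```python
import math

STEP = 900  # sampling interval in seconds, the value passed to check_timestamp


def in_time_filter(clock_seconds):
    """True for clock times strictly between 3 AM and 9 PM."""
    return 3 * 3600 < clock_seconds < 21 * 3600


def wave_model(elapsed_seconds):
    """Expected sine value: sin(10 * elapsed time in days)."""
    return math.sin(10 * (elapsed_seconds / 86400))


def abs_error(measured, elapsed_seconds):
    """Absolute difference between a measurement and the wave model."""
    return abs(measured - wave_model(elapsed_seconds))


# One day of clock times at 15-minute steps, then the subset kept by the filter.
clock = [i * STEP for i in range(96)]
kept = [t for t in clock if in_time_filter(t)]

# A made-up measurement at noon that is 0.3 above the model fails the 0.25 bound.
noon = 12 * 3600
fails_bound = abs_error(wave_model(noon) + 0.3, noon) > 0.25
```

Because both comparisons are strict, the timesteps at exactly 3:00 and 21:00 are excluded, so the first and last timesteps kept are 3:15 and 20:45.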
                                 

Results are saved in examples/simple/Results. Results include:

HTML report, Simple.html (shown below), includes summary tables and graphics
Test results CSV file, Simple_test_results.csv, includes information from the summary tables
Performance metric CSV file, Simple_metrics.csv, includes a quality control index based on the analysis.
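
As a rough illustration of what a quality control index measures, the sketch below assumes the index is the fraction of data points inside the time filter that pass every test. That definition is an assumption made for this example, and the function `quality_control_index` and its list-based inputs are hypothetical; the script itself uses `pecos.metrics.qci`.

```python
def quality_control_index(mask, time_filter):
    """Fraction of filtered data points that passed all tests.

    mask: one row per timestep, each a list of booleans (True = passed all tests).
    time_filter: one boolean per timestep (True = include the timestep).
    """
    passed = 0
    total = 0
    for row, keep in zip(mask, time_filter):
        if not keep:
            continue  # timesteps outside the time filter do not count
        total += len(row)
        passed += sum(row)  # True counts as 1
    return passed / total if total else float("nan")


# Four timesteps, two variables; the first timestep is outside the filter.
mask = [
    [True, True],
    [True, False],
    [False, False],
    [True, True],
]
time_filter = [False, True, True, True]

qci = quality_control_index(mask, time_filter)
```

In this toy case three of the six filtered data points pass, so the index is 0.5; a value of 1 would mean every filtered data point passed every test.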

Pecos Monitoring Report

Start time: 2015-01-01 00:00:00
End time: 2015-01-01 23:45:00
Test Failures: 17
Notes: 0

Performance Metrics:

	Quality Control Index
2015-01-01	0.871227

Test Results:

	System Name	Variable Name	Start Date	End Date	Timesteps	Error Flag
1			2015-01-01 19:30:00	2015-01-01 19:30:00	1.0	Nonmonotonic timestamp
2			2015-01-01 17:00:00	2015-01-01 17:00:00	1.0	Duplicate timestamp
3			2015-01-01 05:00:00	2015-01-01 05:00:00	1.0	Missing timestamp
4		Wave Absolute Error C	2015-01-01 13:00:00	2015-01-01 14:45:00	8.0	Data > upper bound, 0.25
5	Simple	A	2015-01-01 12:15:00	2015-01-01 14:30:00	10.0	Increment < lower bound, 0.0001
6	Simple	B	2015-01-01 06:30:00	2015-01-01 06:30:00	1.0	Data < lower bound, 0
7	Simple	B	2015-01-01 15:30:00	2015-01-01 15:30:00	1.0	Data > upper bound, 1
8	Simple	C	2015-01-01 07:30:00	2015-01-01 09:30:00	9.0	Corrupt data
9	Simple	C	2015-01-01 13:00:00	2015-01-01 13:00:00	1.0	Increment > upper bound, 0.6
10	Simple	D	2015-01-01 17:45:00	2015-01-01 18:15:00	3.0	Missing data
11	Simple	D	2015-01-01 11:15:00	2015-01-01 11:15:00	1.0	Data < lower bound, -1
12	Simple	D	2015-01-01 12:45:00	2015-01-01 12:45:00	1.0	Data < lower bound, -1
13	Simple	D	2015-01-01 03:15:00	2015-01-01 03:30:00	2.0	Data > upper bound, 1
14	Simple	D	2015-01-01 04:00:00	2015-01-01 04:00:00	1.0	Data > upper bound, 1
15	Simple	D	2015-01-01 04:30:00	2015-01-01 04:45:00	2.0	Data > upper bound, 1
16	Simple	D	2015-01-01 18:30:00	2015-01-01 18:45:00	2.0	Data > upper bound, 1
17	Simple	D	2015-01-01 19:15:00	2015-01-01 19:45:00	3.0	Data > upper bound, 1

Notes:

None

This report was generated by Pecos 0.1.1, 05/06/2016