Simple example
A simple example is included in the examples/simple directory. It uses data from an Excel file, simple.xlsx, which contains four columns of data (A through D):
- A = elapsed time in days
- B = uniform random number between 0 and 1
- C = sin(10*A)
- D = C+(B-0.5)/2
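The sketch below illustrates how the four columns relate by generating a clean, fault-free version of the data with NumPy and pandas. Two assumptions are taken from the report later on this page: 15-minute spacing and a one-day span (timestamps 00:00 to 23:45, checked with a 900-second frequency). The random values will not match simple.xlsx, and none of the injected faults listed below are reproduced. Because C lies in [-1, 1] and (B-0.5)/2 lies in [-0.25, 0.25], D can reach ±1.25. This is why D occasionally exceeds the ±1 bounds.

```python
import numpy as np
import pandas as pd

# Assumption: one day of data at 15-minute (900 s) intervals, as in the report
index = pd.date_range("2015-01-01 00:00", "2015-01-01 23:45", freq="15min")

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

A = ((index - index[0]) / pd.Timedelta(days=1)).to_numpy()  # elapsed time in days
B = rng.uniform(0.0, 1.0, size=len(index))                  # uniform random in [0, 1)
C = np.sin(10 * A)                                          # sine of 10 * elapsed days
D = C + (B - 0.5) / 2                                       # sine plus noise in [-0.25, 0.25)

df = pd.DataFrame({"A": A, "B": B, "C": C, "D": D}, index=index)
print(df.head())
```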
The data includes missing timestamps, duplicate timestamps, non-monotonic timestamps, corrupt data, data out of expected range, data that doesn’t change, and data that changes abruptly.
- Missing timestamp at 5:00
- Duplicate timestamp 17:00
- Non-monotonic timestamp 19:30
- Column A has the same value (0.5) from 12:00 until 14:30
- Column B is below the expected lower bound of 0 at 6:30 and above the expected upper bound of 1 at 15:30
- Column C has corrupt data (-999) between 7:30 and 9:30
- Column C does not follow the expected sine function from 13:00 until 16:15. The deviation starts abruptly and is then gradually corrected.
- Column D is missing data from 17:45 until 18:15
- Column D is occasionally below the expected lower bound of -1 around midday (2 timesteps) and above the expected upper bound of 1 in the early morning and late evening (10 timesteps).
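The three timestamp problems listed above (missing, duplicate, and non-monotonic) can be illustrated without Pecos. The sketch below uses plain pandas and a hypothetical helper, `find_timestamp_issues`, which is not part of the Pecos API and is not how `check_timestamp` is implemented. It shows only what each category means for a series that should be sampled at a fixed frequency.

```python
import pandas as pd


def find_timestamp_issues(timestamps, freq_seconds):
    """Return (duplicate, non-monotonic, missing) timestamps.

    Hypothetical helper written with plain pandas; not part of the Pecos API.
    """
    idx = pd.DatetimeIndex(timestamps)
    # Duplicates: every repeat after the first occurrence
    duplicates = idx[idx.duplicated()]
    # Non-monotonic: earlier than the latest timestamp seen before it
    s = pd.Series(idx)
    previous_max = s.cummax().shift(1)
    nonmonotonic = idx[(s < previous_max).to_numpy()]
    # Missing: points on the expected regular grid that never appear
    expected = pd.date_range(idx.min(), idx.max(), freq=pd.Timedelta(seconds=freq_seconds))
    missing = expected.difference(idx)
    return duplicates, nonmonotonic, missing


ts = pd.to_datetime([
    "2015-01-01 00:00", "2015-01-01 00:15", "2015-01-01 00:45",  # 00:30 missing
    "2015-01-01 01:00", "2015-01-01 01:00",                      # duplicate
    "2015-01-01 01:30", "2015-01-01 01:15",                      # out of order
])
dups, nonmono, missing = find_timestamp_issues(ts, 900)
print(list(dups), list(nonmono), list(missing))
```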
The script simple_example.py (shown below) runs a quality control analysis using Pecos. It performs the following steps:
- Define input for quality control tests, including:
  - Expected frequency of the timestamps
  - Time filter to exclude data points early and late in the day
  - Corrupt data values
  - Upper and lower bounds for data range and data increments
  - Sine wave model to compute measurement error
- Load time series data from an Excel file
- Run quality control tests
- Generate an HTML report, test results CSV file, and performance metrics CSV file
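The time-filter step works on clock time, meaning seconds since midnight. The plain-pandas sketch below reproduces the filter condition used in the script, which keeps points strictly after 3 AM and strictly before 9 PM. It assumes a grid of 15-minute timestamps over one day. Clock time is computed directly here rather than by calling `pm.get_clock_time()`, so the sketch illustrates the idea and not the Pecos internals.

```python
import pandas as pd

# Assumption: 15-minute timestamps over one day, matching the example's report
index = pd.date_range("2015-01-01 00:00", "2015-01-01 23:45", freq="15min")

# Clock time = seconds since midnight for each timestamp
clock_time = pd.Series(index.hour * 3600 + index.minute * 60 + index.second, index=index)

# Keep points strictly between 3 AM and 9 PM; everything else is excluded
time_filter = (clock_time > 3 * 3600) & (clock_time < 21 * 3600)
print(int(time_filter.sum()), "of", len(time_filter), "timestamps retained")
```

With this grid, 71 of the 96 timestamps are retained. The points at exactly 3:00 and 21:00 are excluded because both comparisons are strict.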
"""
In this example, simple time series data is used to demonstrate basic functions
in pecos.
* Data is loaded from an excel file which contains four columns of values that
are expected to follow linear, random, and sine models.
* A translation dictionary is defined to map and group the raw data into
common names for analysis
* A time filter is established to screen out data between 3 AM and 9 PM
* The data is loaded into a pecos PerformanceMonitoring object and a series of
quality control tests are run, including range tests and increment tests
* The results are printed to csv and html reports
"""
import pecos
import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np
# Initialize logger
pecos.logger.initialize()
# Create a Pecos PerformanceMonitoring data object
pm = pecos.monitoring.PerformanceMonitoring()
# Populate the object with a dataframe and translation dictionary
system_name = 'Simple'
data_file = 'simple.xlsx'
df = pd.read_excel(data_file)
translation_dictionary = {
'Linear': ['A'],
'Random': ['B'],
'Wave': ['C','D']}
pm.add_dataframe(df, system_name)
pm.add_translation_dictionary(translation_dictionary, system_name)
# Check timestamp
pm.check_timestamp(900)
# Generate a time filter
clock_time = pm.get_clock_time()
time_filter = (clock_time > 3*3600) & (clock_time < 21*3600)
pm.add_time_filter(time_filter)
# Check missing
pm.check_missing()
# Check corrupt
pm.check_corrupt([-999])
# Add composite signals
elapsed_time= pm.get_elapsed_time()
wave_model = np.sin(10*(elapsed_time/86400))
wave_model.columns=['Wave Model']
pm.add_signal('Wave Model', wave_model)
wave_model_abs_error = np.abs(np.subtract(pm.df[pm.trans['Wave']], wave_model))
wave_model_abs_error.columns=['Wave Absolute Error C', 'Wave Absolute Error D']
pm.add_signal('Wave Absolute Error', wave_model_abs_error)
# Check range
pm.check_range([0, 1], 'Random')
pm.check_range([-1, 1], 'Wave')
pm.check_range([None, 0.25], 'Wave Absolute Error')
# Check increment
pm.check_increment([0.0001, None], 'Linear')
pm.check_increment([0.0001, None], 'Random')
pm.check_increment([0.0001, 0.6], 'Wave')
# Compute metrics
mask = pm.get_test_results_mask()
QCI = pecos.metrics.qci(mask, pm.tfilter)
# Define output file names and directories
results_directory = 'Results'
if not os.path.exists(results_directory):
os.makedirs(results_directory)
graphics_file_rootname = os.path.join(results_directory, 'test_results')
custom_graphics_file = os.path.abspath(os.path.join(results_directory, 'custom.png'))
metrics_file = os.path.join(results_directory, system_name + '_metrics.csv')
test_results_file = os.path.join(results_directory, system_name + '_test_results.csv')
report_file = os.path.join(results_directory, system_name + '.html')
# Generate graphics
test_results_graphics = pecos.graphics.plot_test_results(graphics_file_rootname, pm)
plt.figure(figsize = (7.0,3.5))
ax = plt.gca()
df.plot(ax=ax, ylim=[-1.5,1.5])
plt.savefig(custom_graphics_file, format='png', dpi=500)
# Write metrics, test results, and report files
pecos.io.write_metrics(metrics_file, QCI)
pecos.io.write_test_results(test_results_file, pm.test_results)
pecos.io.write_monitoring_report(report_file, pm, test_results_graphics, [custom_graphics_file], QCI)
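The script applies its range and increment tests through Pecos. As a rough illustration of what those tests evaluate, the sketch below re-implements the idea with plain pandas. The helpers `range_failures` and `increment_failures` are hypothetical. Treating an increment as the absolute difference between consecutive values is an assumption. As in the script's bound lists, `None` means "no bound on this side".

```python
import numpy as np
import pandas as pd


def range_failures(series, lower=None, upper=None):
    """Boolean mask of values outside [lower, upper]; None means no bound."""
    fail = pd.Series(False, index=series.index)
    if lower is not None:
        fail |= series < lower
    if upper is not None:
        fail |= series > upper
    return fail


def increment_failures(series, lower=None, upper=None):
    """Boolean mask where the absolute step from the previous value is out of bounds.

    Assumption: the first point has no previous value and is never flagged.
    """
    step = series.diff().abs()
    return range_failures(step, lower, upper) & step.notna()


# A signal that stops changing (like column A stuck at 0.5)
a = pd.Series([0.45, 0.48, 0.5, 0.5, 0.5, 0.52])
print(increment_failures(a, lower=0.0001).tolist())

# Absolute error of an observed signal against the sine model, upper bound 0.25
days = np.array([0.50, 0.51, 0.52, 0.53])
observed = pd.Series(np.sin(10 * days) + np.array([0.0, 0.4, 0.3, 0.05]))
abs_error = (observed - np.sin(10 * days)).abs()
print(range_failures(abs_error, upper=0.25).tolist())
```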
Results are saved in examples/simple/Results and include:
- HTML report, Simple.html (shown below), which includes summary tables and graphics
- Test results CSV file, Simple_test_results.csv, which includes the information in the summary tables
- Performance metrics CSV file, Simple_metrics.csv, which includes a quality control index (QCI) based on the analysis
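The report summarizes each day with a quality control index. This sketch assumes the index is the fraction of data points inside the time filter that pass every test, and computes it per day with plain pandas. `quality_control_index` is a hypothetical helper, not `pecos.metrics.qci`, and the exact Pecos definition may differ in its details.

```python
import pandas as pd


def quality_control_index(mask, tfilter):
    """Fraction of filtered data points that passed every test, per day.

    mask: DataFrame of booleans (True = passed), indexed by timestamp.
    tfilter: boolean Series on the same index (True = include).
    Hypothetical helper; not the Pecos implementation.
    """
    kept = mask[tfilter]                                  # drop rows outside the time filter
    days = kept.index.date
    passed = kept.sum(axis=1).groupby(days).sum()        # passing points per day
    total = kept.count(axis=1).groupby(days).sum()       # all points per day
    return passed / total


idx = pd.date_range("2015-01-01 00:00", periods=4, freq=pd.Timedelta(hours=6))
mask = pd.DataFrame({"A": [True, True, False, True],
                     "B": [True, False, True, True]}, index=idx)
tfilter = pd.Series([False, True, True, True], index=idx)  # exclude the midnight point
qci = quality_control_index(mask, tfilter)
print(qci)
```

In this toy example, the midnight row is excluded, and 4 of the remaining 6 data points pass.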
Pecos Monitoring Report
Start time: 2015-01-01 00:00:00
End time: 2015-01-01 23:45:00
Test Failures: 17
Notes: 0
Performance Metrics:
| | Quality Control Index |
|---|---|
| 2015-01-01 | 0.871227 |
Test Results:
| | System Name | Variable Name | Start Date | End Date | Timesteps | Error Flag |
|---|---|---|---|---|---|---|
| 1 | | | 2015-01-01 19:30:00 | 2015-01-01 19:30:00 | 1.0 | Nonmonotonic timestamp |
| 2 | | | 2015-01-01 17:00:00 | 2015-01-01 17:00:00 | 1.0 | Duplicate timestamp |
| 3 | | | 2015-01-01 05:00:00 | 2015-01-01 05:00:00 | 1.0 | Missing timestamp |
| 4 | | Wave Absolute Error C | 2015-01-01 13:00:00 | 2015-01-01 14:45:00 | 8.0 | Data > upper bound, 0.25 |
| 5 | Simple | A | 2015-01-01 12:15:00 | 2015-01-01 14:30:00 | 10.0 | Increment < lower bound, 0.0001 |
| 6 | Simple | B | 2015-01-01 06:30:00 | 2015-01-01 06:30:00 | 1.0 | Data < lower bound, 0 |
| 7 | Simple | B | 2015-01-01 15:30:00 | 2015-01-01 15:30:00 | 1.0 | Data > upper bound, 1 |
| 8 | Simple | C | 2015-01-01 07:30:00 | 2015-01-01 09:30:00 | 9.0 | Corrupt data |
| 9 | Simple | C | 2015-01-01 13:00:00 | 2015-01-01 13:00:00 | 1.0 | Increment > upper bound, 0.6 |
| 10 | Simple | D | 2015-01-01 17:45:00 | 2015-01-01 18:15:00 | 3.0 | Missing data |
| 11 | Simple | D | 2015-01-01 11:15:00 | 2015-01-01 11:15:00 | 1.0 | Data < lower bound, -1 |
| 12 | Simple | D | 2015-01-01 12:45:00 | 2015-01-01 12:45:00 | 1.0 | Data < lower bound, -1 |
| 13 | Simple | D | 2015-01-01 03:15:00 | 2015-01-01 03:30:00 | 2.0 | Data > upper bound, 1 |
| 14 | Simple | D | 2015-01-01 04:00:00 | 2015-01-01 04:00:00 | 1.0 | Data > upper bound, 1 |
| 15 | Simple | D | 2015-01-01 04:30:00 | 2015-01-01 04:45:00 | 2.0 | Data > upper bound, 1 |
| 16 | Simple | D | 2015-01-01 18:30:00 | 2015-01-01 18:45:00 | 2.0 | Data > upper bound, 1 |
| 17 | Simple | D | 2015-01-01 19:15:00 | 2015-01-01 19:45:00 | 3.0 | Data > upper bound, 1 |
Notes:
None
This report was generated by Pecos 0.1.1, 05/06/2016
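Each row of the test results table groups consecutive timesteps that failed the same test into a start date, an end date, and a number of timesteps. The sketch below shows one way to do that grouping with plain pandas. `failure_runs` is a hypothetical helper, and this is not necessarily how Pecos builds its table. The example mask is made up to mirror rows 13 to 15, where column D is above its upper bound in the early morning.

```python
import pandas as pd


def failure_runs(failed):
    """Collapse a boolean failure Series into (start, end, timesteps) rows.

    Hypothetical helper showing how consecutive failures can be grouped.
    """
    flags = failed.astype(int)
    # A new run starts wherever the flag changes value (first point starts run 1)
    run_id = flags.diff().ne(0).cumsum()
    rows = []
    for _, run in failed.groupby(run_id):
        if run.iloc[0]:  # keep only runs of failures
            rows.append((run.index[0], run.index[-1], len(run)))
    return pd.DataFrame(rows, columns=["Start Date", "End Date", "Timesteps"])


idx = pd.date_range("2015-01-01 03:00", periods=8, freq="15min")
failed = pd.Series([False, True, True, False, True, False, True, True], index=idx)
print(failure_runs(failed))
```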