pecos.metrics module

The metrics module contains performance metrics to track system health.

pecos.metrics.qci(mask, tfilter=None, per_day=True)

Compute the quality control index defined as:

\(QCI=\dfrac{\sum_{d\in D}\sum_{t\in T}X_{dt}}{|DT|}\)

where \(D\) is the set of data columns and \(T\) is the set of time stamps in the analysis. \(X_{dt}\) is a data point for column \(d\) at time \(t\) that passed all quality control tests. \(|DT|\) is the number of data points in the analysis.

Parameters:

mask : pd.DataFrame

Test results mask, returned from pm.get_test_results_mask()

tfilter : pd.Series (optional)

Time filter containing boolean values for each time index

per_day : boolean (optional)

Flag indicating if the results should be computed per day, default = True

Returns:

QCI : pd.DataFrame

Quality control index
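
The QCI formula can be illustrated with a minimal sketch in plain Python. This is not the pecos implementation: the `mask` dictionary, the column names, and the helper `qci_sketch` are hypothetical, and the sketch assumes that True marks a data point that passed every quality control test. It also ignores `tfilter` and `per_day`.

```python
# Hypothetical mask: one list of booleans per data column.
# Assumption: True means the data point passed all quality control tests.
mask = {
    "A": [True, True, False, True],
    "B": [True, False, False, True],
}


def qci_sketch(mask):
    """Return the fraction of data points that passed all tests."""
    # Numerator: number of passing points summed over all columns and times.
    passed = sum(sum(column) for column in mask.values())
    # Denominator: |DT|, the total number of data points in the analysis.
    total = sum(len(column) for column in mask.values())
    return passed / total


qci_value = qci_sketch(mask)
print(qci_value)  # 5 of 8 points passed, so this prints 0.625
```

A value of 1 means every data point passed; lower values indicate a larger share of flagged data.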

pecos.metrics.rmse(x1, x2, tfilter=None, per_day=True)

Compute the root mean squared error defined as:

\(RMSE=\sqrt{\dfrac{\sum{(x_1-x_2)^2}}{n}}\)

where \(x_1\) and \(x_2\) are time series and \(n\) is the number of data points.

Parameters:

x1 : pd.DataFrame with a single column or pd.Series

Data

x2 : pd.DataFrame with a single column or pd.Series

Data

tfilter : pd.Series (optional)

Time filter containing boolean values for each time index

per_day : boolean (optional)

Flag indicating if the results should be computed per day, default = True

Returns:

RMSE : pd.DataFrame

Root mean squared error of the data
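
As a worked example of the RMSE formula, the following plain-Python sketch computes the value for two short series. The data and the helper `rmse_sketch` are hypothetical and stand in for the pandas objects that the function accepts. The sketch assumes both series have the same length and does not handle `tfilter` or `per_day`.

```python
import math

# Hypothetical series of equal length; only the third point differs.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 5.0, 4.0]


def rmse_sketch(x1, x2):
    """Return sqrt(sum((x1 - x2)^2) / n) for two equal-length sequences."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same number of points")
    n = len(x1)
    squared_error = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.sqrt(squared_error / n)


rmse_value = rmse_sketch(x1, x2)
print(rmse_value)  # sqrt(4 / 4), so this prints 1.0
```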

pecos.metrics.time_integral(data, tfilter=None, per_day=True)

Compute the time integral of each column in the DataFrame defined as:

\(F=\int{fdt}\)

where \(f\) is a column of data and \(dt\) is the time step between observations. The time integral is computed using the trapezoidal rule from numpy.trapz. Results are given in [original data units]*seconds. NaN values are set to 0 for integration.

Parameters:

data : pd.DataFrame

Data

tfilter : pd.Series (optional)

Time filter containing boolean values for each time index

per_day : boolean (optional)

Flag indicating if the results should be computed per day, default = True

Returns:

F : pd.DataFrame

Time integral of the data. Each column is named 'Time integral of ' + original column name.
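
The trapezoidal rule and the NaN handling described for time_integral can be sketched in plain Python. This is an illustration only, not the pecos or numpy code: the sample data and the helper `time_integral_sketch` are hypothetical, and time stamps are assumed to be given as seconds so that the result is in [original data units]*seconds.

```python
import math

# Hypothetical observations taken every 60 seconds, with one missing value.
times = [0.0, 60.0, 120.0, 180.0]  # seconds
values = [1.0, 3.0, float("nan"), 2.0]


def time_integral_sketch(times, values):
    """Integrate values over time with the trapezoidal rule."""
    # NaN values are set to 0 before integration, as the description states.
    cleaned = [0.0 if math.isnan(v) else v for v in values]
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Area of one trapezoid: mean of the two end values times the step.
        total += 0.5 * (cleaned[i - 1] + cleaned[i]) * dt
    return total


integral = time_integral_sketch(times, values)
print(integral)  # 120 + 90 + 60, so this prints 270.0
```

Because a missing value is treated as 0, it lowers the area of both neighboring trapezoids rather than being skipped, which is worth keeping in mind when a column has long gaps.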