Metrics API

Evaluation metrics for change point detection.

Overview

This module provides comprehensive metrics for evaluating change point detection performance, including:

  • Point-based metrics (precision, recall, F-beta)

  • Distance-based metrics (Hausdorff, annotation error)

  • Segmentation-based metrics (Adjusted Rand Index, Hamming)

  • Advanced metrics (covering metric for multiple annotators)

Features:

  • Detailed return values (dicts with breakdowns)

  • Support for multiple ground truth annotators

  • Statistical significance testing capabilities

  • Enhanced versions of standard metrics

exception fastcpd.metrics.ChangePointMetricsError[source]

Exception raised for errors in metrics computation.
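Functions in this module raise this exception when metric computation fails. A minimal defensive-usage sketch; the exact conditions that trigger the exception are not specified here, so the handler below is illustrative only:

from fastcpd.metrics import ChangePointMetricsError, precision_recall

try:
    # n_samples is optional; when given, it allows input validation
    result = precision_recall([100, 200], [98, 202], margin=10, n_samples=300)
except ChangePointMetricsError as err:
    # Raised for errors in metrics computation (e.g., invalid inputs)
    print(f"Metric computation failed: {err}")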

Individual Metrics

Precision and Recall

fastcpd.metrics.precision_recall(true_cps: List | ndarray, pred_cps: List | ndarray, margin: int = 10, n_samples: int | None = None) Dict[source]

Calculate precision, recall, and F1 score with tolerance margin.

A predicted change point is considered correct (true positive) if it falls within margin samples of a true change point. Each true CP can only be matched once to avoid multiple detections counting as multiple TPs.

Parameters:
  • true_cps – True change points (list or array)

  • pred_cps – Predicted change points (list or array)

  • margin – Tolerance window in samples (default: 10)

  • n_samples – Total number of samples (optional, for validation)

Returns:

  • precision: TP / (TP + FP)

  • recall: TP / (TP + FN)

  • f1_score: 2 * (P * R) / (P + R)

  • true_positives: Number of correctly detected CPs

  • false_positives: Number of incorrect detections

  • false_negatives: Number of missed true CPs

  • matched_pairs: List of (true_cp, pred_cp) tuples

  • unmatched_true: True CPs with no match

  • unmatched_pred: Predicted CPs with no match

Return type:

Dict

Examples

>>> true_cps = [100, 200, 300]
>>> pred_cps = [98, 205, 299, 350]
>>> result = precision_recall(true_cps, pred_cps, margin=10)
>>> print(f"Precision: {result['precision']:.2f}")
Precision: 0.75
>>> print(f"Recall: {result['recall']:.2f}")
Recall: 1.00
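
Beyond the scalar scores, the returned dictionary exposes the matching itself. A short sketch using only the keys documented above; the commented values follow from the matching described above and are not verified output:

from fastcpd.metrics import precision_recall

result = precision_recall([100, 200, 300], [98, 205, 299, 350], margin=10)

# Each pair is (true_cp, pred_cp); 350 lies outside the margin of any true CP
print(result['matched_pairs'])    # expected along the lines of [(100, 98), (200, 205), (300, 299)]
print(result['unmatched_pred'])   # expected [350]
print(result['false_positives'])  # expected 1
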
fastcpd.metrics.f_beta_score(true_cps: List | ndarray, pred_cps: List | ndarray, beta: float = 1.0, margin: int = 10, n_samples: int | None = None) Dict[source]

Calculate F-beta score with adjustable precision/recall weighting.

The F-beta score is a weighted harmonic mean of precision and recall:

F_beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall)

  • beta = 1: F1 score (equal weight)

  • beta < 1: Favor precision (penalize false positives more)

  • beta > 1: Favor recall (penalize false negatives more)

Parameters:
  • true_cps – True change points

  • pred_cps – Predicted change points

  • beta – Weight of recall vs precision (default: 1.0)

  • margin – Tolerance window in samples (default: 10)

  • n_samples – Total number of samples (optional)

Returns:

  • f_beta: F-beta score

  • f1_score: F1 score (beta=1)

  • f2_score: F2 score (beta=2, recall-focused)

  • f0_5_score: F0.5 score (beta=0.5, precision-focused)

  • precision: Precision

  • recall: Recall

  • (plus all fields from precision_recall)

Return type:

Dict

Examples

>>> # When missing CPs is worse than false alarms, use beta=2
>>> result = f_beta_score(true_cps, pred_cps, beta=2.0)
>>> # When false alarms are worse, use beta=0.5
>>> result = f_beta_score(true_cps, pred_cps, beta=0.5)
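
A sketch comparing the precision- and recall-weighted scores on the same detections, using the keys documented above; with one false alarm and no missed CPs, the recall-weighted F2 should come out higher than the precision-weighted F0.5:

from fastcpd.metrics import f_beta_score

true_cps = [100, 200, 300]
pred_cps = [98, 205, 299, 350]   # one spurious detection, no missed CPs

result = f_beta_score(true_cps, pred_cps, beta=2.0, margin=10)

# All three variants are returned from a single call
print(f"F1:   {result['f1_score']:.2f}")
print(f"F2:   {result['f2_score']:.2f}")
print(f"F0.5: {result['f0_5_score']:.2f}")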

Distance Metrics

fastcpd.metrics.hausdorff_distance(cps1: List | ndarray, cps2: List | ndarray, directed: bool = False, n_samples: int | None = None) Dict[source]

Calculate Hausdorff distance between two change point sets.

The Hausdorff distance measures the maximum distance from any point in one set to the closest point in the other set. It’s sensitive to outliers and provides worst-case analysis.

Symmetric: H(A, B) = max(h(A, B), h(B, A))

Directed: h(A, B) = max_{a in A} min_{b in B} |a - b|

Parameters:
  • cps1 – First set of change points to compare

  • cps2 – Second set of change points to compare

  • directed – If True, compute directed distance h(cps1, cps2) only

  • n_samples – Total number of samples (optional)

Returns:

  • hausdorff: Hausdorff distance (symmetric if directed=False)

  • forward_distance: max distance from cps1 to cps2

  • backward_distance: max distance from cps2 to cps1

  • closest_pairs: List of (cp1, cp2, distance) tuples

Return type:

Dict

Examples

>>> cps1 = [100, 200, 300]
>>> cps2 = [105, 200, 400]
>>> result = hausdorff_distance(cps1, cps2)
>>> print(f"Hausdorff: {result['hausdorff']}")
Hausdorff: 100
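
Because the symmetric distance is the maximum of the two directed distances, the forward/backward breakdown shows whether the worst-case error comes from a missed true CP or a spurious detection. A sketch using the keys documented above:

from fastcpd.metrics import hausdorff_distance

result = hausdorff_distance([100, 200, 300], [105, 200, 400])

# forward: worst distance from a point in cps1 to its nearest point in cps2
# backward: worst distance from a point in cps2 to its nearest point in cps1
print(f"Forward:   {result['forward_distance']}")
print(f"Backward:  {result['backward_distance']}")
print(f"Hausdorff: {result['hausdorff']}")   # max of the two when directed=False
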
fastcpd.metrics.annotation_error(true_cps: List | ndarray, pred_cps: List | ndarray, method: str = 'mae', n_samples: int | None = None) Dict[source]

Calculate annotation error between change points.

Measures how accurately change points are localized by computing the error between matched pairs. Uses optimal matching to pair true and predicted CPs.

Parameters:
  • true_cps – True change points

  • pred_cps – Predicted change points

  • method – Error metric: one of ‘mae’, ‘mse’, ‘rmse’, or ‘median_ae’

  • n_samples – Total number of samples (optional)

Returns:

  • error: Overall error (according to method)

  • errors_per_cp: List of errors for each matched pair

  • median_error: Median error

  • max_error: Maximum error

  • min_error: Minimum error

  • mean_error: Mean absolute error

  • std_error: Standard deviation of errors

  • matched_pairs: List of (true_cp, pred_cp) tuples

Return type:

Dict

Examples

>>> true_cps = [100, 200, 300]
>>> pred_cps = [98, 205, 295]
>>> result = annotation_error(true_cps, pred_cps, method='mae')
>>> print(f"MAE: {result['error']:.1f}")
MAE: 4.0
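
Since method only changes how the matched errors are aggregated, the same inputs can be scored under several methods; a brief sketch assuming the methods listed above:

from fastcpd.metrics import annotation_error

true_cps = [100, 200, 300]
pred_cps = [98, 205, 295]

for method in ('mae', 'rmse', 'median_ae'):
    result = annotation_error(true_cps, pred_cps, method=method)
    # 'error' is the aggregate under the chosen method;
    # 'errors_per_cp' lists the per-pair errors behind it
    print(method, result['error'], result['errors_per_cp'])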

Agreement Metrics

fastcpd.metrics.covering_metric(true_cps_list: List[List | ndarray], pred_cps: List | ndarray, margin: int = 10, n_samples: int | None = None) Dict[source]

Calculate covering metric for multiple annotators.

The covering metric measures how well the predictions agree with each individual annotator, rather than only with the combined annotations. Higher scores indicate that the detections account for every annotator’s change points.

Based on van den Burg & Williams (2020).

Parameters:
  • true_cps_list – List of lists; each sublist contains one annotator’s CPs

  • pred_cps – Predicted change points

  • margin – Tolerance window

  • n_samples – Total number of samples (optional)

Returns:

  • covering_score: Mean recall across all annotators

  • recall_per_annotator: List of recall for each annotator

  • mean_recall: Same as covering_score

  • std_recall: Standard deviation of recalls

  • min_recall: Minimum recall across annotators

  • max_recall: Maximum recall across annotators

  • n_annotators: Number of annotators

Return type:

Dict

Examples

>>> # 3 annotators with slightly different annotations
>>> true_cps_list = [[100, 200], [98, 202], [102, 198]]
>>> pred_cps = [100, 200]
>>> result = covering_metric(true_cps_list, pred_cps, margin=5)
>>> print(f"Covering: {result['covering_score']:.2f}")
Covering: 1.00
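
When annotators disagree, the per-annotator breakdown shows which annotations the detections fail to cover. A sketch using the documented keys, assuming recall_per_annotator follows the order of the input lists:

from fastcpd.metrics import covering_metric

annotations = [[100, 200], [98, 202], [150, 250]]   # third annotator disagrees
pred_cps = [100, 200]

result = covering_metric(annotations, pred_cps, margin=5)

# One recall value per annotator (assumed to follow the input order)
for i, recall in enumerate(result['recall_per_annotator']):
    print(f"Annotator {i + 1}: recall = {recall:.2f}")
print(f"Worst-covered annotator: {result['min_recall']:.2f}")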

Segmentation Metrics

fastcpd.metrics.adjusted_rand_index(true_cps: List | ndarray, pred_cps: List | ndarray, n_samples: int) Dict[source]

Calculate Adjusted Rand Index for segmentation agreement.

The ARI measures similarity between two segmentations, correcting for chance agreement. Values range from -1 to 1:

  • ARI = 1: Perfect agreement

  • ARI = 0: Agreement expected by chance

  • ARI < 0: Worse than random

Uses efficient O(n) implementation from Prates (2021).

Parameters:
  • true_cps – True change points

  • pred_cps – Predicted change points

  • n_samples – Total number of samples (required)

Returns:

  • ari: Adjusted Rand Index

  • rand_index: Unadjusted Rand Index

  • agreement_rate: Proportion of agreeing pairs

  • disagreement_rate: Proportion of disagreeing pairs

Return type:

Dict

Examples

>>> true_cps = [100, 200]
>>> pred_cps = [100, 200]
>>> result = adjusted_rand_index(true_cps, pred_cps, n_samples=300)
>>> print(f"ARI: {result['ari']:.2f}")
ARI: 1.00
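
For predictions that only roughly match the true segmentation, both indices fall below 1. A sketch that prints the documented keys without asserting exact values:

from fastcpd.metrics import adjusted_rand_index

# Boundaries are close to, but not identical with, the true ones
result = adjusted_rand_index([100, 200], [110, 190], n_samples=300)

print(f"ARI:        {result['ari']:.3f}")
print(f"Rand index: {result['rand_index']:.3f}")
print(f"Agreement:  {result['agreement_rate']:.3f}")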

Combined Evaluation

fastcpd.metrics.evaluate_all(true_cps: List | ndarray | List[List], pred_cps: List | ndarray, n_samples: int, margin: int = 10) Dict[source]

Compute all available metrics for comprehensive evaluation.

Automatically detects whether true_cps contains multiple annotators (a list of lists) and computes the appropriate metrics.

Parameters:
  • true_cps – True change points (list/array or list of lists for multiple annotators)

  • pred_cps – Predicted change points

  • n_samples – Total number of samples

  • margin – Tolerance for point-based metrics

Returns:

  • point_metrics: precision, recall, f1, etc.

  • distance_metrics: hausdorff, annotation_error

  • segmentation_metrics: ari

  • covering_metrics: (if multiple annotators)

  • summary: Formatted text summary

Return type:

Dict

Examples

>>> result = evaluate_all([100, 200], [98, 202], n_samples=300, margin=5)
>>> print(result['summary'])
>>> # For multiple annotators:
>>> result = evaluate_all([[100, 200], [98, 202]], [100, 200],
...                       n_samples=300, margin=5)

Return Value Structure

Most metrics return a dictionary with the following structure:

{
    'metric_value': float,        # Main metric value
    'true_positives': int,        # Number of true positives
    'false_positives': int,       # Number of false positives
    'false_negatives': int,       # Number of false negatives
    'n_true': int,                # Number of true change points
    'n_detected': int,            # Number of detected change points
    'margin': int,                # Tolerance margin used
    # ... additional fields ...
}

Example Usage

Basic Evaluation

from fastcpd.metrics import precision_recall

true_cps = [100, 200, 300]
detected_cps = [98, 205, 350]

# Get precision, recall, and F1
pr = precision_recall(true_cps, detected_cps, n_samples=500, margin=10)

print(f"Precision: {pr['precision']:.3f}")
print(f"Recall:    {pr['recall']:.3f}")
print(f"F1-Score:  {pr['f1_score']:.3f}")

Comprehensive Evaluation

from fastcpd.metrics import evaluate_all

# Evaluate all metrics at once
metrics = evaluate_all(
    true_cps=true_cps,
    pred_cps=detected_cps,
    n_samples=500,
    margin=10
)

# Access results
print(f"Precision: {metrics['point_metrics']['precision']:.3f}")
print(f"Recall: {metrics['point_metrics']['recall']:.3f}")
print(f"F1-Score: {metrics['point_metrics']['f1_score']:.3f}")
print(f"Hausdorff: {metrics['distance_metrics']['hausdorff']:.1f}")

Multi-Annotator Evaluation

from fastcpd.metrics import covering_metric

# Multiple expert annotations
annotations = [
    [100, 200, 300],  # Expert 1
    [105, 195, 305],  # Expert 2
    [98, 203, 298]    # Expert 3
]

detected_cps = [102, 201, 299]

result = covering_metric(annotations, detected_cps, margin=10)
print(f"Covering: {result['covering_score']:.3f}")