Metrics API

Evaluation metrics for change point detection.

Overview

This module provides comprehensive metrics for evaluating change point detection performance, including:

  • Point-based metrics (precision, recall, F-beta)

  • Distance-based metrics (Hausdorff, annotation error)

  • Segmentation-based metrics (Adjusted Rand Index, Hamming)

  • Advanced metrics (covering metric for multiple annotators)

Features:

  • Detailed return values (dicts with breakdowns)

  • Support for multiple ground truth annotators

  • Statistical significance testing capabilities

  • Enhanced versions of standard metrics

exception fastcpd.metrics.ChangePointMetricsError[source]

Exception raised for errors in metrics computation.
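Functions in this module raise this exception when metric computation fails. A minimal defensive-usage sketch; the exact conditions that trigger the exception are not specified here, so the handler below is illustrative only:

from fastcpd.metrics import ChangePointMetricsError, precision_recall

try:
    # n_samples is optional; when given, it allows input validation
    result = precision_recall([100, 200], [98, 202], margin=10, n_samples=300)
except ChangePointMetricsError as err:
    # Raised for errors in metrics computation (e.g., invalid inputs)
    print(f"Metric computation failed: {err}")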

Individual Metrics

Precision and Recall

fastcpd.metrics.precision_recall(true_cps: List | ndarray, pred_cps: List | ndarray, margin: int = 10, n_samples: int | None = None) Dict[source]

Calculate precision, recall, and F1 score with tolerance margin.

A predicted change point is considered correct (true positive) if it falls within margin samples of a true change point. Each true CP can only be matched once to avoid multiple detections counting as multiple TPs.

Parameters:
  • true_cps – True change points (list or array)

  • pred_cps – Predicted change points (list or array)

  • margin – Tolerance window in samples (default: 10)

  • n_samples – Total number of samples (optional, for validation)

Returns:

  • precision: TP / (TP + FP)

  • recall: TP / (TP + FN)

  • f1_score: 2 * (P * R) / (P + R)

  • true_positives: Number of correctly detected CPs

  • false_positives: Number of incorrect detections

  • false_negatives: Number of missed true CPs

  • matched_pairs: List of (true_cp, pred_cp) tuples

  • unmatched_true: True CPs with no match

  • unmatched_pred: Predicted CPs with no match

Return type:

Dict

Examples

>>> true_cps = [100, 200, 300]
>>> pred_cps = [98, 205, 299, 350]
>>> result = precision_recall(true_cps, pred_cps, margin=10)
>>> print(f"Precision: {result['precision']:.2f}")
Precision: 0.75
>>> print(f"Recall: {result['recall']:.2f}")
Recall: 1.00
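
Beyond the scalar scores, the returned dictionary exposes the matching itself. A short sketch using only the keys documented above; the commented values follow from the matching described above and are not verified output:

from fastcpd.metrics import precision_recall

result = precision_recall([100, 200, 300], [98, 205, 299, 350], margin=10)

# Each pair is (true_cp, pred_cp); 350 lies outside the margin of any true CP
print(result['matched_pairs'])    # expected along the lines of [(100, 98), (200, 205), (300, 299)]
print(result['unmatched_pred'])   # expected [350]
print(result['false_positives'])  # expected 1
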
fastcpd.metrics.f_beta_score(true_cps: List | ndarray, pred_cps: List | ndarray, beta: float = 1.0, margin: int = 10, n_samples: int | None = None) Dict[source]

Calculate F-beta score with adjustable precision/recall weighting.

The F-beta score is a weighted harmonic mean of precision and recall:

F_beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall)

  • beta = 1: F1 score (equal weight)

  • beta < 1: Favor precision (penalize false positives more)

  • beta > 1: Favor recall (penalize false negatives more)

Parameters:
  • true_cps – True change points

  • pred_cps – Predicted change points

  • beta – Weight of recall vs precision (default: 1.0)

  • margin – Tolerance window in samples (default: 10)

  • n_samples – Total number of samples (optional)

Returns:

  • f_beta: F-beta score

  • f1_score: F1 score (beta=1)

  • f2_score: F2 score (beta=2, recall-focused)

  • f0_5_score: F0.5 score (beta=0.5, precision-focused)

  • precision: Precision

  • recall: Recall

  • (plus all fields from precision_recall)

Return type:

Dict

Examples

>>> # When missing CPs is worse than false alarms, use beta=2
>>> result = f_beta_score(true_cps, pred_cps, beta=2.0)
>>> # When false alarms are worse, use beta=0.5
>>> result = f_beta_score(true_cps, pred_cps, beta=0.5)
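
A sketch comparing the precision- and recall-weighted scores on the same detections, using the keys documented above; with one false alarm and no missed CPs, the recall-weighted F2 should come out higher than the precision-weighted F0.5:

from fastcpd.metrics import f_beta_score

true_cps = [100, 200, 300]
pred_cps = [98, 205, 299, 350]   # one spurious detection, no missed CPs

result = f_beta_score(true_cps, pred_cps, beta=2.0, margin=10)

# All three variants are returned from a single call
print(f"F1:   {result['f1_score']:.2f}")
print(f"F2:   {result['f2_score']:.2f}")
print(f"F0.5: {result['f0_5_score']:.2f}")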

Distance Metrics

fastcpd.metrics.hausdorff_distance(cps1: List | ndarray, cps2: List | ndarray, directed: bool = False, n_samples: int | None = None) Dict[source]

Calculate Hausdorff distance between two change point sets.

The Hausdorff distance measures the maximum distance from any point in one set to the closest point in the other set. It’s sensitive to outliers and provides worst-case analysis.

Symmetric: H(A, B) = max(h(A, B), h(B, A))

Directed: h(A, B) = max_{a in A} min_{b in B} |a - b|

Parameters:
  • cps1 – First set of change points to compare

  • cps2 – Second set of change points to compare

  • directed – If True, compute directed distance h(cps1, cps2) only

  • n_samples – Total number of samples (optional)

Returns:

  • hausdorff: Hausdorff distance (symmetric if directed=False)

  • forward_distance: max distance from cps1 to cps2

  • backward_distance: max distance from cps2 to cps1

  • closest_pairs: List of (cp1, cp2, distance) tuples

Return type:

Dict

Examples

>>> cps1 = [100, 200, 300]
>>> cps2 = [105, 200, 400]
>>> result = hausdorff_distance(cps1, cps2)
>>> print(f"Hausdorff: {result['hausdorff']}")
Hausdorff: 100
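
Because the symmetric distance is the maximum of the two directed distances, the forward/backward breakdown shows whether the worst-case error comes from a missed true CP or a spurious detection. A sketch using the keys documented above:

from fastcpd.metrics import hausdorff_distance

result = hausdorff_distance([100, 200, 300], [105, 200, 400])

# forward: worst distance from a point in cps1 to its nearest point in cps2
# backward: worst distance from a point in cps2 to its nearest point in cps1
print(f"Forward:   {result['forward_distance']}")
print(f"Backward:  {result['backward_distance']}")
print(f"Hausdorff: {result['hausdorff']}")   # max of the two when directed=False
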
fastcpd.metrics.annotation_error(true_cps: List | ndarray, pred_cps: List | ndarray, method: str = 'mae', n_samples: int | None = None) Dict[source]

Calculate annotation error between change points.

Measures how accurately change points are localized by computing the error between matched pairs. Uses optimal matching to pair true and predicted CPs.

Parameters:
  • true_cps – True change points

  • pred_cps – Predicted change points

  • method – Error metric: one of ‘mae’, ‘mse’, ‘rmse’, or ‘median_ae’

  • n_samples – Total number of samples (optional)

Returns:

  • error: Overall error (according to method)

  • errors_per_cp: List of errors for each matched pair

  • median_error: Median error

  • max_error: Maximum error

  • min_error: Minimum error

  • mean_error: Mean absolute error

  • std_error: Standard deviation of errors

  • matched_pairs: List of (true_cp, pred_cp) tuples

Return type:

Dict

Examples

>>> true_cps = [100, 200, 300]
>>> pred_cps = [98, 205, 295]
>>> result = annotation_error(true_cps, pred_cps, method='mae')
>>> print(f"MAE: {result['error']:.1f}")
MAE: 4.0
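
Since method only changes how the matched errors are aggregated, the same inputs can be scored under several methods; a brief sketch assuming the methods listed above:

from fastcpd.metrics import annotation_error

true_cps = [100, 200, 300]
pred_cps = [98, 205, 295]

for method in ('mae', 'rmse', 'median_ae'):
    result = annotation_error(true_cps, pred_cps, method=method)
    # 'error' is the aggregate under the chosen method;
    # 'errors_per_cp' lists the per-pair errors behind it
    print(method, result['error'], result['errors_per_cp'])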

Agreement Metrics

fastcpd.metrics.covering_metric(true_cps_list: List[List | ndarray], pred_cps: List | ndarray, margin: int = 10, n_samples: int | None = None) Dict[source]

Calculate covering metric for multiple annotators.

The covering metric measures how well the predictions agree with each individual annotator, rather than only with the combined annotations. Higher scores indicate that the detections account for every annotator’s change points.

Based on van den Burg & Williams (2020).

Parameters:
  • true_cps_list – List of lists; each sublist contains one annotator’s CPs

  • pred_cps – Predicted change points

  • margin – Tolerance window

  • n_samples – Total number of samples (optional)

Returns:

  • covering_score: Mean recall across all annotators

  • recall_per_annotator: List of recall for each annotator

  • mean_recall: Same as covering_score

  • std_recall: Standard deviation of recalls

  • min_recall: Minimum recall across annotators

  • max_recall: Maximum recall across annotators

  • n_annotators: Number of annotators

Return type:

Dict

Examples

>>> # 3 annotators with slightly different annotations
>>> true_cps_list = [[100, 200], [98, 202], [102, 198]]
>>> pred_cps = [100, 200]
>>> result = covering_metric(true_cps_list, pred_cps, margin=5)
>>> print(f"Covering: {result['covering_score']:.2f}")
Covering: 1.00
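
When annotators disagree, the per-annotator breakdown shows which annotations the detections fail to cover. A sketch using the documented keys, assuming recall_per_annotator follows the order of the input lists:

from fastcpd.metrics import covering_metric

annotations = [[100, 200], [98, 202], [150, 250]]   # third annotator disagrees
pred_cps = [100, 200]

result = covering_metric(annotations, pred_cps, margin=5)

# One recall value per annotator (assumed to follow the input order)
for i, recall in enumerate(result['recall_per_annotator']):
    print(f"Annotator {i + 1}: recall = {recall:.2f}")
print(f"Worst-covered annotator: {result['min_recall']:.2f}")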

Segmentation Metrics

fastcpd.metrics.adjusted_rand_index(true_cps: List | ndarray, pred_cps: List | ndarray, n_samples: int) Dict[source]

Calculate Adjusted Rand Index for segmentation agreement.

The ARI measures similarity between two segmentations, correcting for chance agreement. Values range from -1 to 1:

  • ARI = 1: Perfect agreement

  • ARI = 0: Agreement expected by chance

  • ARI < 0: Worse than random

Uses efficient O(n) implementation from Prates (2021).

Parameters:
  • true_cps – True change points

  • pred_cps – Predicted change points

  • n_samples – Total number of samples (required)

Returns:

  • ari: Adjusted Rand Index

  • rand_index: Unadjusted Rand Index

  • agreement_rate: Proportion of agreeing pairs

  • disagreement_rate: Proportion of disagreeing pairs

Return type:

Dict

Examples

>>> true_cps = [100, 200]
>>> pred_cps = [100, 200]
>>> result = adjusted_rand_index(true_cps, pred_cps, n_samples=300)
>>> print(f"ARI: {result['ari']:.2f}")
ARI: 1.00
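
For predictions that only roughly match the true segmentation, both indices fall below 1. A sketch that prints the documented keys without asserting exact values:

from fastcpd.metrics import adjusted_rand_index

# Boundaries are close to, but not identical with, the true ones
result = adjusted_rand_index([100, 200], [110, 190], n_samples=300)

print(f"ARI:        {result['ari']:.3f}")
print(f"Rand index: {result['rand_index']:.3f}")
print(f"Agreement:  {result['agreement_rate']:.3f}")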

Combined Evaluation

fastcpd.metrics.evaluate_all(true_cps: List | ndarray | List[List], pred_cps: List | ndarray, n_samples: int, margin: int = 10) Dict[source]

Compute all available metrics for comprehensive evaluation.

Automatically detects whether true_cps contains multiple annotators (a list of lists) and computes the appropriate metrics.

Parameters:
  • true_cps – True change points (list/array or list of lists for multiple annotators)

  • pred_cps – Predicted change points

  • n_samples – Total number of samples

  • margin – Tolerance for point-based metrics

Returns:

  • point_metrics: precision, recall, f1, etc.

  • distance_metrics: hausdorff, annotation_error

  • segmentation_metrics: ari

  • covering_metrics: (if multiple annotators)

  • summary: Formatted text summary

Return type:

Dict

Examples

>>> result = evaluate_all([100, 200], [98, 202], n_samples=300, margin=5)
>>> print(result['summary'])
>>> # For multiple annotators:
>>> result = evaluate_all([[100, 200], [98, 202]], [100, 200],
...                       n_samples=300, margin=5)

Return Value Structure

Most metrics return a dictionary with the following structure:

{
    'metric_value': float,        # Main metric value
    'true_positives': int,        # Number of true positives
    'false_positives': int,       # Number of false positives
    'false_negatives': int,       # Number of false negatives
    'n_true': int,                # Number of true change points
    'n_detected': int,            # Number of detected change points
    'margin': int,                # Tolerance margin used
    # ... additional fields ...
}

Example Usage

Basic Evaluation

from fastcpd.metrics import precision_recall

true_cps = [100, 200, 300]
detected_cps = [98, 205, 350]

# Get precision, recall, and F1
pr = precision_recall(true_cps, detected_cps, n_samples=500, margin=10)

print(f"Precision: {pr['precision']:.3f}")
print(f"Recall:    {pr['recall']:.3f}")
print(f"F1-Score:  {pr['f1_score']:.3f}")

Comprehensive Evaluation

from fastcpd.metrics import evaluate_all

# Evaluate all metrics at once
metrics = evaluate_all(
    true_cps=true_cps,
    pred_cps=detected_cps,
    n_samples=500,
    margin=10
)

# Access results
print(f"Precision: {metrics['point_metrics']['precision']:.3f}")
print(f"Recall: {metrics['point_metrics']['recall']:.3f}")
print(f"F1-Score: {metrics['point_metrics']['f1_score']:.3f}")
print(f"Hausdorff: {metrics['distance_metrics']['hausdorff']:.1f}")

Multi-Annotator Evaluation

from fastcpd.metrics import covering_metric

# Multiple expert annotations
annotations = [
    [100, 200, 300],  # Expert 1
    [105, 195, 305],  # Expert 2
    [98, 203, 298]    # Expert 3
]

detected_cps = [102, 201, 299]

result = covering_metric(annotations, detected_cps, margin=10)
print(f"Covering: {result['covering_score']:.3f}")