
Concept

The pursuit of alpha, the measure of a model’s capacity to generate returns exceeding a benchmark, is the central objective of quantitative finance. A model’s perceived alpha, derived from a backtest, often serves as the primary justification for deploying capital. The distortion of this measure through backtest overfitting represents a profound systemic risk, transforming a tool of discovery into a mechanism for self-deception.

Overfitting occurs when a model is calibrated so precisely to historical data that it captures not only the underlying market signal but also its random noise. The result is a model that appears exceptionally profitable in simulation but whose performance collapses when exposed to live market conditions.

This phenomenon arises from a fundamental misunderstanding of a backtest’s purpose. A backtest is not an experiment in the classical sense, repeatable under controlled conditions. Financial markets are non-stationary; the past is a single, non-repeatable path drawn from an infinite set of possibilities. An overfitted model memorizes the specific contours of that single path.

It mistakes random fluctuations for durable patterns, creating a fragile system optimized for a reality that no longer exists. The resulting alpha is an illusion, a phantom derived from data-snooping and excessive parameterization rather than genuine predictive insight.

A model’s historical performance is a single data point, not a guarantee of future success.

From a systems perspective, an overfitted model is a brittle architecture. It lacks the robustness to adapt to new information and evolving market regimes. The true measure of alpha is not found in a perfect historical fit but in a model’s resilience and its ability to generalize its logic to unseen data. The distortion occurs because the process of iterative refinement (adjusting parameters, adding rules, and selecting features to maximize historical performance) systematically erodes this resilience.

Each adjustment made to improve the backtest’s Sharpe ratio increases the probability of overfitting, creating a model that is perfectly adapted to a world that will never occur again. This leads to a dangerous divergence between perceived alpha and a model’s true generative capacity, a gap that often becomes apparent only after capital has been committed and losses have been realized.


The Anatomy of a False Positive

Understanding how backtest overfitting distorts alpha requires dissecting the concept of a false positive in quantitative research. A false positive is the incorrect discovery of a trading strategy that appears profitable in a backtest but has no actual predictive power. This is the direct output of overfitting. The process is insidious because it feels like rigorous research.

A researcher may test hundreds or thousands of strategy configurations, each a slight variation of the last, until one produces a spectacular backtest. This intense search, often called “p-hacking” or “data dredging,” makes the discovery of a spuriously high-performing strategy almost inevitable.
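The arithmetic of this search is easy to demonstrate. The following minimal simulation (all parameters hypothetical) generates one thousand pure-noise “strategies” over roughly five years of daily data and reports the best in-sample Sharpe ratio; the winner routinely posts an annualized Sharpe well above 1.0 despite having zero true skill:

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_days = 1000, 1260  # hypothetical: 1,000 configurations, ~5 years of daily data

# Every "strategy" is pure noise: zero-mean daily returns with 1% volatility.
returns = rng.normal(loc=0.0, scale=0.01, size=(n_trials, n_days))

# Annualized Sharpe ratio of each trial.
sharpes = returns.mean(axis=1) / returns.std(axis=1, ddof=1) * np.sqrt(252)

print("True Sharpe of every strategy: 0.0")
print(f"Best in-sample Sharpe across {n_trials} trials: {sharpes.max():.2f}")
```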

The distortion of alpha is magnified by the low signal-to-noise ratio inherent in financial markets. True alpha signals are often faint and transient, buried within vast amounts of random price movement. An overfitted model, by its nature, becomes hyper-receptive to this noise. It builds complex rules that connect unrelated data points, creating a narrative of cause and effect where none exists.

For instance, a model might learn that a specific combination of a moving average crossover, a particular level of the VIX, and the day of the week has historically preceded a market rally. While this pattern may have occurred by chance in the historical data, it lacks any underlying economic logic and is highly unlikely to repeat. The “alpha” it generates is purely an artifact of the dataset it was trained on.


Selection Bias and the Winner’s Curse

The problem extends beyond a single researcher. The entire field of quantitative finance is susceptible to a collective form of selection bias. Researchers and publications tend to report only their successful findings, ignoring the countless failed backtests that preceded the “winning” strategy.

This creates a skewed perception of what is achievable, as the community is predominantly exposed to strategies that have, by definition, survived an intense and often unreported selection process. This is a manifestation of the “winner’s curse,” where the very act of selecting the best-performing backtest from a large pool of trials makes it statistically likely that the chosen strategy is overfitted.

Campbell R. Harvey and Yan Liu have extensively researched this area, arguing that the Sharpe ratios of published strategies must be significantly “haircut” to account for the multiple tests that were likely performed to find them. Their work provides a statistical framework for understanding that the more a dataset is mined for signals, the higher the bar for statistical significance must be. Without this adjustment, investors are systematically misled by performance claims that are statistically inflated. The true measure of alpha is not what a model did in a curated backtest, but what it can be expected to do in the future, adjusted for the intensity of the search process that discovered it.
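A minimal sketch of this haircut logic, using the Bonferroni adjustment, the simplest member of the family Harvey and Liu analyze (the function name and example inputs are illustrative, and the assumption of roughly i.i.d. returns is a simplification; their paper also covers Holm and BHY variants):

```python
import numpy as np
from scipy.stats import norm

def haircut_sharpe(sr_annual: float, years: float, n_trials: int) -> float:
    """Discount an annualized Sharpe ratio for multiple testing using a
    Bonferroni adjustment, in the spirit of Harvey and Liu."""
    t_stat = sr_annual * np.sqrt(years)       # t-ratio of the Sharpe estimate
    p_single = 2 * (1 - norm.cdf(t_stat))     # p-value if this were the only test
    p_multi = min(n_trials * p_single, 1.0)   # penalize for the full search
    t_adj = norm.ppf(1 - p_multi / 2)         # t-ratio implied by adjusted p-value
    return max(t_adj, 0.0) / np.sqrt(years)   # haircut annualized Sharpe

# Hypothetical example: a Sharpe of 1.2 over 10 years, found after 200 trials.
print(f"Haircut Sharpe: {haircut_sharpe(1.2, 10, 200):.2f}")  # about 0.7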


Strategy

Developing a strategic framework to combat the distortion of alpha from backtest overfitting requires a shift in perspective. The objective moves from finding the most profitable historical path to building a robust and adaptable system. This involves implementing protocols that systematically challenge a model’s assumptions and test its resilience against market dynamics unseen in the training data. A core component of this strategic approach is the rigorous separation of data and the disciplined application of out-of-sample testing.

The most fundamental technique is the hold-out method, where a portion of the historical data (the out-of-sample set) is completely walled off during the model development phase. All parameter tuning, feature selection, and rule generation occur exclusively on the in-sample data. The out-of-sample set serves as a final, one-time-only examination. A significant degradation in performance from the in-sample to the out-of-sample period is a clear indicator of overfitting.
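A minimal sketch of the mechanics, assuming a pandas Series of returns indexed by date (the function name and the 30% split are illustrative choices). The cut must be chronological; shuffling, common in non-temporal machine learning, would leak future information into the training set:

```python
import pandas as pd

def split_holdout(returns: pd.Series, oos_fraction: float = 0.3):
    """Chronological hold-out split. The most recent oos_fraction of the
    history is walled off and consulted exactly once, after all tuning,
    feature selection, and rule generation are finished."""
    cutoff = int(len(returns) * (1 - oos_fraction))
    return returns.iloc[:cutoff], returns.iloc[cutoff:]
```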

This single test, however, is often insufficient. A strategy might perform well on a single out-of-sample period purely by chance. This necessitates more sophisticated validation architectures.

A model’s intelligence is not in what it knows, but in how it behaves when faced with what it does not know.

Advanced Validation Architectures

To build a more complete picture of a model’s robustness, quantitative analysts employ several advanced validation strategies. These methods create multiple, alternative historical paths to simulate a wider range of market conditions and reduce the likelihood that a model’s success is tied to the unique characteristics of a single data partition.

  • Walk-Forward Analysis: This method provides a more dynamic testing process. The historical data is divided into multiple, contiguous blocks. The model is trained on one block (e.g. years 1-5) and then tested on the subsequent block (year 6). The window then “walks forward” in time: the model is retrained on years 2-6 and tested on year 7, and so on. This process simulates how a strategy would have been periodically re-calibrated and deployed in real time, providing a more realistic performance assessment (a minimal sketch follows this list).
  • Cross-Validation: Borrowed from machine learning, cross-validation (CV) involves partitioning the data into ‘k’ subsets, or “folds.” The model is trained on k-1 folds and tested on the remaining fold. This process is repeated ‘k’ times, with each fold serving as the test set once. The results are then averaged to provide a more stable estimate of out-of-sample performance. Marcos Lopez de Prado advocates for a specific type, Combinatorially Purged Cross-Validation (CPCV), which is designed to prevent data leakage between training and testing sets, a common issue in financial time series where data points are not truly independent (a simplified purged variant is sketched after this list).
  • Synthetic Data Generation: A powerful technique for stress-testing a model is to generate synthetic data based on the statistical properties of the historical data. This allows the creation of thousands of alternative historical paths that never actually occurred but are statistically plausible. By backtesting the model on this synthetic data, one can assess its performance across a vast range of scenarios, significantly reducing the risk of overfitting to the single, arbitrary path of actual history (see the block-bootstrap sketch after this list).
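To make the walk-forward mechanics concrete, here is a minimal index-generating sketch (the function and block sizes are hypothetical, using 252 trading days per year):

```python
import numpy as np

def walk_forward_splits(n_obs: int, train_size: int, test_size: int):
    """Yield (train_idx, test_idx) arrays for a rolling walk-forward:
    train on one contiguous block, test on the next, then slide the
    whole window forward by one test block."""
    start = 0
    while start + train_size + test_size <= n_obs:
        yield (np.arange(start, start + train_size),
               np.arange(start + train_size, start + train_size + test_size))
        start += test_size

# Hypothetical usage: 10 years of daily data, 5-year train, 1-year test.
for train_idx, test_idx in walk_forward_splits(2520, 1260, 252):
    pass  # fit the model on data[train_idx], evaluate on data[test_idx]
```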
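Full CPCV is involved, but the core idea of purging can be shown in a simplified sketch (the function name and the symmetric embargo are illustrative; Lopez de Prado's version purges based on label overlap and embargoes only after the test set):

```python
import numpy as np

def purged_kfold_splits(n_obs: int, n_folds: int = 5, embargo: int = 10):
    """K-fold splits for serially dependent data: observations within an
    embargo-sized buffer around each test fold are dropped from the
    training set so overlapping information cannot leak across splits."""
    edges = np.linspace(0, n_obs, n_folds + 1, dtype=int)
    indices = np.arange(n_obs)
    for k in range(n_folds):
        test_start, test_end = edges[k], edges[k + 1]
        keep = (indices < test_start - embargo) | (indices >= test_end + embargo)
        yield indices[keep], indices[test_start:test_end]
```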
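One simple generator of statistically plausible alternative paths is a moving-block bootstrap. This is a sketch under strong assumptions: it preserves the short-range autocorrelation of the historical returns but not long-range regime structure, and more sophisticated generators exist:

```python
import numpy as np

def block_bootstrap(returns: np.ndarray, n_paths: int = 1000,
                    block_size: int = 21, seed: int = 0) -> np.ndarray:
    """Build synthetic return paths by resampling contiguous blocks
    (here 21 days, roughly one trading month) of the historical series."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    n_blocks = int(np.ceil(n / block_size))
    paths = np.empty((n_paths, n_blocks * block_size))
    for i in range(n_paths):
        starts = rng.integers(0, n - block_size + 1, size=n_blocks)
        paths[i] = np.concatenate([returns[s:s + block_size] for s in starts])
    return paths[:, :n]  # trim each path to the original history length
```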

A Framework for Model Comparison

Choosing the right validation strategy is a critical decision. The following table compares these architectures across key dimensions, providing a framework for selecting the appropriate methodology based on the specific context of the model and the resources available.

| Validation Method | Conceptual Approach | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Simple Hold-Out | A single train/test split of the data. | Simple to implement; provides a clear, final check. | Results can be highly dependent on the chosen split point; risks being lucky or unlucky. |
| Walk-Forward Analysis | Sequentially trains on past data and tests on future data. | Simulates realistic, periodic model recalibration; assesses stability over time. | Uses data less efficiently than cross-validation; can still be overfit to the walk-forward process itself. |
| K-Fold Cross-Validation | Averages performance across multiple train/test splits. | Provides a statistically robust estimate of out-of-sample performance; uses data efficiently. | Does not preserve the temporal order of data, requiring careful implementation (e.g. purging) to avoid look-ahead bias. |
| Synthetic Data | Tests the model on thousands of plausible, artificial histories. | Vastly expands the testing universe; tests the model's logic independent of historical path dependency. | Computationally intensive; performance depends on how well the synthetic data generator captures true market dynamics. |

The Role of Economic Rationale

Beyond statistical rigor, a crucial strategic defense against overfitting is the insistence on a sound economic rationale for any trading strategy. A model that identifies a profitable pattern should be accompanied by a plausible explanation for why that pattern exists. Is it exploiting a documented behavioral bias? Is it providing liquidity to a specific market segment? Is it capitalizing on a structural inefficiency? A strategy without a story is a black box that is far more likely to be the product of data mining. This qualitative check serves as a powerful filter. A model based on a coherent economic thesis is more likely to be robust and adaptable because its core logic is tied to a durable feature of the market, rather than a statistical ghost in the historical data.


Execution

Executing a robust backtesting protocol that minimizes the risk of overfitting and provides a true measure of alpha is a multi-stage, disciplined process. It moves beyond simply running a script on historical data and into the realm of systematic, scientific validation. This operational playbook is designed to instill a healthy skepticism towards in-sample results and to build a comprehensive, evidence-based case for a model’s future viability. The ultimate goal is to differentiate between a strategy that has genuinely captured a market anomaly and one that has merely memorized historical noise.

The foundation of this execution is a commitment to logging and transparency. Every single backtest run, regardless of its outcome, must be recorded. This includes the strategy configuration, the parameters used, the dataset, and the performance metrics. As Harvey and Liu’s work demonstrates, the number of trials is a critical variable in assessing the probability of a false discovery.

Without a complete log of the research process, it is impossible to properly discount the performance of the “winning” strategy to account for the multiple testing that produced it. This discipline transforms backtesting from an exploratory art into a rigorous scientific process.
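A minimal sketch of such a research log (the file format and field names are illustrative); the essential discipline is that the append happens unconditionally, for losing runs as well as winners:

```python
import datetime
import hashlib
import json

def log_trial(config: dict, metrics: dict, path: str = "trial_log.jsonl") -> None:
    """Append one backtest run, regardless of outcome, to an append-only
    log. The count of records becomes the n_trials input for any later
    multiple-testing haircut."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "config_hash": hashlib.sha256(
            json.dumps(config, sort_keys=True).encode()).hexdigest()[:12],
        "config": config,
        "metrics": metrics,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```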


The Quantitative Gauntlet: A Procedural Guide

A model must pass through a series of increasingly difficult tests before it can be considered for capital allocation. This procedural guide outlines a quantitative gauntlet designed to systematically identify and discard overfitted strategies.

  1. Data Hygiene and Preparation: The process begins with the data itself. Ensure the historical data is clean, accounting for survivorship bias (including delisted stocks), corporate actions (splits, dividends), and trading costs (commissions, slippage). Using unadjusted data is a common source of inflated backtest results.
  2. Establish a Performance Baseline: Before testing any complex strategy, run a simple benchmark (e.g. a buy-and-hold strategy for the relevant index) to establish a baseline for performance metrics like the Sharpe ratio and maximum drawdown. Any new strategy must demonstrate a significant improvement over this baseline.
  3. In-Sample Development with Cross-Validation: Develop the core logic of the strategy using an in-sample dataset. Instead of optimizing for a single performance metric, use a robust technique like K-Fold Cross-Validation to tune parameters. This provides a more stable estimate of the model’s performance and reduces the risk of overfitting to a specific data partition.
  4. Out-of-Sample Verification: Apply the finalized model, with its parameters locked, to the hold-out out-of-sample dataset. This is a one-time event. The performance on this unseen data is the most critical initial test. A severe drop in the Sharpe ratio or a significant increase in drawdown from the in-sample results is a major red flag.
  5. Sensitivity and Stress Testing: Analyze how the model’s performance changes when its parameters are slightly altered. A robust model should not see its performance collapse if a parameter is changed by a small amount. Additionally, stress-test the model by exposing it to historical periods of high volatility or market crisis (e.g. 2008, 2020), even if they were not in the original dataset.
  6. Calculate the Probability of Backtest Overfitting (PBO): Employ advanced statistical methods, such as those proposed by Marcos Lopez de Prado, to estimate the probability that the model is overfitted given the number of trials conducted and the performance observed. This provides a quantitative measure of confidence in the backtest results; one concrete instance, the Deflated Sharpe Ratio, is sketched after this list.
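The Deflated Sharpe Ratio of Bailey and Lopez de Prado asks whether the observed Sharpe ratio exceeds the best Sharpe one would expect from the same number of zero-skill trials. A minimal sketch follows, assuming per-period (e.g. daily) returns and that the variance of Sharpe estimates across trials has been measured from the research log:

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew

def deflated_sharpe_ratio(returns: np.ndarray, n_trials: int,
                          var_sharpe_trials: float) -> float:
    """Probability that the observed (per-period) Sharpe ratio beats the
    expected maximum Sharpe of n_trials pure-noise strategies, following
    Bailey and Lopez de Prado's Deflated Sharpe Ratio."""
    T = len(returns)
    sr = returns.mean() / returns.std(ddof=1)
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)  # raw kurtosis (3 for a normal)

    emc = 0.5772156649  # Euler-Mascheroni constant
    # Expected maximum Sharpe across n_trials strategies with no skill.
    sr0 = np.sqrt(var_sharpe_trials) * (
        (1 - emc) * norm.ppf(1 - 1 / n_trials)
        + emc * norm.ppf(1 - 1 / (n_trials * np.e)))

    # Probabilistic Sharpe Ratio of sr against the noise benchmark sr0.
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr**2)
    return float(norm.cdf((sr - sr0) * np.sqrt(T - 1) / denom))
```

A value close to 1.0 indicates the performance is unlikely to be a fluke of the search; lower values suggest the backtest cannot be distinguished from the best of many unskilled trials.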

Deconstructing Performance Decay

The most tangible evidence of overfitting is the decay in performance metrics between the in-sample (IS) and out-of-sample (OOS) periods. The following table provides a hypothetical but realistic example of two strategies. Strategy A is a robust model with a clear economic rationale, while Strategy B is a highly parameterized, overfitted model that was discovered after thousands of trials.

| Performance Metric | Strategy A (Robust), In-Sample | Strategy A (Robust), Out-of-Sample | Strategy B (Overfitted), In-Sample | Strategy B (Overfitted), Out-of-Sample |
| --- | --- | --- | --- | --- |
| Annualized Return | 18.5% | 16.2% | 45.8% | -5.2% |
| Annualized Volatility | 15.0% | 15.5% | 19.0% | 25.0% |
| Sharpe Ratio | 1.23 | 1.05 | 2.41 | -0.21 |
| Maximum Drawdown | -12.8% | -14.1% | -8.5% | -35.4% |
| Sortino Ratio | 1.95 | 1.65 | 4.10 | -0.35 |

The data clearly illustrates the danger. Strategy B appears to be a world-beating model based on its in-sample results, with a Sharpe ratio exceeding 2.4. An investor reviewing only this backtest would be highly tempted to allocate significant capital. However, its performance completely collapses out-of-sample, delivering negative returns with significantly higher volatility.

Its alpha was an illusion. Strategy A, while showing more modest in-sample results, demonstrates resilience. Its performance metrics degrade slightly out-of-sample, which is expected, but it remains profitable and its risk profile is stable. This is the signature of a model with genuine, if less spectacular, alpha.


A Case Study in Alpha Decay

Consider a quantitative hedge fund that developed a new machine learning model for statistical arbitrage in the tech sector. The development team spent six months training a complex ensemble of gradient-boosted trees on data from 2010-2019. They tested thousands of feature combinations, including microstructural data, sentiment analysis from news articles, and various technical indicators.

The final model, “ArbX,” produced a stunning in-sample backtest with a Sharpe ratio of 3.5 and a maximum drawdown of only 5%. The equity curve was a near-perfect 45-degree line.

The fund’s investment committee, impressed by the results, approved the model for a $50 million allocation. ArbX was deployed in January 2020. For the first two months, its performance was flat. Then, as the COVID-19 pandemic induced a massive market shock, the model began to hemorrhage money.

The subtle statistical relationships it had learned from the relatively stable 2010-2019 period were completely irrelevant in the new high-volatility regime. The model, which had been trained to identify mean-reverting pairs, was now facing a market where correlations went to one and seemingly stable relationships broke down entirely. By the end of April 2020, the strategy was down 25%, and the fund was forced to liquidate the position. A post-mortem analysis revealed that the model had over-indexed on transient, noise-driven patterns from the previous decade.

Its spectacular backtest was a perfect example of overfitting, and the alpha it promised was entirely illusory. The true measure of its alpha was not 3.5, but a deeply negative number, a fact that was obscured by a flawed validation process.


References

  • Bailey, David H., Jonathan M. Borwein, Marcos Lopez de Prado, and Qiji Jim Zhu. “Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance.” Notices of the American Mathematical Society, vol. 61, no. 5, 2014, pp. 458-471.
  • Lopez de Prado, Marcos. Advances in Financial Machine Learning. Wiley, 2018.
  • Harvey, Campbell R., and Yan Liu. “Backtesting.” The Journal of Portfolio Management, vol. 42, no. 1, 2015, pp. 13-28.
  • Harvey, Campbell R., Yan Liu, and Heqing Zhu. “… and the Cross-Section of Expected Returns.” The Review of Financial Studies, vol. 29, no. 1, 2016, pp. 5-68.
  • Lo, Andrew W., and A. Craig MacKinlay. “Data-Snooping Biases in Tests of Financial Asset Pricing Models.” The Review of Financial Studies, vol. 3, no. 3, 1990, pp. 431-467.
  • White, Halbert. “A Reality Check for Data Snooping.” Econometrica, vol. 68, no. 5, 2000, pp. 1097-1126.
  • Su, Chishen, and Hsin-Chia Fu. “Avoiding Overfitting in Backtesting of Trading Strategies.” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 1, 2013, pp. 130-142.
  • Bailey, David H., and Marcos Lopez de Prado. “The Probability of Backtest Overfitting.” Journal of Computational Finance, vol. 20, no. 4, 2017, pp. 39-70.

Reflection


The Integrity of the System

The analysis of backtest overfitting moves the conversation about alpha from a simple search for profitable patterns to a much deeper inquiry into the integrity of the discovery process itself. A model’s alpha is not a number; it is an emergent property of a robust, well-designed research and validation system. The distortion caused by overfitting is a symptom of a flawed system, one that prioritizes the appearance of success over the reality of resilience. Building a framework that resists this temptation requires more than statistical tools; it demands an intellectual and organizational commitment to skepticism, discipline, and transparency.

Ultimately, the true measure of a model’s alpha can only be assessed through the lens of the system that created it. Is the system designed to find truth, or is it designed to produce a compelling story? Does it challenge its own conclusions with the same rigor it applies to its initial hypotheses?

The capacity to generate sustainable, genuine alpha is inextricably linked to the quality of the answers to these questions. The final output of a quantitative process is not a strategy, but a system capable of learning, adapting, and, most importantly, distinguishing between a durable market insight and a seductive statistical phantom.


Glossary


Backtest Overfitting

Meaning: Backtest overfitting describes the phenomenon where a quantitative trading strategy’s historical performance appears exceptionally robust due to excessive optimization against a specific dataset, resulting in a spurious fit that fails to generalize to unseen market conditions or future live trading.

Quantitative Finance

Meaning: Quantitative Finance applies advanced mathematical, statistical, and computational methods to financial problems.

Historical Data

Meaning: Historical Data refers to a structured collection of recorded market events and conditions from past periods, comprising time-stamped records of price movements, trading volumes, order book snapshots, and associated market microstructure details.

Overfitted Model

Meaning: An overfitted model is one calibrated so closely to its training data that it captures random noise alongside genuine signal, producing impressive backtest statistics that fail to generalize to live market conditions.

Sharpe Ratio

Meaning: The Sharpe ratio measures risk-adjusted performance as excess return per unit of return volatility; because it uses standard deviation, it treats upside and downside deviations from the mean as equal risk.

P-Hacking

Meaning: P-hacking refers to the practice of performing multiple statistical tests or analyses on a dataset and selectively reporting only those results that achieve a desired level of statistical significance, typically a p-value below a conventional threshold like 0.05. This methodological flaw distorts the true probability of observed effects arising by chance, leading to an inflated likelihood of Type I errors, where a false positive is identified as a genuine finding.

Selection Bias

Meaning: Selection bias represents a systemic distortion in data acquisition or observation processes, resulting in a dataset that does not accurately reflect the underlying population or phenomenon it purports to measure.

Out-Of-Sample Testing

Meaning: Out-of-sample testing is a rigorous validation methodology used to assess the performance and generalization capability of a quantitative model or trading strategy on data that was not utilized during its development, training, or calibration phase.

Walk-Forward Analysis

Meaning: Walk-Forward Analysis is a robust validation methodology employed to assess the stability and predictive capacity of quantitative trading models and parameter sets across sequential, out-of-sample data segments.

Marcos Lopez De Prado

Meaning: Marcos Lopez De Prado is a preeminent quantitative finance researcher and practitioner, widely recognized for his foundational contributions to the application of machine learning and advanced statistical methods in financial markets.

Cross-Validation

Meaning: Cross-Validation is a rigorous statistical resampling procedure employed to evaluate the generalization capacity of a predictive model, systematically assessing its performance on independent data subsets.

Synthetic Data

Meaning: Synthetic Data refers to information algorithmically generated that statistically mirrors the properties and distributions of real-world data without containing any original, sensitive, or proprietary inputs.

In-Sample Results

Meaning: In-sample results are performance metrics computed on the same data used to develop and tune a model; because the model may have absorbed noise specific to that data, they systematically overstate expected live performance.

Performance Metrics

Meaning: Performance metrics are the quantitative measures, such as annualized return, volatility, Sharpe ratio, and maximum drawdown, used to evaluate a trading strategy; comparing them between in-sample and out-of-sample periods is a primary diagnostic for overfitting.