MLFinLab: Implementing Machine Learning Papers Published in Finance Journals

1.
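Python-based software that combines machine learning with financial engineering is booming, and it is open source, of course. The Machine Learning Financial Laboratory (MlFinLab) introduced today was developed by Hudson and Thames Quantitative Research. The passage below briefly summarizes what the team aims for: it describes itself as a group that concentrates on machine learning research grounded in a wide range of papers.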

Hudson and Thames Quantitative Research is a research group with a focus on financial machine learning, whose goal is building out implementations and extending the literature.

According to A Lab for Machine Learning in Finance, the person who had a decisive influence on the team's choice of this direction is Dr. Marcos Lopez de Prado, who has been covered many times on this blog.

Paradigm Shift

Lopez de Prado’s book provides us with a very different way of building investment strategies. Rather than providing readers with alpha generation techniques, the book provides a framework which can be leveraged to produce robust investment strategies.

The techniques he proposes are rather different when compared to the style of factor-based investing from Grinold and Kahn 2000 or Chincarini and Kim 2006.

A lot of focus is spent on enhancing features and sampling techniques to boost statistical properties. The exciting thing is that those techniques could be applied to factor-based portfolios and the hope is that they would perform even better (perhaps a good future paper would be able to test this empirically).

Practitioners have long known that the covariance structure between various assets is an important feature for forecasting risk as well as returns. This often leads to a model which follows a many-to-many architecture, such as a vector autoregression (VAR) model. Lopez de Prado, however, makes use of a many-to-one structure. He proposes modeling one asset at a time, which is enforced by making use of the volume clock sampling technique and its derivatives.
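
The volume clock idea can be illustrated with volume bars: instead of sampling at fixed time intervals, a bar closes whenever a fixed amount of volume has traded. The following is a minimal pure-Python sketch and not MlFinLab's implementation; the `volume_bars` function, the `Bar` type, the `(price, volume)` tick layout, and the threshold are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float
    volume: float


def volume_bars(ticks: Iterable[tuple[float, float]], threshold: float) -> list[Bar]:
    """Aggregate (price, volume) ticks into bars of roughly `threshold` volume.

    A bar closes on the tick whose volume pushes the running total to or past
    the threshold. A trailing partial bar is discarded.
    """
    bars: list[Bar] = []
    prices: list[float] = []
    cum_volume = 0.0
    for price, volume in ticks:
        prices.append(price)
        cum_volume += volume
        if cum_volume >= threshold:
            bars.append(Bar(prices[0], max(prices), min(prices), prices[-1], cum_volume))
            prices, cum_volume = [], 0.0
    return bars


# Hypothetical ticks: (price, traded volume)
ticks = [(100.0, 40), (101.0, 70), (99.5, 30), (100.5, 90), (102.0, 10)]
bars = volume_bars(ticks, threshold=100)
```

With these hypothetical ticks, two bars are produced and the final tick is left over as an incomplete bar. Dollar bars follow the same pattern, accumulating `price * volume` instead of `volume`.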

Lopez de Prado also takes more of a trading approach rather than an investing one, which makes a lot of sense in the context of machine learning. A good example is how structural breaks are used to set up trades. These trades are then modeled using machine learning and the position sizes are determined using meta-labeling in combination with bet sizing algorithms.

At first glance, readers may assume that Lopez de Prado suggests using features derived from only price action such as market microstructure features and structural breaks but that is only because he elaborated on those chapters. Our understanding is that the models discussed in his work can take a wide range of features, from traditional accounting ratios and macroeconomic data to satellite imagery and features compressed using dimensionality reduction techniques.

Another key contribution from his work is stressing the importance of keeping count of the number of trials you run in order to avoid a false discovery. He proposes a Deflated Sharpe ratio and is vocal about the implications of running backtests in an iterative process. Rather, he suggests using metrics such as feature importance and correct cross-validation techniques that are finance-specific.
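
To make the point about counting trials concrete, here is a hedged sketch of the Deflated Sharpe ratio as it is commonly stated in Bailey and Lopez de Prado's paper: the observed Sharpe ratio is tested against the maximum Sharpe ratio one would expect from `n_trials` unskilled strategies. The function names and example inputs are assumptions for illustration, and all Sharpe quantities are per-period (not annualized).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Approximate expected maximum Sharpe ratio across `n_trials` unskilled trials."""
    if n_trials < 2:
        return 0.0  # a single trial needs no multiple-testing adjustment
    z1 = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    z2 = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)


def deflated_sharpe_ratio(sharpe, n_obs, n_trials, sharpe_variance, skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe ratio exceeds the trial-adjusted benchmark.

    `sharpe_variance` is the variance of Sharpe ratios across the trials;
    `skew` and `kurtosis` describe the strategy's return distribution.
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe**2)
    return _STD_NORMAL.cdf((sharpe - benchmark) * math.sqrt(n_obs - 1) / denom)


# Hypothetical inputs: the same backtest judged as 1 trial versus the best of 100
dsr_one = deflated_sharpe_ratio(0.1, n_obs=1250, n_trials=1, sharpe_variance=0.0025)
dsr_many = deflated_sharpe_ratio(0.1, n_obs=1250, n_trials=100, sharpe_variance=0.0025)
```

The same observed Sharpe ratio looks far less convincing once it is known to be the best of 100 attempts, which is the reason for recording every trial.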

2.
The library this team built is the Machine Learning Financial Laboratory (MlFinLab). It is written in Python. Its main features are listed below.

MLFinLab is our flagship Python library, which helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy-to-use tools.

Popular Modules

Financial Data Structures

Standard: Tick, Volume, Dollar bars. Information-Driven Bars: Imbalance and Run Bars (Tick, Volume, Dollar).

Labelling Techniques

Triple-Barrier, Meta-Labeling, Trend Scanning, Tail-Sets, Matrix Flags, Excess Over Mean/Median, Return Vs. Benchmark.
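
As an illustration of the first labelling technique in this list, the sketch below applies the triple-barrier idea to a single long position: the label depends on which of three barriers (profit-taking, stop-loss, or a time limit) is touched first. This is a simplified pure-Python sketch rather than MlFinLab's API; the function name, the fixed fractional barriers, and the choice to label a time-out as 0 are assumptions. In practice the horizontal barriers are usually scaled by a volatility estimate.

```python
def triple_barrier_label(prices, start, profit_take, stop_loss, max_holding):
    """Label one long position opened at index `start`.

    Returns +1 if the upper (profit-taking) barrier is touched first,
    -1 if the lower (stop-loss) barrier is touched first, and 0 if the
    vertical (time) barrier expires before either is touched.
    `profit_take` and `stop_loss` are positive fractional returns.
    """
    entry = prices[start]
    end = min(start + max_holding, len(prices) - 1)
    for t in range(start + 1, end + 1):
        ret = prices[t] / entry - 1.0
        if ret >= profit_take:
            return 1
        if ret <= -stop_loss:
            return -1
    return 0


# Hypothetical price path
prices = [100.0, 101.0, 103.0, 99.0, 96.0, 97.0, 104.0]
label_up = triple_barrier_label(prices, start=0, profit_take=0.02, stop_loss=0.03, max_holding=5)
label_down = triple_barrier_label(prices, start=2, profit_take=0.02, stop_loss=0.03, max_holding=5)
label_timeout = triple_barrier_label(prices, start=4, profit_take=0.10, stop_loss=0.10, max_holding=2)
```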

Feature Engineering

Fractionally Differentiated, Structural Breaks (CUSUM, Explosiveness Tests), Market-Microstructural.
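
Fractional differentiation applies the binomial expansion of the difference operator with a non-integer order `d`, so that a series can be made closer to stationary while keeping more memory than a full first difference. The sketch below is a minimal fixed-window version under stated assumptions: the function names and the window convention are illustrative and are not taken from MlFinLab.

```python
def frac_diff_weights(d: float, size: int) -> list[float]:
    """First `size` weights of (1 - B)^d, via w_k = -w_{k-1} * (d - k + 1) / k."""
    weights = [1.0]
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, window):
    """Fixed-width-window fractional difference.

    Each output value is a weighted sum of the latest `window` observations,
    so the result has len(series) - window + 1 values.
    """
    w = frac_diff_weights(d, window)
    return [
        sum(w[k] * series[t - k] for k in range(window))
        for t in range(window - 1, len(series))
    ]


half_weights = frac_diff_weights(0.5, 4)         # slowly decaying weights
first_diff = frac_diff([1.0, 2.0, 4.0, 7.0], d=1.0, window=2)
```

With `d = 1` the weights collapse to `[1, -1, 0, ...]`, which recovers the ordinary first difference; with `0 < d < 1` the weights decay slowly, so older observations still contribute.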

Machine Learning

Sampling, Sequentially Bootstrapped Ensembles, Feature Importance (MDI, MDA, Model Fingerprint), Cross-Validation (Purged, Embargo), Bet Sizing (EF3M).
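
Purged and embargoed cross-validation addresses leakage that arises when financial labels span several observations: training samples whose labels overlap the test period are dropped (purging), and a further buffer after the test period is skipped (embargo). The generator below is a simplified sketch, assuming that the label of sample `i` depends on observations `i` through `i + label_horizon`; the function name and parameters are hypothetical and do not reflect MlFinLab's interface.

```python
def purged_kfold_splits(n_samples, n_splits, label_horizon, embargo=0):
    """Yield (train_idx, test_idx) lists for contiguous, purged, embargoed folds."""
    fold_size, remainder = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_idx = list(range(start, stop))
        last_test_label_end = stop - 1 + label_horizon
        train_idx = [
            i for i in range(n_samples)
            # keep samples whose label ends before the test block starts ...
            if i + label_horizon < start
            # ... or that begin after the test labels end, plus the embargo
            or i > last_test_label_end + embargo
        ]
        yield train_idx, test_idx
        start = stop


splits = list(purged_kfold_splits(20, 4, label_horizon=2, embargo=1))
train_idx, test_idx = splits[1]
```

For the second fold, samples 5 to 9 form the test set, samples 3 and 4 are purged because their labels reach into the test block, and samples 10 to 12 are removed by the label overlap and the one-step embargo.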

Portfolio Optimization

Mean-Variance, Black-Litterman, Hierarchical Risk Parity, Hierarchical Equal Risk Contribution, Nested Clustered Optimization.

Risk Estimators

Min Cov Determinant, MLE Covariance Estimator, Shrinkage, De-noising and De-toning, Hierarchical Cluster Filtering, Theory Implied Correlation.

Online Portfolio Selection

Benchmarks, Momentum, Mean Reversion (PAMR, OLMAR), Pattern Matching (CORN, SCORN, FCORN, FCORN-K), Universal Portfolios.

Pairs Trading

Codependence, Co-integration (Engle-Granger, Johansen), Optimal Timing of Trades (Entry, Take Profit, Stop Loss), Simulations (OU, XOU).
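
The OU entry refers to the Ornstein-Uhlenbeck process, a standard model for a mean-reverting spread in pairs trading. The sketch below simulates one path using the exact discretization of the process; it is an illustrative stand-alone function, not MlFinLab's simulator, and the parameter names and values are assumptions. `theta` (the speed of mean reversion) must be positive.

```python
import math
import random


def simulate_ou(x0, mu, theta, sigma, dt, n_steps, seed=None):
    """Simulate dX = theta * (mu - X) dt + sigma dW with exact transition steps."""
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of one exact transition over a step of length dt
    step_std = sigma * math.sqrt((1 - decay**2) / (2 * theta))
    path = [x0]
    for _ in range(n_steps):
        path.append(mu + (path[-1] - mu) * decay + step_std * rng.gauss(0.0, 1.0))
    return path


# Hypothetical spread that starts above its long-run mean of 0
path = simulate_ou(x0=1.5, mu=0.0, theta=2.0, sigma=0.3, dt=1 / 252, n_steps=252, seed=42)
```

An exponential OU (XOU) path can be obtained under the same assumptions by exponentiating each value of the simulated path.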

Synthetic Data Generation

Related to Correlation Matrices: CorrGAN, Vines (R, C, D, Partial Correlation), Extended Onion Method, Hierarchical Correlation Block Model.

MLFinLab is said to implement the methods from papers published in the journals listed below.

We source all of our implementations from the most elite and peer-reviewed journals, including publications from:

The Journal of Financial Data Science

The Journal of Portfolio Management

The Journal of Algorithmic Finance

Cambridge University Press

The MLFinLab documentation is available below.

Documentation, Tutorials, Videos, and Source Code

Examples of analyzing papers with MLFinLab can be found under Research. For example, Model Interpretability: The Model Fingerprint Algorithm analyzes the paper Beyond the Black Box: An Intuitive Approach to Investment Prediction with Machine Learning.

The complexity of machine learning models presents a substantial barrier to their adoption for many investors. The algorithms that generate machine learning predictions are sometimes regarded as “black box”, demanding interpretation and additional explanation. In this paper, we present a framework for demystifying the behavior of machine learning models and decomposing their predictions into linear, nonlinear, and interaction components. We also show how to decompose a model’s predictive efficacy into these same components. Together, this analysis forms a “model fingerprint” which we use to summarize its key characteristics and illustrate its similarities and differences compared to other models. We present a case study of this approach applying random forest, gradient boosting machine, and neural network models to the challenge of predicting monthly currency returns. We find that all three models reliably identify intuitive effects in the currency market, and that they also find new relationships attributable to nonlinearities and variable interactions. We argue that an understanding of these predictive components may help astute investors generate superior risk-adjusted returns.

1 Comment

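jh808 September 7, 2022 at 8:58 PM

Hello! I'm a student retaking the college entrance exam who happened to pass by. I'm interested in finance and found this blog while searching. Thank you for the great post!
I have one question, and I would really appreciate an answer. As far as I know, analyzing timing and volatility is very important in finance too. Suppose some technique could perfectly pinpoint in time the inflection points (?) of a price, meaning intervals where either a rise or a fall occurs so that the interval could reasonably be called an inflection point. Could one then build a nearly invincible (?) strategy? If that is possible, what kind of strategy could it be combined with? Thank you!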
