Predicting stock market crashes with topological data analysis

We investigate the evolution of daily returns of four major US stock market indices around the 2000 technology crash and the 2007-2009 financial crisis. Our approach is based on topological data analysis (TDA). We use persistent homology to detect and quantify topological phenomena that occur in multidimensional time series. Using a sliding window, we obtain time-dependent point cloud data sets, to which we associate a topological space. Our research indicates that TDA offers a new method of econometric analysis, one that complements traditional statistical tests. The tool may be used to detect early warning signs of imminent market crashes.


Introduction
As long as financial markets operate, there will be financial crises. When the market falls, most participants lose; anyone who can forecast a crash can protect their investments or take aggressive short positions to profit (a nevertheless stressful situation to be in, as depicted in The Big Short). Each asset in the market is priced by a competitive mechanism, the price varying according to the information available. A wide variety of information feeds into the price of an asset on a stock exchange and, under the efficient market hypothesis, any change in that information is automatically priced in.
"The dynamics of financial systems are comparable to those of physical systems." Much as phase transitions occur between solids, liquids, and gases, we can distinguish a normal market regime from a chaotic one. Observations indicate that market crashes are preceded by a period of intensified asset price oscillation [1]. This anomaly manifests as an abnormal change in the geometric structure of the time series. In this research, we use topological data analysis (TDA) to capture these geometric changes and build an accurate detector for stock market crashes.

The Past Literature
Topological Data Analysis (TDA) is a young field that arose during the first decade of the century from separate works in applied (algebraic) topology and computational geometry. Although geometric methods for data analysis can be traced far back in the past, TDA began as a discipline with the groundbreaking works on persistent homology by Edelsbrunner et al. [1] and Zomorodian and Carlsson [2], and was popularized in a seminal paper in 2009 [3]. TDA is primarily motivated by the idea that topology and geometry offer an effective approach to inferring robust qualitative, and often quantitative, knowledge about the structure of data (Chazal et al. [4]).
TDA aims to provide well-founded mathematical, statistical, and algorithmic techniques to infer, analyze, and exploit the complex topological and geometric structure of raw data, which is frequently represented as point clouds in Euclidean or more general metric spaces. A major effort has been made over the last few years to equip TDA with stable and efficient data structures and algorithms that are now integrated and easy to use through standard libraries such as the GUDHI library (C++ and Python) (Maria et al. [5]) and its R interface (Fasy et al. [6]).

TDA Pipeline
TDA has recently seen advances in several directions and fields of application, and a wide range of methods driven by topological and geometric approaches is now available. It is beyond the reach of this introductory survey to provide a full review of all these methods. Most of them, however, rely on the following fundamental pipeline, which will serve as the backbone of this paper: 1. The input is assumed to be a finite set of points with a notion of distance or similarity between them.
This distance may be induced by the metric of the ambient space (e.g., the Euclidean metric when the data is embedded in R^d) or given as an intrinsic metric described by a pairwise distance matrix. The metric on the data is generally supplied as an input or chosen by the practitioner. It is important to note that the choice of metric can be critical to uncovering interesting topological and geometric characteristics of the data.
2. A "continuous" shape is built on top of the data in order to highlight the underlying topology or geometry. This is often a simplicial complex or a nested family of simplicial complexes, called a filtration, which reflects the structure of the data at different scales. Simplicial complexes can be seen as higher-dimensional generalizations of the neighborhood graphs that are classically built on top of data in many standard data analysis or learning algorithms. The challenge here is to identify constructions that provably encode relevant information about the data structure and that can be built and manipulated efficiently in practice.
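To make step 2 concrete, here is a minimal, illustrative sketch (not the construction used later in the paper, which would rely on a standard TDA library such as GUDHI) of the Vietoris-Rips complex, the most common building block for filtrations: a simplex is included at scale epsilon whenever all of its vertices lie pairwise within distance epsilon.

```python
from itertools import combinations
from math import dist

def rips_complex(points, epsilon, max_dim=2):
    """Build the Vietoris-Rips complex of a point cloud at scale epsilon.

    A k-simplex is included whenever all pairwise distances among its
    k+1 vertices are at most epsilon. Returns a list of simplices,
    each a tuple of point indices.
    """
    n = len(points)
    # 0-simplices: every point is a vertex.
    simplices = [(i,) for i in range(n)]
    # 1-simplices: edges present at this scale.
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= epsilon}
    simplices += sorted(close)
    # Higher simplices: every pair of vertices must already be an edge.
    for k in range(3, max_dim + 2):
        for combo in combinations(range(n), k):
            if all((a, b) in close for a, b in combinations(combo, 2)):
                simplices.append(combo)
    return simplices
```

Sweeping epsilon from 0 upward produces the nested family of complexes (the filtration) on which persistence is computed: for the four corners of a unit square, epsilon = 1.0 yields only the four sides, while epsilon = 1.5 also fills in the diagonals and triangles.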

TDA in Data Science
On the application side, topological and geometric methods have produced promising and successful results in a growing number of fields, such as materials science [7][8], 3D shape analysis [9][10], multivariate time series analysis [11], biology [12], chemistry [13], and sensor networks [14]. A comprehensive list of TDA applications is outside the scope of this paper. Notably, many of TDA's contributions come from its combination with other analysis or machine learning techniques.

Method
In this study, we evaluate daily S&P 500 index prices from 1980 to the present. The S&P 500 is widely used to assess the state of the financial market, as it tracks the stock performance of 500 large-cap US companies. We find that topological signals appear resilient to noise and are therefore less likely to generate false positives than a simple baseline. This illustrates one of the main motivations behind TDA, namely that topology and geometry provide a powerful way to abstract subtle structure in complex data. Since market crashes involve a sudden fall in stock prices, one easy approach to detecting them is to monitor the first derivative of average price values over a rolling window. Indeed, we can see in the figure above that the Black Monday crash (1987), the burst of the dot-com bubble (2000), and the financial crisis (2007-2008) are already captured by this naive approach. By normalizing this time series to take values in the [0,1] interval, we may set a threshold that marks points on our original time series where a crash occurred. Following this guideline, however, could lead to over-panicking and selling our assets too fast, since many points get labeled as a crash.
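A sketch of this naive baseline follows; the function name, window size, and threshold are illustrative choices, not the exact parameters used in the study.

```python
def crash_baseline(prices, window=10, threshold=0.8):
    """Naive crash detector: normalized magnitude of the first
    difference of a rolling-mean price series.

    Returns (signal, flags), where signal takes values in [0, 1] and
    flags lists the indices whose signal exceeds the threshold.
    """
    # Smooth the prices with a rolling mean over the window.
    means = [sum(prices[i - window:i]) / window
             for i in range(window, len(prices) + 1)]
    # First derivative of the smoothed series, keeping only drops
    # (price falls) as positive values.
    drops = [max(0.0, means[i - 1] - means[i]) for i in range(1, len(means))]
    top = max(drops, default=0.0) or 1.0
    signal = [d / top for d in drops]          # normalize to [0, 1]
    flags = [i for i, s in enumerate(signal) if s >= threshold]
    return signal, flags
```

On a toy series that rises steadily and then falls sharply, the rising segment produces zero signal while the fall is flagged, which is exactly the behavior the figure shows for the real index.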

Topological Model
The underlying TDA mathematics is involved and will not be discussed in this article. For our purposes, it is enough to think of TDA as a way of extracting descriptive features that can be used for downstream modeling. The pipeline we have built consists of:
• Embedding the time series into a point cloud and constructing sliding windows of point clouds
• Building a filtration on each window, giving an evolving structure that encodes the window's geometric shape
• Extracting the relevant features of those windows using persistent homology
• Comparing each window to the next by measuring the difference between these features
• Constructing a crash indicator based on this difference

Time series as point clouds
A typical starting point in a TDA pipeline is to generate a simplicial complex from a point cloud. In time series applications, the crucial question is therefore how to generate such point clouds. Discrete time series, like those we are considering, are typically visualized in two dimensions as scatter plots. This representation makes the local behavior of the time series easy to track by scanning the plot from left to right, but it is often ineffective at conveying significant effects that occur over larger timeframes.
Fourier analysis provides one well-known set of methods for capturing periodic behaviour. For example, the discrete Fourier transform of the time series over a temporal window indicates whether the signal in that window can be written as the sum of several basic periodic signals. For our purposes we take a different way of encoding a time-evolving process, based on the idea that some key properties of the dynamics can be effectively unveiled in higher dimensions.

We begin by illustrating a way of representing a univariate time series as a point cloud, i.e. a set of vectors in an arbitrary-dimensional Euclidean space. The procedure works as follows: we pick two integers d and τ. For each time tᵢ ∈ (t₀, t₁, …), we collect the values of the variable y at d evenly spaced times, separated by τ and starting at tᵢ, and arrange them as a vector with d entries, namely (y(tᵢ), y(tᵢ + τ), …, y(tᵢ + (d − 1)τ)). The result is a set of vectors in d-dimensional space; τ is called the time delay parameter and d the embedding dimension. This time-delay embedding technique is also called Takens' embedding, after Floris Takens, who demonstrated its significance with a celebrated theorem in the context of nonlinear dynamical systems. Finally, applying this procedure separately to sliding windows over the full time series leads to a time series of point clouds (one per sliding window) with potentially interesting topologies. Figure 5 above shows how such a point cloud is generated in 2 dimensions.
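The time-delay embedding above can be sketched in a few lines (libraries such as giotto-tda provide equivalent transformers; here we assume a plain Python list as input):

```python
def takens_embedding(series, d, tau):
    """Time-delay (Takens') embedding of a univariate series.

    Maps the series to the point cloud of d-dimensional vectors
    (y[t], y[t + tau], ..., y[t + (d - 1) * tau]) for every valid t.
    """
    last = len(series) - (d - 1) * tau   # last start index (exclusive)
    return [tuple(series[t + k * tau] for k in range(d))
            for t in range(last)]
```

For a periodic signal this embedding traces out a loop in d dimensions, which is precisely the kind of topological feature persistent homology can detect in each sliding window.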

From point clouds to persistence diagrams
Now that we know how to generate a time series of point clouds, what can we do with this data? Enter persistent homology, which looks for topological features that persist over a range of parameter values in a simplicial complex. Typically, a feature such as a hole will not be observed at first; it then appears, and after a range of parameter values it disappears again.
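In practice, persistence diagrams are computed with a library such as GUDHI. To illustrate the idea, the sketch below computes only the degree-0 part of the diagram (connected components) of a Rips filtration: every point is born at scale 0, and a component dies at the length of the edge that merges it into another one, which is exactly Kruskal's algorithm on the complete distance graph.

```python
from itertools import combinations
from math import dist, inf

def h0_persistence(points):
    """Degree-0 persistence diagram of the Vietoris-Rips filtration.

    Returns (birth, death) pairs: each component is born at scale 0
    and dies when an edge merges it with another component. The last
    surviving component gets an infinite death time.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process edges in order of increasing length (the filtration order).
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # one component dies at this scale
            diagram.append((0.0, length))
    diagram.append((0.0, inf))           # the last component never dies
    return diagram
```

On two well-separated pairs of points, the diagram shows two short-lived components (the pairs close up quickly) and one long-lived one, mirroring how persistence separates structure from noise.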

Distances between persistence diagrams
Given two windows and their corresponding persistence diagrams, we can compute a number of distance metrics. Here we compare two of them: one based on the notion of persistence landscape, the other on Betti curves. From these figures we can infer that the landscape-based distance is less noisy than the one based on Betti curves.
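A minimal sketch of both feature types, assuming a diagram is given as a list of (birth, death) pairs: the Betti curve counts the intervals alive at each scale, while the k-th persistence landscape takes the k-th largest "tent" value over the intervals; a distance between two windows can then be taken as the L1 distance between their curves on a common grid.

```python
def betti_curve(diagram, grid):
    """Betti curve: at each scale s in grid, count the intervals
    (birth, death) that are alive, i.e. birth <= s < death."""
    return [sum(1 for b, d in diagram if b <= s < d) for s in grid]

def landscape(diagram, k, grid):
    """k-th persistence landscape sampled on grid: at each t, the
    k-th largest tent value max(0, min(t - birth, death - t))."""
    vals = []
    for t in grid:
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram),
                       reverse=True)
        vals.append(tents[k - 1] if k <= len(tents) else 0.0)
    return vals

def l1_distance(curve1, curve2):
    """L1 distance between two curves sampled on the same grid."""
    return sum(abs(a - b) for a, b in zip(curve1, curve2))
```

Comparing consecutive windows then amounts to, e.g., `l1_distance(landscape(dgm1, 1, grid), landscape(dgm2, 1, grid))`; unlike the integer-valued Betti curve, the landscape varies continuously with the diagram, which is one intuition for why it gives a smoother signal.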

Results
Using the landscape distance between windows as our topological feature, we normalize it as we did for the baseline model. The resulting signal detects the stock market declines due to the dot-com bubble and the global financial crisis, as seen below. We can see that using topological features tends to reduce the noise in the signal of interest relative to our simple baseline.

Conclusion
Our results suggest that the periods of high volatility preceding a crash produce geometric signatures that can be identified more robustly using topological data analysis. These results, however, cover only a single market index over a limited period of time, so the robustness of the method across different markets and thresholds should be studied further. Nevertheless, the findings are promising and open up exciting directions for future work.