Where abstract constructs often lead to tangible breakthroughs, one operation stands out for its ubiquity and sheer computational appetite: matrix multiplication. This seemingly straightforward task is a cornerstone in a plethora of disciplines — from quantum physics and life sciences to economics and AI. Indeed, optimizing this fundamental mathematical operation promises to unleash transformations that could ripple through not just scientific research but human civilization itself. It’s a research frontier that’s as exciting as it is daunting, laden with unprecedented opportunities. Mathematicians have been pushing the record of solving the puzzle of faster matrix multiplication since the mid-20th century. In 2022, a research team at Google DeepMind aimed to break a 50-year-old record by discovering faster algorithms for matrix multiplication through training an AI system called AlphaTensor.
After the breakthrough results of AlphaTensor were published, mathematicians, recognizing the potential for even further approximations, used conventional computer-aided searches to improve one of the algorithms that the neural network had discovered. The key objective of the trained neural network was to find tensor decompositions within a finite factor space. For the first time, to our knowledge, AlphaTensor’s algorithm improves upon Strassen’s two-level algorithm, which was discovered 50 years ago. Here is a amazing Quanta Magazine article about that, if you want to know the whole story of decades of research and AlphaTensor.
A tensor, in mathematics, is an algebraic object that describes a multilinear relations between sets of algebraic objects related to a vector space. The order of a tensor is the number of dimensions, also known as ways or modes. In essence, tensor objects are generalization of scalars, vectors, and matrices. Notated as a multidimensional array, or in other words, an N-way array, Nth-order tensor; where ‘N’ represents the order of the tensor. A first-order tensor essentially defines what we commonly understand as a vector. It has only one axis and can be thought of as a one-dimensional array. A second-order tensor, on the other hand, defines a matrix. It possesses two axes and can be represented as a two-dimensional array. When you move to tensors of order three or higher, these are called higher-order tensors. They have three or more axes and can be conceptualized as arrays with three or more dimensions.
Decompositions of higher-order $N\ge 3$ tensors play a pivotal role in various fields of study and applications. Such decompositions break down a complex tensor into simpler, more manageable parts while retaining its original properties. E.g. CANDECOMP/PARAFAC (CP) decomposition and Tucker decomposition are two particular types that can be considered higher-order extensions of the well-known matrix singular value decomposition (SVD) and as mentioned, the importance of tensor decomposition transcends pure mathematics and finds practical utilities. Many complex mathematical and computational challenges involving tensors can be significantly simplified or even rendered tractable by employing appropriate tensor decompositions. In 1927, Hitchcock put forth the concept of representing a tensor in polyadic form, meaning it could be expressed as the sum of several rank-one tensors. Later, in 1944, Cattell introduced the notions of parallel proportional analysis and multi-dimensional axes for analysis, which include circumstances, objects, and features. Concept finally became popular after its third introduction, in the form of CANDECOMP (canonical decomposition) by Carroll and Chang [38] and PARAFAC (parallel factors) by Harshman.
To lay a proper foundation for the variants of tensor decomposition, we can first talk about the core principles and mathematical definitions that encapsulate what tensor decomposition truly is. Mathematically speaking, given a tensor $\tau \in R^{n_{1}×n_{2}×...×n_{d}}$ (CP form of a tensor) of order $d$, the goal of tensor decomposition is to approximate $\tau$ as a sum of rank-1 tensors, which is a tensor that can be written as the outer product of $d$ vectors. More formally, our aim is to find vectors $a_{i}^k \in R^{n_{k}}$ for $i=1,2,…,R$ and $k=1,2,…,d$ such that the tensor $\tau$ can be approximated as: $\tau ≈ {i=1}^{R} a{i}^1 a_{i}^2 … a_{i}^d$
Here, $R$ being the rank of the tensor, and product operator denotes the outer product. Value of $R$ typically determines the quality of the approximation: a lower $R$ often means less complexity and easier computability, but potentially at the cost of approximation accuracy. Contrary to the case of matrices, computing the rank of a tensor is NP-hard problem. This is considered as open problem in computational mathematics.
Specifically, Canonical Polyadic decomposition variant (CPD) can approximate an observed tensor by a sum of rank-1 tensors, which requires $O(dnr)$ parameters and while Tucker decomposition attempts to approximate tensors by a core tensor and several factor matrices, which requires \$O(dnr + r^d)$ parameters. Defining feature of the CPD is that it is essentially unique for tensor orders greater than two, allowing for unique parameter identification. mathematically, this can be stated as: for a tensor T of order N>2, the CP decomposition is essentially unique, facilitating the unambiguous identification of its parameters. There are many algorithms for calculating CP decomposition. The most popular in the literature is the Alternating Least Squares (ALS), which is an iterative optimization process that estimates the factor matrices in an alternating way.
However, the challenge with CPD lies in finding the optimal solution. This is because CPD optimization problems are often non-convex, and the search for the best factor matrices can get stuck in local minima. This complexity can make it computationally expensive and time-consuming to determine the ideal CPD representation. On the other hand, the Tucker decomposition provides a more stable and flexible representation of tensors. In Tucker decomposition, a tensor is expressed as a core tensor (also known as the core array) multiplied by a set of orthogonal matrices along each mode.
Canonical Polyadic Decomposition: Robust Manifold Nonnegative Tucker Factorization for Tensor Data Representation - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Canonical-Polyadic-Decomposition_fig1_365209303 [accessed 17 Oct, 2023]