Multiaspect data are ubiquitous in modern Big Data applications. For instance, different aspects of a social network are the different types of communication between people, the time stamp of each interaction, and the location associated with each individual. How can we jointly model all those aspects and leverage the additional information that they introduce to our analysis? Tensors, which are multidimensional extensions of matrices, are a principled and mathematically sound way of modeling such multiaspect data. In this article, our goal is to popularize tensors and tensor decompositions to Big Data practitioners by demonstrating their effectiveness, outlining challenges that pertain to their application in Big Data scenarios, and presenting our recent work that tackles those challenges. We view this work as a step toward a fully automated, unsupervised tensor mining tool that can be easily and broadly adopted by practitioners in academia and industry.
Many real-world phenomena, especially in the age of Big Data, produce data and metadata that are inherently multiaspect. For instance, social interaction among individuals is a naturally multiaspect process. Social interaction has multiple modes or aspects: the means of interaction (e.g., who calls whom, who messages whom, and who is friends on Facebook with whom), the time of the interaction, the location, as well as the text and the language associated with it.
Such multiaspect data are ubiquitous in the modern interconnected world, and there is an imperative need for methods that model and process these data and extract useful knowledge from them, knowledge that can be used for decision support and scientific discovery.
Tensors and tensor decompositions are a very powerful set of tools that are invaluable in that endeavor. A tensor is a multidimensional extension of a matrix: each of its dimensions is called a "mode," and the number of modes is called the "order" of the tensor. For instance, a matrix is a two-mode tensor, and a data cube is a three-mode tensor. Tensors are very expressive structures and they can naturally model multiaspect data such as the ones in our social interaction example: if we simply record the interactions between individuals, then we have a matrix (or two-mode tensor) of (person, person); if we additionally record the means of interaction, then we have a three-mode tensor of (person, person, means of interaction); if, on top of that, we have time-stamped events, this results in a four-mode tensor of (person, person, means of interaction, time); and if we have location information (which is now ubiquitous in most online social network platforms), we end up with a five-mode tensor of (person, person, means of interaction, time, location). Depending on what type of multiaspect data our application entails, we can have a corresponding tensor that models those data concisely.
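As a concrete illustration, the short NumPy snippet below builds such a three-mode (person, person, means of interaction) tensor from a toy log of interaction events. The event list, the encoding of the means of interaction, and all sizes are hypothetical, chosen purely for illustration.

```python
import numpy as np

# Hypothetical interaction log: (sender, receiver, means) triples.
# Means encoding (illustrative): 0 = call, 1 = message, 2 = friend request.
events = [(0, 1, 0), (0, 1, 0), (1, 2, 1), (3, 0, 2), (0, 1, 1)]

n_people, n_means = 4, 3
T = np.zeros((n_people, n_people, n_means))
for sender, receiver, means in events:
    T[sender, receiver, means] += 1  # count interactions per (who, whom, how) cell

print(T.shape)     # (4, 4, 3): a three-mode (person, person, means) tensor
print(T[0, 1, 0])  # 2.0: person 0 called person 1 twice
```

Adding a time stamp or a location to each event would simply add a mode to `T`, exactly as in the progression from two to five modes described above.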
Having successfully modeled our data using a tensor, how do we extract useful knowledge from the data? For this purpose, we use a tool called tensor decomposition or tensor factorization (both terms are used interchangeably in this article). There exist multiple flavors of tensor decompositions with different properties, and we invite the interested reader to consult Ref. 1, which contains excellent introductory material on the inner workings of tensor decompositions, and Ref. 2, which is an excellent introduction to unsupervised data analysis using tensors. In this article, we will focus on the so-called Canonical Decomposition, PARAFAC, or CP decomposition,3 henceforth referred to as PARAFAC.
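Concretely, a rank-R PARAFAC model expresses a three-mode tensor as a sum of R rank-one tensors, each the outer product of one column from each of three factor matrices. The NumPy sketch below constructs such a tensor; all dimensions and variable names are our own illustrative choices.

```python
import numpy as np

# Hypothetical dimensions: a 4 x 5 x 3 tensor of rank R = 2.
I, J, K, R = 4, 5, 3, 2
rng = np.random.default_rng(0)
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

# PARAFAC: the tensor is a sum of R rank-one outer products,
# one per column of the factor matrices A, B, and C.
T = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]) for r in range(R))

# Equivalently, a single einsum over the shared rank index r.
T2 = np.einsum('ir,jr,kr->ijk', A, B, C)
assert np.allclose(T, T2)
```

Each term in the sum is one of the rank-one components depicted in Figure 1; in real data analysis, of course, we go the other way, starting from the data tensor and estimating the factor matrices.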
A pictorial representation of PARAFAC is shown in Figure 1. Essentially, each rank-one component of the decomposition corresponds to a dense "block" of data within the data tensor. This block need not be formed by consecutive rows, columns, and third-mode "fibers"; it may only become visible after appropriately rearranging the rows, columns, and fibers.
FIG. 1. Pictorial representation of the PARAFAC decomposition. Each rank-one component corresponds to a dense block in the data.
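For readers curious how such a decomposition is actually computed, a widely used approach is alternating least squares (ALS): fix two factor matrices, solve a linear least-squares problem for the third, and cycle until convergence. Below is a minimal, illustrative NumPy sketch of ALS for a three-mode tensor. The helper names and fixed iteration count are our own simplifications; production implementations add convergence checks, factor normalization, and more careful initialization.

```python
import numpy as np

def unfold(T, mode):
    """Matricize a tensor along one mode (rows = that mode's indices)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices."""
    R = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, R)

def cp_als(T, R, iters=100, seed=0):
    """Illustrative rank-R PARAFAC fit of a three-mode tensor via ALS."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, R)) for n in T.shape)
    for _ in range(iters):
        # Each update is an exact least-squares solve with the others fixed.
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C)).T
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C)).T
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Sanity check: fit an exactly rank-1 tensor and verify the reconstruction.
a, b, c = (np.arange(1, n + 1, dtype=float) for n in (4, 5, 3))
T = np.einsum('i,j,k->ijk', a, b, c)
A, B, C = cp_als(T, R=1)
recon = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm(recon - T) / np.linalg.norm(T)
```

The sketch uses `np.linalg.pinv` for clarity; practical implementations solve the normal equations and exploit sparsity, which matters greatly at Big Data scale.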
Revisiting our social network example, suppose we have a three-mode tensor of (person, person, means of interaction) recording the amount of interaction between different people in a social network. Taking its PARAFAC decomposition, as shown in Figure 1, results in a soft co-clustering of people and means of interaction: each latent component is a co-cluster, that is, a subset of people and means of interaction that exhibit very similar behavior. To familiarize the reader with the concept of co-clustering, in Figure 2 we present a very simple example of two co-clusters in a (user, movie) matrix that could conceptually contain movie ratings on Netflix by users. The starting point of co-clustering is the observation that postulating that a particular cluster of users enjoys all movies equally is too restrictive. Instead, co-clustering relaxes this requirement and seeks to identify a group of users that have similar viewing behavior across a subset of the movies; thus, a co-cluster in a matrix is simply a subset of the rows (users) and columns (movies). In our particular example, there is one group of users who enjoy horror movies and a separate group of users who enjoy comedies. In reality, those co-clusters may very well overlap. Notice also that the "patch" in the matrix that denotes the co-cluster may not be immediately apparent to an analyst, since one needs to rearrange rows and columns appropriately to see it. When our data form a tensor, a co-cluster is a subset of rows, columns, and third-mode fibers, as shown in Figure 1. Such co-clusters manifest as blocks in the data, which PARAFAC is ideal for uncovering. In fact, in Ref. 4 the authors showed that PARAFAC with additional sparsity constraints on the factor vectors essentially yields a co-clustering of the tensor data. We invite the interested reader to consult Ref. 4 and references therein for a more detailed treatment of co-clustering.
FIG. 2. Simple example of two co-clusters: a subset of people who enjoy watching a subset of movies (horror movies), and another subset who enjoy watching comedies.
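To make the "rearrangement" point concrete, the toy script below plants two co-clusters at scattered rows and columns of a ratings matrix, in the spirit of Figure 2, and then reorders the rows and columns so that each co-cluster appears as a contiguous dense patch. All index sets and rating values are hypothetical.

```python
import numpy as np

# Toy (user, movie) ratings matrix with two planted co-clusters.
M = np.zeros((6, 6))
horror_fans, horror = [0, 2, 4], [1, 3]   # hypothetical index sets
comedy_fans, comedy = [1, 5], [0, 4, 5]
M[np.ix_(horror_fans, horror)] = 5        # horror fans rate horror highly
M[np.ix_(comedy_fans, comedy)] = 4        # comedy fans rate comedies highly

# The blocks are scattered in M; sorting rows and columns by cluster
# membership makes each co-cluster a contiguous dense "patch."
row_order = horror_fans + comedy_fans + [3]
col_order = horror + comedy + [2]
patch = M[np.ix_(row_order, col_order)]
assert np.all(patch[:3, :2] == 5)    # horror co-cluster is now a solid block
assert np.all(patch[3:5, 2:5] == 4)  # comedy co-cluster likewise
```

In a tensor, the same idea extends to a third index set over the fibers; the sparsity-constrained PARAFAC of Ref. 4 recovers such index sets directly from the factor vectors rather than by manual reordering.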
This interpretability is very important for practitioners who wish to conduct exploratory analysis on the data and understand what latent clusters and patterns are present. There are also fundamental theoretical advantages of using tensors whenever data naturally possess multiple aspects; we refer the interested reader to Appendix 1 for a discussion.
Tensor decompositions have been shown to be effective in numerous fields. It is impressive that multiple research communities develop tensor algorithms and applications and demonstrate the benefits of those approaches in their respective fields. There also exist multiple cross-disciplinary meetings devoted solely to tensor decompositions, spanning scientific disciplines such as Psychology, Chemometrics, Signal Processing, and Data Mining.5–7
With this article, our hope is to familiarize the readers of this magazine with the concepts behind tensors and tensor decompositions, placing specific emphasis on a Big Data practitioner's point of view. To do so, in the next section (Tensor Applications), we briefly explore a few of the numerous applications where tensors have been successful in data science and outline the difficulties that unsupervised tensor mining entails. In the Challenges in Unsupervised Tensor Mining section, we draw solutions from the field of Chemometrics and demonstrate how we can extend them for Big Data applications. Subsequently, in the Automatic Unsupervised Tensor Mining section, we describe an automatic, data-driven framework for tensor mining, outlining its inner workings and demonstrating its effectiveness. The Case Study section contains an indicative example of using tensor decomposition in conjunction with our framework for analyzing real data. Finally, in the Conclusions section, we conclude our discussion.