Tensor Methods for Emerging Data Science Challenges

A KDD 2019 Workshop

KDD 2019 Overview Dates Program Schedule Keynote Invited Speakers Poster Papers Organization PC

2019 KDD Workshop on Tensor Methods for Emerging Data Science Challenges

Tensor decompositions have become popular tools in data analysis. A typical application of tensor decompositions in KDD has been analysis of time-varying graphs, where every slice of the tensor represents one snapshot of the graph. Recently, though, tensor decomposition ideas have become more popular in a wider collection of topics, such as, Internet of Things, sensor arrays, or healthcare data.

The KDD 2019 Workshop on Tensor Methods for Emerging Data Science Challenges aims to bring together computer scientists, data scientist and domain scientists to explore how tensors can be used to solve these emerging problems.

Dates

May 19, 2019 (The deadline for submitting the papers)

June 15, 2019 (The deadline for sending out paper notifications)

The workshop will be organized on August 5, 2019, in connection with KDD 2019 at Anchorage, Alaska.

Program Schedule

08:00-08:05 Welcome

08:05-09:00 Evrim Acar: Unraveling Interpretable Patterns through Data Fusion based on Coupled Matrix and Tensor Factorizations

09:00-09:30 Rose Yu: Fast and Interpretable Tensor Methods for Spatiotemporal Analysis

09:30-10:00 Coffee

10:00-10:30 Kimis Perros: Scalable Unsupervised Phenotyping using Tensor Factorization

10:30-11:00 Shaden Smith: Scaling Up Sparse Tensor Factorization

11:00-11:30 Poster flash presentations

11:30-12:00 Posters

Keynote

Speaker: Dr. Evrim Acar (Ataman), Chief Research Scientist at Simula Metropolitan Center for Digital Engineering (Oslo, Norway)

Title: Unraveling Interpretable Patterns through Data Fusion based on Coupled Matrix and Tensor Factorizations

Abstract: Fusing complementary signals from different modalities holds the promise to lead to the discovery of more accurate diagnostic biomarkers for various diseases. For instance, joint analysis of signals from different neuroimaging techniques has the potential to reveal biomarkers for psychiatric disorders. However, biomarker discovery through data fusion is challenging since it requires extracting interpretable and reproducible patterns from data sets consisting of shared as well as unshared patterns, and often of different orders, e.g., multi-channel electroencephalography (EEG) signals represented as a third-order tensor with modes: subjects, time, and channels, and functional magnetic resonance imaging (fMRI) data in the form of a subjects by voxels matrix. Traditional fusion methods rearrange higher-order tensors as matrices and use matrix factorization-based fusion approaches with additional constraints such as orthogonality to extract patterns uniquely. Rather than imposing such constraints, we preserve the multiway structure of higher-order tensors, formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem and discuss its extension to structure-revealing data fusion, i.e., fusion models that can identify shared and unshared patterns in coupled data sets. Numerical experiments on prototypical and real coupled data sets demonstrate that the structure-revealing CMTF model can capture the underlying patterns more accurately than matrix factorization-based fusion methods by exploiting the low-rank structure of higher-order tensors. We will discuss applications of CMTF-based fusion models in metabolomics and neuroscience.

Short Bio: Evrim Acar is a Chief Research Scientist at Simula Metropolitan Center for Digital Engineering (Oslo, Norway). Her research focuses on data mining, in particular, tensor factorizations, data fusion using coupled factorizations of higher-order tensors and matrices, and their applications in diverse disciplines. Prior to joining Simula, Evrim was a faculty member at the Chemometrics and Analytical Technology group at the University of Copenhagen (Denmark), and a postdoctoral researcher at Sandia National Labs (Livermore, CA). She received her MS and PhD in Computer Science from Rensselaer Polytechnic Institute (Troy, NY) in 2006 and 2008, respectively.

Invited Speakers

Dr. Rose Yu, Assistant Professor at Northeastern University

Title: Fast and Interpretable Tensor Methods for Spatiotemporal Analysis

Abstract: Multivariate spatiotemporal data is ubiquitous in science and engineering, from sports analytics to neuroscience. Such data can be naturally represented as a multiway tensor. Tensor latent factor models provide a powerful tool for reducing the dimensionality and discovering the higher-order latent structures from data. However, existing tensor models are often slow or fail to yield latent factors that are easy to interpret by domain experts. In this talk, I will demonstrate advances in tensor methods to generate interpretable latent factors for high-dimensional spatiotemporal data. In particular, I will discuss (1) a multiresolution tensor learning algorithm, that can leverage the multicale property of high-resolution spatial data, to speed up training and learn interpretable patterns. (2) a tensor latent feature learning algorithm that can learn binary representations of data that are both memory efficient and easy to interpret. We provide theoretical guarantees for our optimization algorithms and demonstrate their applications to real-world data from basketball plays and neuroscience.

Bio: Dr. Yu is an Assistant Professor in the Khoury College of Computer Sciences at Northeastern University. Previously, she was a postdoctoral researcher in Caltech Computing and Mathematical Sciences. She earned her PhD in Computer Sciences at the University of Southern California and was a visiting researcher at Stanford University. Her research focuses on developing machine learning techniques for large-scale time series and spatiotemporal data. She is generally interested in the theory and applications of deep learning, tensor optimization and spatiotemporal modeling. Her work has been successfully applied to intelligent transportation, climate informatics, and aerospace control. Among her awards, she has won the best dissertation award in USC computer science, best paper award at NIPS time series workshop, and was nominated as one of the ``MIT Rising Stars in EECS''.

Dr. Shaden Smith, Research Scientist at Intel Labs

Title: Scaling Up Sparse Tensor Factorization

Abstract: Tensor factorization is a powerful technique for analyzing multi-way data and has applications in fields such as cybersecurity, social network analysis, and health analytics. The tensors that arise in these domains are increasingly large, sparse, and high dimensional. The ubiquity of multi-core processors and large-scale clusters motivates the development of scalable parallel approaches for sparse tensor computations.

This talk presents several challenges addressed in the field of high-performance sparse tensor factorization. Topics include efficient data structures for sparse tensors, parallel algorithms for multi-core architectures, and data decompositions for distributed-memory systems. This research is culminated in SPLATT, an open source toolkit for sparse tensor factorization that is used by academia, industry, and government.

Bio: Shaden Smith is a research scientist at Intel's Parallel Computing Laboratory. Shaden's research is at the intersection of high performance computing and data science, with a current focus on enabling and accelerating large-scale sparse tensor factorization and graph analytics. He was a recipient of the 2017 ACM/IEEE-CS George Michael Memorial HPC Fellowship and was awarded the Euro-Par'17 distinguished paper award. Shaden received his PhD from the University of Minnesota, where he was advised by George Karypis.

Dr. Ioakeim (Kimis) Perros, Lead Machine Learning Scientist at Health[at]Scale

Title: Scalable Unsupervised Phenotyping using Tensor Factorization

Abstract: Originally purposed to streamline documentation of care, electronic health records (EHRs) provide a massive amount of diverse and readily available data that can be used to tackle important healthcare problems. Clinical phenotyping is one of them, which refers to identifying patient subgroups sharing common clinically-meaningful characteristics. However, there are significant challenges in using EHR data to computationally tackle this problem, related to algorithmic scalability, model interpretability and the longitudinal nature of patient data. In this talk, recent developments in the area of tensor factorization will be presented which effectively tackle those challenges.

Bio: Ioakeim (Kimis) Perros is a Lead Machine Learning Scientist at Health[at]Scale. He obtained his Ph.D. in Computational Science & Engineering from Georgia Tech. His research focus is on developing and applying machine learning methods for healthcare applications. He has interned with the Health Informatics Division of Weill Cornell Medicine, the RD&D Department of Sutter Health and the Healthcare AI group of Microsoft Research Cambridge. He has published in top Data Mining conferences (KDD, SDM, ICDM) and biomedical informatics journals (JBI). He has co-authored papers in top machine learning (NeurIPS), high-performance computing (SC, IPDPS), knowledge management (CIKM) and geospatial data analysis (SIGSPATIAL) venues.

Poster Papers

Qiuwei Li, Gongguo Tang, Kai Liu and Hua Wang "General Tensor Recovery via Alternating Minimization"

Simon Woo, Youjin Shin, Sangyup Lee and Shahroz Tariq "Tensor Decomposition for Anomaly Detection in Space"

Sefki Kolozali, Lia Chatzidiakou, Roderic Jones, Jennifer K. Quint, Frank Kelly and Benjamin Barratt "A probabilistic multi-aspect learning model for the early detection of COPD patients' symptoms" [pdf]

M.A.O. Vasilescu and E. Kim "Compositional Hierarchical Tensor Factorization: Representing a Hierarchical Intrinsic and Extrinsic Causal Factors"[pdf]

Deepak Maurya, Balaraman Ravindran and Shankar Narasimhan "Hyperedge Prediction using Tensor Eigenvalue Decomposition"

Yang Shi and Animashree Anandkumar "Higher-order Count Sketch: Dimensionality Reduction That Retains Efficient Tensor Operations"

Organization

Program Committee:

Ian Davidson, Professor (University of California, Davis)

Pauli Miettinen, Professor (University of Eastern Finland)

Vagelis Papalexakis, Assistant Professor (University of California, Riverside)

Gowtham Atluri, Asst. Professor (University of Cincinnati)

Zilong Bai, Ph.D. Candidate (University of California, Davis)

James Bailey, Professor (The University of Melbourne)

Jeffrey Chan, Senior Lecturer (equivalent to Assistant Professor) (RMIT University)

Dora Erdos, Lecturer & Undergraduate Program Director (Boston University)

Ekta Gujral, PhD Student (University of California, Riverside)

Joyce Ho, Assistant Professor (Emory University)

Kejun Huang, Assistant Professor (University of Florida)

Jiajia Li, Computer Scientist (Pacific Northwest National Laboratory)

Saskia Metzler, PhD Student (Max Planck Institute for Informatics)

Ioakeim Perros, Ph.D., Lead Machine Learning Scientist at Health[at]Scale

Shaden Smith, Ph.D., Research Scientist at Intel Labs

Call for Papers

We invite submission covering novel tensor decomposition models, methods, algorithms or applications. The following is a non-exhaustive list of topics of interest:

Novel models for tensor decompositions
Novel algorithms for existing tensor decomposition models
Tensor decompositions over other algebraic stuctures
Novel ways of representing problems as tensor problems
Tensors and IoT
Tensors and sensors
Tensors and healthcare
Explainable and interpretable tensor decompositions

The submissions can be of any type (research paper, opinion paper, vision paper, demonstration of a system, white paper, work-in-progress report, etc.) Selected papers will be given either an oral presentation or a poster presentation.

The deadline for submitting the papers is May 19, 2019. The paper notifications will be sent by June 15, 2019. The workshop will be organized on August 5, 2019, in connection with KDD 2019 at Anchorage, Alaska. At least one author of the accepted papers must attend the workshop.

Submission

This workshop is not accepting new submissions via EasyChair after the deadline of submission.

Format guidline

ACM style. Template can be found at: https://www.acm.org/publications/proceedings-template.
Max. 9 pages (incl. all references etc).
Can have further appendices, but they can be ignored in the reviewing.
No double-blind.