4th Workshop on Deep Learning Practice and Theory for High-Dimensional Sparse and Imbalanced Data with KDD 2022

DLP-KDD2022 Introduction

In an increasingly digitized world, applications in a wide variety of domains are shifting to harness the ability to process, understand, and exploit data collected from different sources. Deep learning-based methods, in particular, have recently empowered many applications to leverage their data. Examples include customer-centric applications such as personalized recommendations, online advertising, search engines, and interest/intention modeling from customers’ behavior. In these applications, deep learning tools can significantly enhance the user’s experience while increasing revenues. Data generated in customer-centric applications and in other critical real-world domains such as health and medicine, biology, business, and industrial engineering are often high-dimensional, sparse, and imbalanced. These adverse data properties challenge the application of deep learning in real-world settings because they can cause poor model performance, failed projects, and potentially serious social implications.

The complexities explored here differ from those of many traditional deep learning applications, such as image classification and speech recognition, which have rich, dense datasets for model development and testing. Typical prediction tasks related to click-through rates, for example, involve billions of sparse features. How to mine, model, and perform inference on such data is therefore a challenging and interesting problem. The characteristics of high-dimensional, sparse, and imbalanced data pose unique challenges to the adoption of deep learning, and require the community to re-assess traditional methodologies and explore novel domain-specific approaches to learning and evaluating robust and trustworthy models. This workshop will provide a venue for researchers and practitioners to discuss challenges, opportunities, and new ideas related to the application of deep learning to high-dimensional, sparse, and imbalanced data.

These challenges have been widely studied by the traditional machine learning and data mining community, and new techniques have been developed for deep learning. These include transfer learning, few-shot learning, meta-learning, active learning, data resampling, data generation and augmentation, one-class learning, domain decompositions, etc. Through the course of this workshop, we will drill into the latest challenges and methodologies while reflecting on what traditional machine learning and data mining researchers can contribute to advancing the state of the art in deep learning from high-dimensional, sparse, and imbalanced data. The workshop will bring together a diverse cross-section of speakers and a wide community of data mining and deep learning researchers and practitioners from academia, industry, and government.

Important Dates

Submission deadline: May 26, 2022 23:59 anywhere on earth
Acceptance notification: June 24, 2022
Workshop date: August 14, 2022
- Morning topics (8:00 am - 12:00 pm): High-dimensional and sparse data
The morning session has moved to 8am Aug 15th EDT on Zoom. The link is https://us06web.zoom.us/j/89489880544?pwd=d1BJelBWRHo4Skl0amFURXlPTStOQT09
- Afternoon topics (12:40 pm - 5:15 pm): Imbalance and deep learning

Invited Speakers

Morning session:

Weinan Zhang: Associate professor at Shanghai Jiao Tong University
Xiangyu Zhao: Assistant professor of Data Science at City University of Hong Kong (CityU)

Afternoon session:

Nitesh Chawla: Frank M. Freimann Professor of Computer Science & Engineering and Director of Lucy Family Institute for Data and Society at the University of Notre Dame
Bartosz Krawczyk: Assistant Professor of Computer Science, Virginia Commonwealth University, USA
Mohak Shah: CTO of Gauss Labs, San Francisco, USA

Workshop Schedule

Time (EDT)	Event
Morning session - Aug 15th EDT on Zoom
8:00 am - 8:30 am	Keynote 1: Deep learning for click-through rate prediction (Prof. Weinan Zhang, SJTU)
8:30 am - 8:50 am	Oral 1: A Brief History of Recommender Systems (Zhenhua Dong, Huawei Noah’s Ark Lab)
8:50 am - 9:10 am	Oral 2: SGGG: Self-adaption Generative Gating Graph model for Personalized Micro-video Recommendation (Yingshui Tan, Alibaba Group)
9:10 am - 9:30 am	Oral 3: Flattened Graph Convolutional Networks For Recommendation (Yue Xu, BUPT & Tencent)
9:30 am - 9:50 am	Oral 4: Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation (Rihan Chen, Alibaba Group)
9:50 am - 10:10 am	Coffee break
10:10 am - 10:40 am	Keynote 2: Automated Machine Learning for Recommendations: Fundamentals and Advances (Prof. Xiangyu Zhao, CityU of HK)
10:40 am - 11:00 am	Oral 5: IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System (Xiangyang Li, Peking University & Huawei Noah’s Ark Lab)
11:00 am - 11:20 am	Oral 6: Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction (Qiwei Chen, Alibaba Group)
11:20 am - 11:40 am	Oral 7: GPatch: Patching Graph Neural Networks for Cold-Start Recommendations (Hao Chen, Tencent)
11:40 am - 12:00 pm	Oral 8: GReS: Graphical Cross-domain Recommendation for Supply Chain Platform (Zhiwen Jing, Meituan)
Afternoon session - Aug 14th in-person
12:40 pm - 1:00 pm	Welcome and Introduction
1:00 pm - 1:40 pm	Keynote 1: Bartosz Krawczyk
1:40 pm - 2:40 pm	Paper Session (4 Paper Talks)
	Unsupervised Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training, Hari Prasanna Das
	De-biasing training data distribution using targeted data enrichment techniques, Dieu Thu Le
	Self-supervised Learning for Hyperspectral Images of Trees, Moqsadur Rahman
	DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction, Buyun Zhang
2:40 pm - 3:20 pm	Break + Poster Session (9 papers)
	Ask Me What You Need: Product Retrieval using Knowledge from GPT-3, Su Young Kim
	Conditional Synthetic Data Generation for Personal Thermal Comfort Models, Hari Prasanna Das
	CNN Algorithms for Standoff Detection of Trace Explosives, Eric Yao
	Towards an Efficient ML System: Unveiling a Trade-off between Task Accuracy and Engineering Efficiency in a Large-scale Car Sharing Platform, Kyung Ho Park
	DynInt: Dynamic Interaction Modeling for Large-scale Click-Through Rate Prediction, Yachen Yan
	Unsupervised Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training, Hari Prasanna Das
	De-biasing training data distribution using targeted data enrichment techniques, Dieu Thu Le
	Self-supervised Learning for Hyperspectral Images of Trees, Moqsadur Rahman
	DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction, Buyun Zhang
3:20 pm - 4:00 pm	Keynote 2: Mohak Shah
4:00 pm - 4:40 pm	Keynote 3: Nitesh Chawla
4:40 pm - 5:10 pm	Panel Discussion (Round Table)
5:10 pm - 5:15 pm	Closing

Topics of Interest

The topics of interest include, but are not limited to, the following:

Challenges and risks of deep learning from high-dimensional, sparse, and imbalanced data
Large scale user response prediction modeling
Representation learning for high-dimensional, sparse, and imbalanced data
Multi-domain generalization through few-shot learning and zero-shot learning
Embedding techniques, manifold learning and dictionary learning
Scalable, distributed, and parallel training systems for deep learning
High-throughput and low-latency real-time serving systems
Applications of transfer learning and meta-learning for high-dimensional, sparse, and imbalanced data
Understanding user behavior
Large-scale recommendation and retrieval systems
Model compression for industrial applications
Automated machine learning and automated feature selection
Explainable deep learning for high-dimensional, sparse, and imbalanced data
Data augmentation and anomaly detection for high-dimensional, sparse, and imbalanced data
Generative Adversarial Networks for high-dimensional, sparse, and imbalanced data
Leveraging insights from traditional machine learning and data mining approaches for deep learning with high-dimensional, sparse, and imbalanced data
Moral and social issues related to the applications of models trained on high-dimensional, sparse, and imbalanced data
Other challenges encountered in real-world applications

Accepted Papers

Morning Session

SGGG: Self-adaption Generative Gating Graph model for Personalized Micro-video Recommendation. Yingshui Tan (Alibaba Group); Xiaofeng Wang (Alibaba Group)*; Yuanliang Zhang (Alibaba Group); Zulong Chen (Alibaba); Jinxin Hu (Alibaba Group); Fei Fang (Alibaba Group)
IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System. Xiangyang Li (Peking University); Bo Chen (Huawei Noah’s Ark Lab)*; Huifeng Guo (Huawei Noah’s Ark Lab); Jingjie Li (Huawei Noah’s Ark Lab); Chenxu Zhu (Shanghai Jiao Tong University); Xiang Long (Beijing University of Posts and Telecommunications); Sujian Li (Peking University); Yichao Wang (Huawei Noah’s Ark Lab); Wei Guo (Huawei Noah’s Ark Lab); Longxia Mao (Huawei Technologies Co Ltd); Jinxing Liu (Huawei Technologies Co Ltd); Zhenhua Dong (Huawei Noah’s Ark Lab); Ruiming Tang (Huawei Noah’s Ark Lab)
A Brief History of Recommender Systems. Zhenhua Dong (Huawei Noah’s Ark Lab)*; Zhe Wang (Tsinghua University); Jun Xu (Renmin University of China); Ruiming Tang (Huawei Noah’s Ark Lab); Ji-Rong Wen (Renmin University of China)
Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation. Rihan Chen (Alibaba Group); Bin Liu (Alibaba Group); Han Zhu (Alibaba Group)*; Wang Yaoxuan (Alibaba Group); Qi Li (Alibaba); Buting Ma (Alibaba Group); Qingbo Hua (Alibaba); Jun Jiang (Alibaba); Yunlong Xu (Alibaba Group); Hongbo Deng (Alibaba Group); Bo Zheng (Alibaba Group)
Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction. Qiwei Chen (Alibaba Group)*; Yue Xu (Alibaba Group); Changhua Pei (Alibaba Group); Shanshan Lv (Alibaba Group); Tao Zhuang (Alibaba Group); Junfeng Ge (Alibaba Group)
Entire Space Learning Framework: Unbias Conversion Rate Prediction in Full Stages of Recommender System. Shanshan Lv (Alibaba Group)*; Qiwei Chen (Alibaba Group); Tao Zhuang (Alibaba Group); Junfeng Ge (Alibaba Group)
A Field-wise Analysis of Task Conflicts in Multi-Task Learning based Recommendation Models. Yichao Wang (Huawei Noah’s Ark Lab)*; Zhicheng He (Huawei Noah’s Ark Lab); Yuhuan Yang (Shanghai Jiao Tong University); Jiaxin Chen (The Hong Kong Polytechnic University); Bo Chen (Huawei Noah’s Ark Lab); Zhirong Liu (Huawei Noah’s Ark Lab); Ruiming Tang (Huawei Noah’s Ark Lab)
Flattened Graph Convolutional Networks For Recommendation. Yue Xu (Beijing University of Posts and Telecommunications)*; Hao Chen (Tencent); Zengde Deng (Cainiao Network); Yuanchen Bei (Zhejiang University); Feiran Huang (Jinan University)
GPatch: Patching Graph Neural Networks for Cold-Start Recommendations. Hao Chen (Tencent)*; Zefan Wang (Jinan University); Yue Xu (Beijing University of Posts and Telecommunications); Xiao Huang (The Hong Kong Polytechnic University); Feiran Huang (Jinan University)
Deep Position-wise Curve Network for Online Allocation in Sponsored Search. Fei Xiong (Alibaba Group)*; Zulong Chen (Alibaba); Mingyuan Tao (Alibaba Group); Liangyue Li (Alibaba Group); Shoudi Hao (Alibaba Group)
GReS: Graphical Cross-domain Recommendation for Supply Chain Platform. Zhiwen Jing (Taiyuan University of Technology)*; Yang Feng (Meituan); Xiaochen Ma (Meituan); Nan Wu (Meituan); Shengqiao Kang (Meituan); Hao Guo (Taiyuan University of Technology)
Learning to Prerank with Feedback Cascading for Online Advertising. Zhishan Zhao (Alibaba Group); Yu Zhang (Alibaba Group); Shu-Guang Han (Alibaba Group)*; Han Zhu (Alibaba Group); Hongbo Deng (Alibaba Group); Bo Zheng (Alibaba Group)

Afternoon Session

Paper 4: Unsupervised Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training. Hari Prasanna Das, UC Berkeley; Ryan Tran, UC Berkeley; Japjot Singh, UC Berkeley; Yu Wen Lin, UC Berkeley; Costas J. Spanos, UC Berkeley.
Paper 5: Conditional Synthetic Data Generation for Personal Thermal Comfort Models. Hari Prasanna Das, UC Berkeley; Costas J. Spanos, UC Berkeley.
Paper 7: CNN Algorithms for Standoff Detection of Trace Explosives. Eric Yao, Naval Research Laboratory.
Paper 12: De-biasing training data distribution using targeted data enrichment techniques. Dieu Thu Le, Amazon; Jose Garrido Ramas, Alexa AI; Yulia Grishina, Amazon; Kay Rottmann, Alexa AI.
Paper 13: Self-supervised Learning for Hyperspectral Images of Trees. Moqsadur Rahman, University of Texas at El Paso; Saurav Kumar, Arizona State University; Santosh Subhash Palmate, Texas A&M University; M. Shahriar Hossain, University of Texas at El Paso.
Paper 15: Ask Me What You Need: Product Retrieval using Knowledge from GPT-3. Su Young Kim, Clova AI Research, NAVER Corp.; Hyeonjin Park, Korea University; Kyuyong Shin, NAVER AI Lab, NAVER Corp.; Kyung-Min Kim, Clova AI Research, NAVER Corp.
Paper 22: Towards an Efficient ML System: Unveiling a Trade-off between Task Accuracy and Engineering Efficiency in a Large-scale Car Sharing Platform. Kyung Ho Park, SOCAR AI Research; Hyunhee Chung, SOCAR; Soonwoo Kwon, SOCAR.
Paper 23: DynInt: Dynamic Interaction Modeling for Large-scale Click-Through Rate Prediction. Yachen Yan, Credit Karma; Liubo Li, Credit Karma.
Paper 26: DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction. Buyun Zhang, Meta; Liang Luo, University of Washington; Xi Liu, Meta Platforms, Inc.; Jay Li, Meta; Zeliang Chen, Facebook, Inc.; Weilin Zhang, Meta Platforms, Inc.; Xiaohan Wei, Facebook; Yuchen Hao, Meta; Michael Y Tsang, Meta Platforms, Inc.; Wenjun Wang, Meta; Yang Liu, Meta Platform Inc.; Mengyue Hang, Meta; Renqin Cai, Meta; Chaofei Yang, Meta; Yiqun Liu, Facebook; Sihan Zeng, Meta; Rui Zhang, Meta; Xiaocong Du, Meta; Huayu Li, Meta; Yasmine Badr, Meta Platforms; Jongsoo Park, Facebook, Inc.; Jiyan Yang, Facebook Inc.; Dheevatsa Mudigere, Facebook; Ellie Wen, Facebook.

Key Dates and Author Instructions

Please format your papers using the standard KDD 2022 style files. Submissions must be in PDF format and formatted according to the new Standard ACM Conference Proceedings Template.

In addition to full-length papers (up to 9 pages) describing clear research advances, we encourage the submission of short papers (2-4 pages) that discuss work in progress, new challenges and limitations, and future directions for representation learning to overcome limited and adverse data, as well as socially relevant problems, ethical AI, and AI safety.

Submissions should be anonymized.

Submission System

https://cmt3.research.microsoft.com/DLPKDD2022/

Selection Criteria

All submissions will undergo peer review by the workshop’s program committee. Accepted papers will be chosen based on technical merit, empirical validation, novelty, and suitability to the workshop’s goals.

The workshop aims to provide an engaging platform for dialog that will push the state-of-the-art in deep learning from high-dimensional, sparse, and imbalanced data. To this end, selected papers will include long papers, short works-in-progress, novel topics and future directions. Work that has already appeared or is scheduled to appear in a journal, workshop, or conference (including KDD 2022) must be significantly extended to be eligible for workshop submission. Work that is currently under review at another venue may be submitted.

If you have any questions about submissions or our workshop, please contact dlpkdd2022@hotmail.com