DLP-KDD 2022

4th Workshop on Deep Learning Practice and Theory for High-Dimensional Sparse and Imbalanced Data with KDD 2022

DLP-KDD2022 Introduction

In an increasingly digitized world, applications in a wide variety of domains are shifting to harness the ability to process, understand, and exploit data collected from different sources. Deep learning-based methods, in particular, have recently empowered many applications to leverage their data. Examples include customer-centric applications such as personalized recommendations, online advertising, search engines, and interest/intention modeling from customers’ behavior. In these applications, leveraging deep learning tools can significantly enhance the user’s experience while increasing revenues. Data generated in customer-centric applications and other critical real-world domains such as health and medicine, biology, business, and industrial engineering are often high-dimensional, sparse, and imbalanced. These adverse data properties challenge the application of deep learning in real-world settings because they can cause poor model performance, failed projects, and potentially serious social implications.

The complexities explored here differ from those of many traditional deep learning applications, such as image classification and speech recognition, which have rich, dense datasets for model development and testing. Typical prediction tasks related to click-through rates, for example, involve billions of sparse features. Thus, the question of how to mine, model, and perform inference on such data is a challenging and interesting problem. The characteristics of high-dimensional, sparse, and imbalanced data pose unique challenges to the adoption of deep learning, and require the community to re-assess traditional methodologies and explore novel domain-specific approaches to learn and evaluate robust and trustworthy models. This workshop will provide a venue for researchers and practitioners to discuss challenges, opportunities, and new ideas related to the application of deep learning on high-dimensional, sparse, and imbalanced data.

These challenges have been widely studied by the traditional machine learning and data mining community, and new techniques have been developed for deep learning. These include methods such as transfer learning, few-shot learning, meta-learning, active learning, data resampling, data generation and augmentation, one-class learning, and domain decompositions. Through the course of this workshop, we will drill into the latest challenges and methodologies whilst reflecting on what traditional machine learning and data mining researchers can contribute to advancing the state of the art in deep learning from high-dimensional, sparse, and imbalanced data. The workshop will bring together a diverse cross-section of speakers and a wide community of data mining and deep learning researchers and practitioners from academia, industry, and government.

Important Dates

  • Submission deadline: May 26, 2022 23:59 anywhere on earth
  • Acceptance notification: June 24, 2022
  • Workshop date: August 14, 2022

    • Morning topics (8:00 am - 12:00 pm): High-dimensional and sparse data

    The morning session has moved to 8am Aug 15th EDT on Zoom. The link is https://us06web.zoom.us/j/89489880544?pwd=d1BJelBWRHo4Skl0amFURXlPTStOQT09

    • Afternoon topics (12:40 pm - 5:15 pm): Imbalance and deep learning

Invited Speakers

Morning session:

  • Weinan Zhang: Associate professor at Shanghai Jiao Tong University

  • Xiangyu Zhao: Assistant professor of Data Science at City University of Hong Kong (CityU)

Afternoon session:

  • Nitesh Chawla: Frank M. Freimann Professor of Computer Science & Engineering and Director of Lucy Family Institute for Data and Society at the University of Notre Dame

  • Bartosz Krawczyk: Assistant Professor of Computer Science, Virginia Commonwealth University, USA

  • Mohak Shah: CTO of Gauss Labs, San Francisco, USA

Workshop Schedule

Time (EDT) Event
Morning session - Aug 15th EDT on Zoom  
8:00 am - 8:30 am Keynote 1: Deep learning for click-through rate prediction (Prof. Weinan Zhang, SJTU)
8:30 am - 8:50 am Oral1: A Brief History of Recommender Systems (Zhenhua Dong, Huawei Noah’s Ark Lab)
8:50 am - 9:10 am Oral2: SGGG: Self-adaption Generative Gating Graph model for Personalized Micro-video Recommendation (Yingshui Tan, Alibaba Group)
9:10 am - 9:30 am Oral3: Flattened Graph Convolutional Networks For Recommendation (Yue Xu, BUPT & Tencent)
9:30 am - 9:50 am Oral4: Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation (Rihan Chen, Alibaba Group)
9:50 am - 10:10 am Coffee break
10:10 am - 10:40 am Keynote 2: Automated Machine Learning for Recommendations: Fundamentals and Advances (Prof. Xiangyu Zhao, CityU of HK)
10:40 am - 11:00 am Oral5: IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System (Xiangyang Li, Peking University & Huawei Noah’s Ark Lab)
11:00 am - 11:20 am Oral6: Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction (Qiwei Chen, Alibaba Group)
11:20 am - 11:40 am Oral7: GPatch: Patching Graph Neural Networks for Cold-Start Recommendations (Hao Chen, Tencent)
11:40 am - 12:00 pm Oral8: GReS: Graphical Cross-domain Recommendation for Supply Chain Platform (Zhiwen Jing, Meituan)
Afternoon session - Aug 14th in-person  
12:40 pm - 1:00 pm Welcome and Introduction
1:00 pm - 1:40 pm Keynote 1: Bartosz Krawczyk
1:40 pm - 2:40 pm Paper Session (4 Paper Talks)
  Unsupervised Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training, Hari Prasanna Das
  De-biasing training data distribution using targeted data enrichment techniques, Dieu Thu Le
  Self-supervised Learning for Hyperspectral Images of Trees, Moqsadur Rahman
  DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction, Buyun Zhang
2:40 pm - 3:20 pm Break + Poster Session (9 papers)
  Ask Me What You Need: Product Retrieval using Knowledge from GPT-3, Su Young Kim
  Conditional Synthetic Data Generation for Personal Thermal Comfort Models, Hari Prasanna Das
  CNN Algorithms for Standoff Detection of Trace Explosives, Eric Yao
  Towards an Efficient ML System: Unveiling a Trade-off between Task Accuracy and Engineering Efficiency in a Large-scale Car Sharing Platform, Kyung Ho Park
  DynInt: Dynamic Interaction Modeling for Large-scale Click-Through Rate Prediction, Yachen Yan
  Unsupervised Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training, Hari Prasanna Das
  De-biasing training data distribution using targeted data enrichment techniques, Dieu Thu Le
  Self-supervised Learning for Hyperspectral Images of Trees, Moqsadur Rahman
  DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction, Buyun Zhang
3:20 pm - 4:00 pm Keynote 2: Mohak Shah
4:00 pm - 4:40 pm Keynote 3: Nitesh Chawla
4:40 pm - 5:10 pm Panel Discussion (Round Table)
5:10 pm - 5:15 pm Closing

Topics of Interest

The topics of interest include, but are not limited to, the following:

  • Challenges and risks of deep learning from high-dimensional, sparse, and imbalanced data
  • Large scale user response prediction modeling
  • Representation learning for high-dimensional, sparse, and imbalanced data
  • Multi-domain generalization through few-shot learning and zero-shot learning
  • Embedding techniques, manifold learning and dictionary learning
  • Scalable, distributed and parallel training system for deep learning
  • High-throughput and low-latency real-time serving systems
  • Applications of transfer learning and meta-learning for high-dimensional, sparse, and imbalanced data
  • Understanding user behavior
  • Large-scale recommendation and retrieval systems
  • Model compression for industrial applications
  • Auto-machine learning, auto-feature selection
  • Explainable deep learning for high-dimensional, sparse, and imbalanced data
  • Data augmentation and anomaly detection for high-dimensional, sparse, and imbalanced data
  • Generative Adversarial Networks for high-dimensional, sparse, and imbalanced data
  • Leveraging insights from traditional machine learning and data mining approaches for deep learning with high-dimensional, sparse, and imbalanced data
  • Moral and social issues related to the applications of models trained on high-dimensional, sparse, and imbalanced data
  • Other challenges encountered in real-world applications

Accepted Papers

Morning Session

  • SGGG: Self-adaption Generative Gating Graph model for Personalized Micro-video Recommendation Yingshui Tan (Alibaba Group); xiaofeng wang (alibaba group)*; Yuanliang Zhang (alibaba group); Zulong Chen (Alibaba); Jinxin Hu (Alibaba Group); Fei Fang (Alibaba Group)
  • IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System Xiangyang Li (Peking University); Bo Chen (Huawei Noah’s Ark Lab)*; Huifeng Guo (Huawei Noah’s Ark Lab); Jingjie Li (Huawei Noah’s Ark Lab); Chenxu Zhu (Shanghai Jiao Tong University); Xiang Long (Beijing University of Posts and Telecommunications); Sujian Li (Peking University); Yichao Wang (Huawei Noah’s Ark Lab); Wei Guo (Huawei Noah’s Ark Lab); Longxia Mao (Huawei Technologies Co Ltd); Jinxing Liu (Huawei Technologies Co Ltd); Zhenhua Dong (Huawei Noah’s Ark Lab); Ruiming Tang (Huawei Noah’s Ark Lab)
  • A Brief History of Recommender Systems Zhenhua Dong (Huawei Noah’s Ark Lab)*; Zhe Wang (Tsinghua University); Jun Xu (Renmin University of China); Ruiming Tang (Huawei Noah’s Ark Lab); Ji-Rong Wen (Renmin University of China)
  • Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation Rihan Chen (Alibaba Group); Bin Liu (Alibaba Group); Han Zhu (Alibaba Group)*; Wang Yaoxuan (Alibaba Group); Qi Li (Alibaba); Buting Ma (Alibaba Group); qingbo hua (Alibaba); Jun Jiang (Alibaba); Yunlong Xu (Alibaba Group); Hongbo Deng (Alibaba Group); Bo Zheng (Alibaba Group)
  • Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction Qiwei Chen (Alibaba Group)*; Yue Xu (Alibaba Group); Changhua Pei (Alibaba Group); Shanshan Lv (Alibaba Group); Tao Zhuang (Alibaba Group); Junfeng Ge (Alibaba Group)
  • Entire Space Learning Framework: Unbias Conversion Rate Prediction in Full Stages of Recommender System Shanshan Lv (Alibaba Group)*; Qiwei Chen (Alibaba Group); Tao Zhuang (Alibaba Group); Junfeng Ge (Alibaba Group)
  • A Field-wise Analysis of Task Conflicts in Multi-Task Learning based Recommendation Models Yichao Wang (Huawei Noah’s Ark Lab)*; Zhicheng He (Huawei Noah’s Ark Lab); Yuhuan Yang (Shanghai Jiao Tong University); JIAXIN CHEN (The Hong Kong Polytechnic University); Bo Chen (Huawei Noah’s Ark Lab); Zhirong Liu (Huawei Noah’s Ark Lab); Ruiming Tang (Huawei Noah’s Ark Lab)
  • Flattened Graph Convolutional Networks For Recommendation Yue Xu (Beijing University of Posts and Telecommunications)*; Hao Chen (Tencent); Zengde Deng (Cainiao Network); Yuanchen Bei (Zhejiang University); Feiran Huang (Jinan University)
  • GPatch: Patching Graph Neural Networks for Cold-Start Recommendations Hao Chen (Tencent)*; Zefan Wang (Jinan University); Yue Xu (Beijing University of Posts and Telecommunications); Xiao Huang (The Hong Kong Polytechnic University); Feiran Huang (Jinan University)
  • Deep Position-wise Curve Network for Online Allocation in Sponsored Search Fei Xiong (Alibaba Group)*; Zulong Chen (Alibaba); Mingyuan Tao (Alibaba Group); Liangyue Li (Alibaba Group); Shoudi Hao (Alibaba Group)
  • GReS: Graphical Cross-domain Recommendation for Supply Chain Platform Zhiwen Jing (Taiyuan University of Technology)*; Yang Feng (Meituan); Xiaochen Ma (Meituan); Nan Wu (Meituan); Shengqiao Kang (Meituan); Hao Guo (Taiyuan University of Technology)
  • Learning to Prerank with Feedback Cascading for Online Advertising Zhishan Zhao (Alibaba Group); Yu Zhang (Alibaba Group); Shu-Guang Han (Alibaba Group)*; Han Zhu (Alibaba Group); Hongbo Deng (Alibaba Group); Bo Zheng (Alibaba Group)

Afternoon Session

  • Paper 4: Unsupervised Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training Paper. Hari Prasanna Das, UC Berkeley; Ryan Tran, UC Berkeley; Japjot Singh, UC Berkeley; Yu Wen Lin, UC Berkeley; Costas J. Spanos, UC Berkeley.

  • Paper 5: Conditional Synthetic Data Generation for Personal Thermal Comfort Models Paper. Hari Prasanna Das, UC Berkeley; Costas J. Spanos, UC Berkeley.

  • Paper 7: CNN Algorithms for Standoff Detection of Trace Explosives Paper. Eric Yao, Naval Research Laboratory.

  • Paper 12: De-biasing training data distribution using targeted data enrichment techniques Paper. Dieu Thu Le, Amazon; Jose Garrido Ramas, Alexa AI; Yulia Grishina, Amazon; Kay Rottmann, Alexa AI.

  • Paper 13: Self-supervised Learning for Hyperspectral Images of Trees Paper. Moqsadur Rahman, University of Texas at El Paso; Saurav Kumar, Arizona State University; Santosh Subhash Palmate, Texas A&M University; M. Shahriar Hossain, University of Texas at El Paso.

  • Paper 15: Ask Me What You Need: Product Retrieval using Knowledge from GPT-3 Paper. Su Young Kim, Clova AI Research, NAVER Corp.; Hyeonjin Park, Korea University; Kyuyong Shin, NAVER AI Lab, NAVER Corp.; Kyung-Min Kim, Clova AI Research, NAVER Corp.

  • Paper 22: Towards an Efficient ML System: Unveiling a Trade-off between Task Accuracy and Engineering Efficiency in a Large-scale Car Sharing Platform Paper. Kyung Ho Park, SOCAR AI Research; Hyunhee Chung, SOCAR; Soonwoo Kwon, Socar.

  • Paper 23: DynInt: Dynamic Interaction Modeling for Large-scale Click-Through Rate Prediction Paper. YACHEN YAN, Credit Karma; Liubo Li, Credit Karma.

  • Paper 26: DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction Paper. Buyun Zhang, Meta; Liang Luo, University of Washington; Xi Liu, Meta Platforms, Inc.; Jay Li, Meta; Zeliang Chen, Facebook, Inc.; Weilin Zhang, Meta Platforms, Inc.; Xiaohan Wei, Facebook; Yuchen Hao, Meta; Michael Y Tsang, Meta Platforms, Inc.; Wenjun Wang, Meta; Yang Liu, Meta Platform Inc.; Mengyue Hang, Meta; Renqin Cai, Meta; Chaofei Yang, Meta; Yiqun Liu, Facebook; Sihan Zeng, Meta; Rui Zhang, Meta; Xiaocong Du, Meta; Huayu Li, Meta; Yasmine Badr, Meta Platforms; Jongsoo Park, Facebook, Inc.; Jiyan Yang, Facebook Inc.; Dheevatsa Mudigere, Facebook; Ellie Wen, Facebook.

Key Dates and Author Instructions

Please format your papers using the standard KDD 2022 style files. Submissions must be in PDF format and formatted according to the new Standard ACM Conference Proceedings Template.

In addition to full-length papers (up to 9 pages) describing clear research advances, we encourage the submission of short papers (2-4 pages) that discuss work in progress, new challenges and limitations, and future directions for representation learning to overcome limited and adverse data, along with socially relevant problems, ethical AI, and AI safety.

Submissions should be anonymized.

Submission System

https://cmt3.research.microsoft.com/DLPKDD2022/

Selection Criteria

All submissions will undergo peer review by the workshop’s program committee. Accepted papers will be chosen based on technical merit, empirical validation, novelty, and suitability to the workshop’s goals.

The workshop aims to provide an engaging platform for dialog that will push the state-of-the-art in deep learning from high-dimensional, sparse, and imbalanced data. To this end, selected papers will include long papers, short works-in-progress, novel topics and future directions. Work that has already appeared or is scheduled to appear in a journal, workshop, or conference (including KDD 2022) must be significantly extended to be eligible for workshop submission. Work that is currently under review at another venue may be submitted.

If you have any questions about submissions or our workshop, please contact dlpkdd2022@hotmail.com

Previous Editions

Website for dlp-kdd2021 can be found here

Website for s2d-olad2021 can be found here

SIGKDD 2022 Information


See the SIGKDD 2022 website for more details on the workshop location and times, along with the full schedule and list of invited speakers.

Workshop Chairs


Roberto Corizzo
Assistant Professor
American University, Washington D.C., USA
rcorizzo@american.edu

Junfeng Ge
Senior staff algorithm engineer
Alibaba Group

Colin Bellinger
AI Researcher
National Research Council of Canada
colin.bellinger@nrc-cnrc.gc.ca

Xiaoqiang Zhu
Chief AI Officer
Mobvista Group

Paula Branco
Assistant Professor
University of Ottawa, Ottawa, Canada
pbranco@uottawa.ca

Kuang-chih Lee
Tech Lead of business intelligence group, AliExpress

Nathalie Japkowicz
Professor
American University, Washington D.C., USA
japkowic@american.edu

Ruiming Tang
Director of recommendation and search
Huawei Noah’s Ark Lab

Tao Zhuang
Senior staff engineer
Alibaba Group

Han Zhu
Staff Engineer
Alibaba Group

Biye Jiang
Staff Engineer
Alibaba Group

Jiaxin Mao
Assistant Professor
Renmin University of China

Weinan Zhang
Associate Professor
Shanghai Jiao Tong University

Program Committee

Abhishek Gupta, UC Berkeley

Alberto Cano, Virginia Commonwealth University

Bartosz Krawczyk, Virginia Commonwealth University

Constantine Dovrolis, Georgia Tech

Denis Gudovskiy, Panasonic

Dino Ienco, IRSTEA

Eftim Zdravevski, Faculty of Computer Science and Engineering Ss Cyril and Methodius University Skopje North Macedonia

Evan Crothers, University of Ottawa

Gianvito Pio, University of Bari

Herna Viktor, University of Ottawa

James Smith, Georgia Institute of Technology

Jeffrey Ling

Jie Gao, Rutgers University

Joao Gama, INESC TEC - LIAAD

Massimiliano Altieri, University of Bari

Michal Wozniak, Wroclaw University of Science and Technology

Michelangelo Ceci, University of Bari

Mikel Galar, Universidad Pública de Navarra

Nuno Moniz, INESC TEC & University of Porto Portugal

Rita Ribeiro, Porto Portugal

Roberto Alejo, Tecnologico de Estudios Superiores de Jocotitlan

Ronaldo Prati, UFABC

Salvador García, Universidad de Granada

Taghi Khoshgaftaar, Florida Atlantic University USA

Tailin Wu, MIT

Zois Boukouvalas, American University

Zsolt Kira, Georgia Institute of Technology
