Sitemap
A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.
Pages
Posts
Future Update List
Published:
This post lists planned future updates, all of which are directions I have shared or explored in my previous research (in Chinese or English):
- Matrix Analysis Notes
- Intro to Diffusion Models
- Intro to MCMC Algorithms
- Intro to Matrix Perturbation Theory
Welcome to my homepage!
Published:
I created this personal page today, and I will post updates about my academic and other activities here in the future.
Publications
AutoS2AE: Automate to Regularize Sparse Shallow Autoencoders for Recommendation.
Published in The ACM Web Conference 2023 (WWW2023), Pages 1032–1042, 2023
The Embarrassingly Shallow Autoencoders (EASE and SLIM) are strong recommendation methods for implicit feedback compared to competing methods such as iALS and VAE-CF. However, EASE suffers from several major shortcomings. First, its training and inference cannot scale with a growing number of items, since they require storing and inverting a large dense matrix. Second, although its optimization objective, the square loss, yields a closed-form solution, it is not consistent with the recommendation goal of predicting a personalized ranking over a set of items, so its performance is far from optimal with respect to ranking-oriented recommendation metrics. Finally, recommendation accuracy is sensitive to the regularization coefficients, which vary considerably across datasets, so fine-tuning these parameters is important yet time-consuming. To improve training and inference efficiency, we propose a Similarity-Structure Aware Shallow Autoencoder built on three similarity structures: Co-Occurrence, KNN, and NSW. We then optimize the model with a weighted square loss, which is proven effective for ranking-based recommendation while still admitting closed-form solutions. However, the weights in the loss cannot be learned from the training set, and accuracy is as sensitive to them as to the regularization coefficients. To tune the hyperparameters automatically, we design two validation losses on the validation set for guidance and update the hyperparameters with the gradients of these validation losses. We finally evaluate the proposed method on multiple real-world datasets, show that it outperforms seven competing baselines by a clear margin, and verify the effectiveness of each component of the proposed method.
Recommended citation: Rui Fan, Yuanhao Pu, Jin Chen, Zhihao Zhu, Defu Lian* and Enhong Chen. AutoS2AE: Automate to Regularize Sparse Shallow Autoencoders for Recommendation. The 32nd Web Conference (WWW 2023), pp. 1032-1042, Apr. 2023.
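The closed-form solution referenced in this abstract is the one EASE popularized. As a rough, hedged sketch of that baseline (not the AutoS2AE training code; the names `X` and `reg` are my own), the plain EASE estimator can be written in a few lines of numpy:

```python
import numpy as np

def ease_closed_form(X, reg=100.0):
    """Plain EASE: closed-form item-item weights from implicit feedback.

    X   : (num_users, num_items) binary interaction matrix (dense here for brevity)
    reg : L2 regularization coefficient (the sensitive hyperparameter discussed above)
    """
    G = X.T @ X + reg * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)                    # the large dense inverse that limits scalability
    B = P / (-np.diag(P))                   # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                # enforce the zero-diagonal constraint
    return B

# Toy usage: ranking scores are simply X @ B.
X = np.random.binomial(1, 0.1, size=(50, 20)).astype(float)
scores = X @ ease_closed_form(X, reg=10.0)
```

The similarity structures and the weighted square loss proposed in the paper target exactly these two pain points: the dense inverse and the plain, unweighted square objective.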
Invariant Representation Learning via Decoupling Style and Spurious Features.
Published in Machine Learning Journal (MLJ), 2023
This paper considers the out-of-distribution (OOD) generalization problem in the setting where both style distribution shift and spurious features exist and domain labels are missing. This setting arises frequently in real-world applications and has been overlooked because previous approaches mainly handle only one of these two factors. The critical challenge is decoupling style and spurious features in the absence of domain labels. To address this challenge, we first propose a structural causal model (SCM) for the image generation process that captures both style distribution shift and spurious features. The proposed SCM enables us to design a new framework called IRSS, which can gradually separate style distribution and spurious features from images by introducing adversarial neural networks and multi-environment optimization, thus achieving OOD generalization. Moreover, it does not require additional supervision (e.g., domain labels) beyond the images and their corresponding labels. Experiments on benchmark datasets demonstrate that IRSS outperforms traditional OOD methods and solves the problem of invariant risk minimization (IRM) degradation, enabling the extraction of invariant features under distribution shift.
Recommended citation: Ruimeng Li, Yuanhao Pu, Zhaoyi Li, Chenwang Wu, Hong Xie*, Defu Lian. Invariant Representation Learning via Decoupling Style and Spurious Features. Machine Learning Journal (MLJ), accepted, 2024.
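The abstract mentions adversarial neural networks for separating style and spurious factors; the paper's exact architecture is not reproduced here. As a generic, hypothetical illustration of how adversarial invariance is often implemented, the sketch below uses a gradient-reversal layer in PyTorch (the names `GradReverse` and `lamb` are mine, and IRSS may rely on a different mechanism):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the backward pass,
    so the feature extractor learns to fool an auxiliary style/domain classifier."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# Toy usage: the auxiliary head receives reversed gradients w.r.t. the shared features.
features = torch.randn(8, 16, requires_grad=True)
style_logits = torch.nn.Linear(16, 3)(grad_reverse(features, lamb=0.5))
```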
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities.
Published in World Wide Web Journal (WWWJ), 2023
The advent of large language models marks a revolutionary breakthrough in artificial intelligence. With an unprecedented scale of training data and model parameters, the capability of large language models has improved dramatically, leading to human-like performance in understanding, language synthesis, and common-sense reasoning. Such a major leap forward in general AI capacity will fundamentally change how personalization is conducted. On the one hand, it will reform the way humans interact with personalization systems. Instead of being a passive medium for information filtering, like conventional recommender systems and search engines, large language models provide a foundation for active user engagement. On top of this new foundation, users' requests can be proactively explored, and the information users need can be delivered in a natural, interactive, and explainable way. On the other hand, it will also considerably expand the scope of personalization, growing it from the sole function of collecting personalized information to the compound function of providing personalized services. By leveraging large language models as a general-purpose interface, personalization systems can compile users' requests into plans, call the functions of external tools (e.g., search engines, calculators, service APIs, etc.) to execute the plans, and integrate the tools' outputs to complete end-to-end personalization tasks. Today, large language models are still developing rapidly, whereas their application to personalization remains largely unexplored. We therefore consider this the right time to review the challenges in personalization and the opportunities to address them with large language models. In particular, we dedicate this perspective paper to discussing the following aspects: the development of and challenges for existing personalization systems, the newly emerged capabilities of large language models, and the potential ways of making use of large language models for personalization.
Recommended citation: Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, Defu Lian, Enhong Chen. When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities. World Wide Web (WWWJ), 27, 42 (2024).
Automated Sparse and Low-Rank Shallow Autoencoders for Recommendation
Published in ACM Transactions on Recommender Systems (TORS), accepted, 2024
Collaborative filtering (CF) work has demonstrated the robust capability of shallow autoencoders on implicit feedback, showcasing performance highly competitive with other strong approaches (e.g., iALS and VAE-CF). However, despite the dual advantages of high performance and simple construction, EASE still exhibits several major shortcomings that must be addressed. More precisely, the scalability of EASE is limited by the number of items, which determines the storage and inversion cost of a large dense matrix; the square-loss optimization objective is not consistent with the recommendation task's requirement of predicting personalized rankings, resulting in sub-optimal outcomes; and the regularization coefficients are sensitive and require re-calibration on different datasets, leading to an exhaustive and time-consuming fine-tuning process. To address these obstacles, we propose a novel approach called the Similarity-Structure Aware Shallow Autoencoder (AutoS2AE) that aims to enhance both recommendation accuracy and model efficiency. Our method introduces three similarity structures, Co-Occurrence, KNN, and NSW graphs, which replace the large dense matrix in EASE with a sparse structure and thus facilitate model compression. Additionally, we optimize the model by incorporating a low-rank training component into the matrix and applying a weighted square loss for improved ranking-oriented approximation. To tune the hyperparameters automatically, we further design two validation losses on the validation set for guidance and update the hyperparameters using the gradients of these validation losses. Both theoretical analyses regarding the introduced similarity structures and empirical evaluations on multiple real-world datasets demonstrate the effectiveness of our proposed method, which significantly outperforms competing baselines.
Recommended citation: Yuanhao Pu, Rui Fan, Jin Chen, Zhihao Zhu, Defu Lian*, Enhong Chen. Automated Sparse and Low-Rank Shallow Autoencoders for Recommendation. ACM Transactions on Recommender Systems (TORS), accepted, 2024.
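A central idea above is replacing EASE's large dense item-item matrix with a sparse similarity structure. As a hedged simplification of one of the three structures (a KNN graph over co-occurrence counts; the function and parameter names are mine, not the paper's), a sparsity mask could be built like this:

```python
import numpy as np

def knn_similarity_mask(X, k=10):
    """Binary mask over item pairs keeping only the k most co-occurring neighbors per item.

    X : (num_users, num_items) binary interaction matrix
    k : neighbors kept per item
    A shallow autoencoder can then restrict its item-item weights to the mask's
    nonzero entries instead of learning a full dense matrix.
    """
    co = (X.T @ X).astype(float)               # item-item co-occurrence counts
    np.fill_diagonal(co, 0.0)                  # ignore self-co-occurrence
    mask = np.zeros_like(co)
    for j in range(co.shape[1]):
        neighbors = np.argsort(co[:, j])[-k:]  # indices of the k largest counts for item j
        mask[neighbors, j] = 1.0
    return mask
```

The paper's Co-Occurrence and NSW structures, the low-rank component, and the gradient-based hyperparameter tuning are all beyond this sketch.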
Learning-Efficient Yet Generalizable Collaborative Filtering for Item Recommendation
Published in The 41st International Conference on Machine Learning (ICML 2024), accepted, 2024
The weighted squared loss is a common component in several Collaborative Filtering (CF) algorithms for item recommendation, including the representative implicit Alternating Least Squares (iALS). Despite its widespread use, this loss function lacks a clear connection to ranking objectives such as Discounted Cumulative Gain (DCG), posing a fundamental challenge in explaining the exceptional ranking performance observed in these algorithms. In this work, we make a breakthrough by establishing a connection between the squared loss and ranking metrics through a Taylor expansion of the DCG-consistent surrogate softmax loss. We also derive a new surrogate squared loss function, the Ranking-Generalizable Squared (RG$^2$) loss, and conduct thorough theoretical analyses of the DCG-consistency of the proposed loss. We then present an example of applying the RG$^2$ loss to Matrix Factorization (MF), coupled with a generalization upper bound and an ALS optimization algorithm that leverages closed-form solutions over all items. Experimental results on three public datasets demonstrate the effectiveness of the RG$^2$ loss, exhibiting ranking performance on par with, or even surpassing, the softmax loss while achieving faster convergence.
Recommended citation: Yuanhao Pu, Xiaolong Chen, Xu Huang, Jin Chen, Defu Lian, Enhong Chen. Learning-Efficient Yet Generalizable Collaborative Filtering for Item Recommendation. Proceedings of the 41st International Conference on Machine Learning, PMLR 235:41183-41203, 2024.
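The connection claimed above rests on Taylor-expanding the softmax (log-sum-exp) loss. As a rough illustration of why a squared term appears (a standard second-order expansion around zero scores, not the paper's exact derivation of the RG$^2$ loss), for scores $s \in \mathbb{R}^n$ and positive item $y$:

$$
\ell_{\mathrm{softmax}}(s) = -s_y + \log\sum_{j=1}^{n} e^{s_j}
\;\approx\; -s_y + \log n + \frac{1}{n}\sum_{j=1}^{n} s_j
+ \frac{1}{2}\left(\frac{1}{n}\sum_{j=1}^{n} s_j^2 - \frac{1}{n^2}\Big(\sum_{j=1}^{n} s_j\Big)^{2}\right),
$$

since the gradient of $\log\sum_j e^{s_j}$ at $s=0$ is $\frac{1}{n}\mathbf{1}$ and its Hessian there is $\frac{1}{n}I - \frac{1}{n^2}\mathbf{1}\mathbf{1}^\top$. The quadratic term is a (weighted) squared loss in the scores, which is the kind of link between squared losses and the DCG-consistent softmax loss that the RG$^2$ loss formalizes.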
Understanding the Effect of Loss Functions on the Generalization of Recommendations
Published in The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025), accepted, 2024
The two-tower model has become prevalent in recommender systems for its computational efficiency and robust predictive capabilities. The model usually employs two independent neural networks to encode user and item data separately and predicts the similarity score with an inner product or cosine function, based on which the Top-k ranked item list is generated. The optimization process typically involves a multi-label classification objective, often guided by surrogate loss functions such as Softmax and One-vs-All (OvA), to enhance recommendation performance. Despite both the Softmax and OvA losses being Bayes-consistent, empirical observations reveal a significant performance gap in evaluation metrics, suggesting limitations of Bayes-consistency for analyzing loss effectiveness. To address this, we introduce H-consistency into the discussion, which provides non-asymptotic and hypothesis-specific guarantees for Top-k classification within the two-tower model's hypothesis space. Through theoretical analysis, we demonstrate that the Softmax and Cosine Contrastive losses exhibit H-consistency, while the OvA loss does not, explaining the observed performance discrepancies. Our findings bridge the gap between theoretical properties and practical outcomes, offering deeper insights into the optimization of two-tower models and contributing to the development of more effective recommender systems.
Recommended citation: Yuanhao Pu, Defu Lian, Xiaolong Chen, Jin Chen, Ze Liu, Enhong Chen. Understanding the Effect of Loss Functions on the Generalization of Recommendations. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025), accepted.
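The two-tower setup discussed above scores a user against all items with an inner product or cosine similarity between independently encoded embeddings, then takes the Top-k. A minimal, hypothetical sketch of that scoring step (names and shapes are mine; the Softmax, OvA, and Cosine Contrastive losses compared in the paper act on such scores during training, which this sketch omits):

```python
import numpy as np

def topk_items(user_emb, item_embs, k=10, cosine=True):
    """Score one user-tower output against all item-tower outputs and return Top-k item ids.

    user_emb  : (d,) embedding from the user tower
    item_embs : (num_items, d) embeddings from the item tower
    cosine    : if True use cosine similarity, otherwise a raw inner product
    """
    if cosine:
        user_emb = user_emb / np.linalg.norm(user_emb)
        item_embs = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    scores = item_embs @ user_emb        # one similarity score per item
    return np.argsort(-scores)[:k]       # ids of the k highest-scoring items

# Toy usage with random "tower outputs".
rng = np.random.default_rng(0)
print(topk_items(rng.normal(size=32), rng.normal(size=(100, 32)), k=5))
```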