Understanding the Effect of Loss Functions on the Generalization of Recommendations
Accepted to The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025), 2024
The two-tower model has become prevalent in recommender systems owing to its computational efficiency and strong predictive performance. It typically employs two independent neural networks to encode user and item data separately, and predicts a similarity score with an inner-product or cosine function, from which the Top-k ranked item list is generated. Training is usually framed as a multi-label classification problem, optimized with surrogate loss functions such as Softmax and One-vs-All (OvA). Although both the Softmax and OvA losses are Bayes-consistent, empirical observations reveal a significant gap between them on evaluation metrics, suggesting that Bayes-consistency alone is insufficient for analyzing loss effectiveness. To address this, we introduce H-consistency, which provides non-asymptotic, hypothesis-specific guarantees for Top-k classification within the two-tower model's hypothesis space. Through theoretical analysis, we show that the Softmax and Cosine Contrastive losses are H-consistent while the OvA loss is not, explaining the observed performance discrepancies. Our findings bridge the gap between theoretical properties and practical outcomes, offering deeper insight into the optimization of two-tower models and contributing to the development of more effective recommender systems.
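To make the setup concrete, below is a minimal sketch in plain NumPy of two-tower scoring with the two surrogate losses compared in the paper. The encoders are abstracted away as precomputed embeddings, and the embedding dimension, catalogue size, and variable names are hypothetical toy choices, not the paper's experimental configuration.

```python
import numpy as np

def softmax_loss(scores, target):
    """Softmax (cross-entropy) loss over the item catalogue for one user."""
    logits = scores - scores.max()                 # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target]

def ova_loss(scores, target):
    """One-vs-All loss: an independent logistic loss per item, summed."""
    labels = np.zeros_like(scores)
    labels[target] = 1.0
    # binary cross-entropy with logits: -log sigma(s) for positives,
    # -log(1 - sigma(s)) for negatives
    return np.sum(labels * np.log1p(np.exp(-scores))
                  + (1.0 - labels) * np.log1p(np.exp(scores)))

# Hypothetical two-tower outputs: one user embedding, 100 item embeddings.
rng = np.random.default_rng(0)
user_emb = rng.normal(size=8)                      # output of the user tower
item_embs = rng.normal(size=(100, 8))              # outputs of the item tower

scores = item_embs @ user_emb                      # inner-product similarity
top_k = np.argsort(-scores)[:10]                   # Top-k ranked item list

print(softmax_loss(scores, target=3), ova_loss(scores, target=3))
```

The Softmax loss couples all item scores through a shared normalizer, whereas the OvA loss treats each item as an independent binary problem; this structural difference is what the paper's H-consistency analysis examines within the two-tower hypothesis space.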
Recommended citation: Yuanhao Pu, Defu Lian, Xiaolong Chen, Jin Chen, Ze Liu, Enhong Chen. "Understanding the Effect of Loss Functions on the Generalization of Recommendations." In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025), to appear.