Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation

October 27, 2023 · Declared Dead · 🏛 International Conference on Database Systems for Advanced Applications

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Wei Fan, Weijia Zhang, Weiqi Wang, Yangqiu Song, Hao Liu
arXiv ID: 2310.17922
Category: cs.IR (Information Retrieval)
Citations: 2
Venue: International Conference on Database Systems for Advanced Applications
Last Checked: 4 months ago

Abstract

Conversational Recommender Systems (CRS) illuminate user preferences via multi-round interactive dialogues, ultimately navigating towards precise and satisfactory recommendations. However, contemporary CRS are limited to asking binary or multi-choice questions based on a single attribute type (e.g., color) per round, which causes excessive rounds of interaction and diminishes the user's experience. To address this, we propose a more realistic and efficient conversational recommendation problem setting, called Multi-Type-Attribute Multi-round Conversational Recommendation (MTAMCR), which enables CRS to ask multi-choice questions covering multiple types of attributes in each round, thereby improving interactive efficiency. Moreover, by formulating MTAMCR as a hierarchical reinforcement learning task, we propose a Chain-of-Choice Hierarchical Policy Learning (CoCHPL) framework to enhance both the questioning efficiency and recommendation effectiveness in MTAMCR. Specifically, a long-term policy over options (i.e., ask or recommend) determines the action type, while two short-term intra-option policies sequentially generate the chain of attributes or items through multi-step reasoning and selection, optimizing the diversity and interdependence of questioning attributes. Finally, extensive experiments on four benchmarks demonstrate the superior performance of CoCHPL over prevailing state-of-the-art methods.
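
The two-level structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hand-written scoring and stopping rules, and the toy attribute table are all hypothetical, and a greedy selection stands in for the learned reinforcement-learning policies. It only shows the control flow, in which one policy picks the action type and a second policy builds a chain one choice at a time, conditioning each choice on the chain so far.

```python
from typing import Callable, List, Sequence

ASK, RECOMMEND = "ask", "recommend"


def choose_option(state: dict,
                  ask_score: Callable[[dict], float],
                  rec_score: Callable[[dict], float]) -> str:
    """Policy over options: pick the action type with the higher score.

    In the paper this policy is learned; here the two scorers are
    hypothetical callables supplied by the caller.
    """
    return ASK if ask_score(state) >= rec_score(state) else RECOMMEND


def build_chain(state: dict,
                candidates: Sequence[str],
                score: Callable[[dict, List[str], str], float],
                stop: Callable[[dict, List[str], str], bool],
                max_len: int) -> List[str]:
    """Intra-option policy: grow a chain of attributes (or items) step by step.

    Each step scores the remaining candidates given the chain built so far,
    so later picks depend on earlier ones. Greedy argmax is an assumption
    made for illustration only.
    """
    chain: List[str] = []
    remaining = list(candidates)
    while remaining and len(chain) < max_len:
        best = max(remaining, key=lambda c: score(state, chain, c))
        # Always take at least one element; afterwards the stop rule may end the chain.
        if chain and stop(state, chain, best):
            break
        chain.append(best)
        remaining.remove(best)
    return chain


# Toy data (hypothetical): each attribute has a type and a base relevance.
ATTR_TYPE = {"red": "color", "blue": "color", "nike": "brand", "cotton": "material"}
BASE = {"red": 0.9, "blue": 0.8, "nike": 0.7, "cotton": 0.6}


def toy_score(state: dict, chain: List[str], cand: str) -> float:
    """Penalize attribute types already in the chain to encourage diversity."""
    used_types = {ATTR_TYPE[c] for c in chain}
    penalty = 0.5 if ATTR_TYPE[cand] in used_types else 0.0
    return BASE[cand] - penalty


def toy_stop(state: dict, chain: List[str], cand: str) -> bool:
    """End the chain once the best remaining candidate scores below 0.5."""
    return toy_score(state, chain, cand) < 0.5


if __name__ == "__main__":
    state = {"turn": 1}
    option = choose_option(state, lambda s: 1.0, lambda s: 0.0)
    if option == ASK:
        print(build_chain(state, list(ATTR_TYPE), toy_score, toy_stop, max_len=4))
```

With the toy scores above, the chain covers three different attribute types ("red", "nike", "cotton") and skips "blue", because a second color is penalized below the stop threshold. The same `build_chain` function could serve the recommend option by passing items instead of attributes.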