Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome

April 21, 2024 · Declared Dead · 🏛 International Conference on Persuasive Technology

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Donghuo Zeng, Roberto S. Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, kun Zhang arXiv ID 2404.13792 Category cs.MM: Multimedia Cross-listed cs.AI, cs.CL, cs.HC Citations 4 Venue International Conference on Persuasive Technology Last Checked 3 months ago

Abstract

Customizing persuasive conversations related to the outcome of interest for specific users achieves better persuasion results. However, existing persuasive conversation systems rely on persuasive strategies and encounter challenges in dynamically adjusting dialogues to suit the evolving states of individual users during interactions. This limitation restricts the system's ability to deliver flexible or dynamic conversations and achieve suboptimal persuasion outcomes. In this paper, we present a novel approach that tracks a user's latent personality dimensions (LPDs) during ongoing persuasion conversation and generates tailored counterfactual utterances based on these LPDs to optimize the overall persuasion outcome. In particular, our proposed method leverages a Bi-directional Generative Adversarial Network (BiCoGAN) in tandem with a Dialogue-based Personality Prediction Regression (DPPR) model to generate counterfactual data. This enables the system to formulate alternative persuasive utterances that are more suited to the user. Subsequently, we utilize the D3QN model to learn policies for optimized selection of system utterances on counterfactual data. Experimental results we obtained from using the PersuasionForGood dataset demonstrate the superiority of our approach over the existing method, BiCoGAN. The cumulative rewards and Q-values produced by our method surpass ground truth benchmarks, showcasing the efficacy of employing counterfactual reasoning and LPDs to optimize reinforcement learning policy in online interactions.