Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

December 14, 2018 Β· Declared Dead Β· πŸ› Biosyst.

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Akihiro Tamatsukuri, Tatsuji Takahashi arXiv ID 1812.05795 Category cs.AI: Artificial Intelligence Citations 12 Venue Biosyst. Last Checked 4 months ago
Abstract
As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a \textit{satisficing} strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing ($RS$) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to the $K$-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that $RS$ is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of $RS$ is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of $RS$ with that of other representative algorithms for the $K$-armed bandit problems.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Artificial Intelligence

Died the same way β€” πŸ‘» Ghosted