To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models

October 06, 2022 · Declared Dead · 🏛 Symposium on Advances in Databases and Information Systems

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Julius Gonsior, Christian Falkenberg, Silvio Magino, Anja Reusch, Maik Thiele, Wolfgang Lehner
arXiv ID: 2210.03005
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, cs.CL, cs.DB
Citations: 7
Venue: Symposium on Advances in Databases and Information Systems
Last Checked: 4 months ago
Abstract
Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-based language models still requires a significant amount of labeled data to work. A well-known technique to reduce the amount of human effort in acquiring a labeled dataset is Active Learning (AL): an iterative process in which only the minimal number of samples is labeled. AL strategies require access to a quantified confidence measure of the model predictions. A common choice is the softmax activation function for the final layer. As the softmax function provides misleading probabilities, this paper compares eight alternatives on seven datasets. Our almost paradoxical finding is that most of the methods are too good at identifying the true most uncertain samples (outliers), and that labeling exclusively outliers therefore results in worse performance. As a heuristic we propose to systematically ignore samples, which results in improvements of various methods compared to the softmax function.
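As a rough illustration of the loop the abstract describes, the sketch below scores an unlabeled pool by softmax confidence and picks the most uncertain samples for labeling. It is not the authors' code. The abstract does not say which uncertainty score is used or which samples are ignored, so the least-confidence score, the function names, and the `skip_top` parameter (which drops the highest-ranked, presumably outlying, samples) are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def least_confidence(logits):
    """Uncertainty score: 1 minus the highest softmax probability."""
    return 1.0 - max(softmax(logits))


def select_batch(pool_logits, batch_size, skip_top=0):
    """Return pool indices to label next, most uncertain first.

    skip_top is a hypothetical knob: it ignores the skip_top most
    uncertain samples on the assumption that they are outliers.
    """
    scores = [least_confidence(l) for l in pool_logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[skip_top:skip_top + batch_size]


# Toy pool of model outputs (made-up logits for three classes).
pool = [
    [0.1, 0.0, -0.1],  # near-uniform prediction, very uncertain
    [4.0, 0.0, -2.0],  # confident prediction
    [1.0, 0.8, -1.0],  # two classes close together
    [2.5, 0.1, 0.0],   # fairly confident
]

print(select_batch(pool, batch_size=2))              # plain uncertainty sampling
print(select_batch(pool, batch_size=2, skip_top=1))  # ignore the top-ranked sample
```

With the toy logits above, plain uncertainty sampling selects samples 0 and 2, while skipping the single most uncertain sample shifts the selection to samples 2 and 3.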
