A Tweet-based Dataset for Company-Level Stock Return Prediction

June 17, 2020 · Entered Twilight · 🏛 arXiv.org

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: Dataset-release version, LICENSE, README.md

Authors: Karolina Sowinska, Pranava Madhyastha
arXiv ID: 2006.09723
Category: cs.CL (Computation & Language); cross-listed cs.SI, q-fin.ST
Citations: 5
Venue: arXiv.org
Repository: https://github.com/ImperialNLP/stockreturnpred (⭐ 10)
Last checked: 4 months ago

Abstract

Public opinion influences events, especially those related to stock market movement, where a subtle hint can influence the local outcome of the market. In this paper, we present a dataset that allows for company-level analysis of the impact of tweets on one-, two-, three-, and seven-day stock returns. Our dataset consists of 862,231 labelled English-language instances from Twitter; we also release a cleaned subset of 85,176 labelled instances to the community. We also provide baselines using standard machine learning algorithms and a multi-view learning approach that makes use of different types of features. Our dataset, scripts, and models are publicly available at: https://github.com/ImperialNLP/stockreturnpred.
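To make the idea of one-, two-, three-, and seven-day returns concrete, here is a minimal sketch that computes simple k-day returns from a series of daily closing prices, starting from the day a tweet appears. It rests on assumptions that the abstract does not confirm: returns are taken as simple returns (P[t+k] - P[t]) / P[t], horizons are counted in trading days rather than calendar days, and the helper names `k_day_return`, `returns_for_tweet`, and `direction_label` are hypothetical. The up/down label is purely illustrative; the paper's actual labelling scheme may differ.

```python
from typing import Dict, List, Optional

# Horizons (in days) named in the abstract; treating them as trading days is an assumption.
HORIZONS = (1, 2, 3, 7)


def k_day_return(closes: List[float], t: int, k: int) -> Optional[float]:
    """Hypothetical helper: simple return from day index t to t + k.

    Assumes `closes` holds one closing price per trading day, in order.
    Returns None when the horizon runs past the end of the series or the
    starting price is zero.
    """
    if t + k >= len(closes) or closes[t] == 0:
        return None
    return (closes[t + k] - closes[t]) / closes[t]


def returns_for_tweet(closes: List[float], t: int) -> Dict[int, Optional[float]]:
    """Hypothetical helper: returns at every horizon for a tweet posted on day t."""
    return {k: k_day_return(closes, t, k) for k in HORIZONS}


def direction_label(ret: Optional[float]) -> Optional[int]:
    """Illustrative binary label (1 = price went up, 0 = not); not the paper's scheme."""
    if ret is None:
        return None
    return 1 if ret > 0 else 0


if __name__ == "__main__":
    # Made-up closing prices for one company, used only to exercise the helpers.
    closes = [100.0, 102.0, 99.0, 105.0, 110.0, 108.0, 111.0, 115.0]
    for k, r in returns_for_tweet(closes, 0).items():
        print(k, r, direction_label(r))
```

Keeping the horizon computation separate from the labelling step mirrors the fact that the same return series can feed different targets (regression on the raw return, or classification on its sign) without recomputing prices.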