R.I.P.
👻
Ghosted
GET-Tok: A GenAI-Enriched Multimodal TikTok Dataset Documenting the 2022 Attempted Coup in Peru
February 08, 2024 · Entered Twilight · arXiv.org
Repo contents: .DS_Store, README.md, augment_metadata, collect_data, generate_token, requirements.txt, supplemental
Authors
Gabriela Pinto, Keith Burghardt, Kristina Lerman, Emilio Ferrara
arXiv ID
2402.05882
Category
cs.SI: Social & Info Networks
Cross-listed
cs.CY, cs.HC
Citations
5
Venue
arXiv.org
Repository
https://github.com/gabbypinto/GET-Tok-Peru
⭐ 5
Last Checked
3 months ago
Abstract
TikTok is one of the largest and fastest-growing social media sites in the world. TikTok features, however, such as voice transcripts, are often missing and other important features, such as OCR or video descriptions, do not exist. We introduce the Generative AI Enriched TikTok (GET-Tok) data, a pipeline for collecting TikTok videos and enriched data by augmenting the TikTok Research API with generative AI models. As a case study, we collect videos about the attempted coup in Peru initiated by its former President, Pedro Castillo, and its accompanying protests. The data includes information on 43,697 videos published from November 20, 2022 to March 1, 2023 (102 days). Generative AI augments the collected data via transcripts of TikTok videos, text descriptions of what is shown in the videos, what text is displayed within the video, and the stances expressed in the video. Overall, this pipeline will contribute to a better understanding of online discussion in a multimodal setting with applications of Generative AI, especially outlining the utility of this pipeline in non-English-language social media. Our code used to produce the pipeline is in a public GitHub repository: https://github.com/gabbypinto/GET-Tok-Peru.
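The four generative-AI enrichments and the collection window named in the abstract can be pictured as one record per video. The following is only a minimal sketch: the class name `EnrichedVideo`, its field names, and the helper functions are hypothetical and are not taken from the GET-Tok repository; only the window dates and the four enrichment types come from the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Collection window stated in the abstract (both endpoints counted).
WINDOW_START = date(2022, 11, 20)
WINDOW_END = date(2023, 3, 1)


@dataclass
class EnrichedVideo:
    """Hypothetical record layout; field names are illustrative only."""
    video_id: str
    published: date
    transcript: Optional[str] = None          # speech transcript of the video
    visual_description: Optional[str] = None  # description of what is shown
    on_screen_text: Optional[str] = None      # text displayed within the video
    stance: Optional[str] = None              # stance expressed in the video


def window_days(start: date = WINDOW_START, end: date = WINDOW_END) -> int:
    """Number of calendar days in the window, counting both endpoints."""
    return (end - start).days + 1


def in_window(video: EnrichedVideo) -> bool:
    """True if the video was published inside the collection window."""
    return WINDOW_START <= video.published <= WINDOW_END
```

Counting both endpoints, `window_days()` reproduces the 102 days quoted in the abstract. The enrichment fields default to `None` because the abstract notes that features such as transcripts are often missing.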
Similar Papers
In the same crypt: Social & Info Networks
Fake News Detection on Social Media: A Data Mining Perspective
Natural Scales in Geographical Patterns
Representation Learning on Graphs: Methods and Applications
The COVID-19 Social Media Infodemic