Short Text Classification Approach to Identify Child Sexual Exploitation Material
October 29, 2020 Β· Declared Dead Β· π Scientific Reports
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Mhd Wesam Al-Nabki, Eduardo Fidalgo, Enrique Alegre, RocΓo Alaiz-RodrΓguez
arXiv ID
2011.01113
Category
cs.IR: Information Retrieval
Cross-listed
cs.LG
Citations
5
Venue
Scientific Reports
Last Checked
4 months ago
Abstract
Producing or sharing Child Sexual Exploitation Material (CSEM) is a serious crime fought vigorously by Law Enforcement Agencies (LEAs). When an LEA seizes a computer from a potential producer or consumer of CSEM, they need to analyze the suspect's hard disk's files looking for pieces of evidence. However, a manual inspection of the file content looking for CSEM is a time-consuming task. In most cases, it is unfeasible in the amount of time available for the Spanish police using a search warrant. Instead of analyzing its content, another approach that can be used to speed up the process is to identify CSEM by analyzing the file names and their absolute paths. The main challenge for this task lies behind dealing with short text distorted deliberately by the owners of this material using obfuscated words and user-defined naming patterns. This paper presents and compares two approaches based on short text classification to identify CSEM files. The first one employs two independent supervised classifiers, one for the file name and the other for the path, and their outputs are later on fused into a single score. Conversely, the second approach uses only the file name classifier to iterate over the file's absolute path. Both approaches operate at the character n-grams level, while binary and orthographic features enrich the file name representation, and a binary Logistic Regression model is used for classification. The presented file classifier achieved an average class recall of 0.98. This solution could be integrated into forensic tools and services to support Law Enforcement Agencies to identify CSEM without tackling every file's visual content, which is computationally much more highly demanding.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Information Retrieval
R.I.P.
π»
Ghosted
π
π
Old Age
Neural Graph Collaborative Filtering
R.I.P.
π»
Ghosted
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
R.I.P.
π»
Ghosted
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
R.I.P.
π
404 Not Found
Graph Neural Networks for Social Recommendation
R.I.P.
π»
Ghosted
Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted