See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity
August 07, 2022 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zesheng Ye, Lina Yao, Yu Zhang, Sylvia Gustin
arXiv ID
2208.03666
Category
cs.MM: Multimedia
Cross-listed
cs.CV, cs.HC
Citations
9
Venue
arXiv.org
Last Checked
3 months ago
Abstract
Recent studies demonstrate the use of a two-stage supervised framework to generate images that depict human perception of visual stimuli from EEG, a task referred to as EEG-visual reconstruction. They are, however, unable to reproduce the exact visual stimulus, since it is the human-specified annotation of images, not their data, that determines what the synthesized images are. Moreover, synthesized images often suffer from noisy EEG encodings and unstable training of generative models, making them hard to recognize. Instead, we present a single-stage EEG-visual retrieval paradigm where data of the two modalities are correlated, as opposed to their annotations, allowing us to recover the exact visual stimulus for an EEG clip. We maximize the mutual information between the EEG encoding and the associated visual stimulus through optimization of a contrastive self-supervised objective, leading to two additional benefits. First, it enables EEG encodings to handle visual classes beyond those seen during training, since learning is not directed at class annotations. Second, the model is no longer required to generate every detail of the visual stimulus, but rather focuses on cross-modal alignment and retrieves images at the instance level, ensuring distinguishable model output. Empirical studies are conducted on the largest single-subject EEG dataset that measures brain activities evoked by image stimuli. We demonstrate that the proposed approach completes an instance-level EEG-visual retrieval task which existing methods cannot. We also examine the implications of a range of EEG and visual encoder structures. Furthermore, for the widely studied semantic-level EEG-visual classification task, despite not using class annotations, the proposed method outperforms state-of-the-art supervised EEG-visual reconstruction approaches, particularly in open-class recognition.
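The abstract does not name the exact contrastive objective, so the following is only a minimal sketch under stated assumptions: it uses a symmetric InfoNCE loss, a common contrastive objective whose minimization tightens a lower bound on mutual information, followed by nearest-neighbor retrieval at the instance level. The function names, the temperature value, and the toy embeddings are all hypothetical; real EEG and image encoders would produce the vectors.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_logits(eeg_batch, img_batch, temperature):
    """Pairwise cosine similarities between EEG and image embeddings, scaled by 1/temperature."""
    eeg = [l2_normalize(v) for v in eeg_batch]
    img = [l2_normalize(v) for v in img_batch]
    return [[sum(a * b for a, b in zip(e, i)) / temperature for i in img] for e in eeg]

def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[k] for row k: the matching pair is the positive."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)

def symmetric_info_nce(eeg_batch, img_batch, temperature=0.1):
    """Assumed objective: average of EEG-to-image and image-to-EEG InfoNCE terms."""
    logits = similarity_logits(eeg_batch, img_batch, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))

def retrieve(eeg_embedding, gallery):
    """Return the index of the gallery image most similar to the EEG embedding."""
    query = l2_normalize(eeg_embedding)
    scores = [sum(a * b for a, b in zip(query, l2_normalize(g))) for g in gallery]
    return max(range(len(gallery)), key=scores.__getitem__)

# Toy embeddings (hypothetical): row k of each list belongs to the same stimulus.
eeg_embeddings = [[1.0, 0.0], [0.0, 1.0]]
img_embeddings = [[0.9, 0.1], [0.2, 0.8]]

aligned_loss = symmetric_info_nce(eeg_embeddings, img_embeddings)
shuffled_loss = symmetric_info_nce(eeg_embeddings, img_embeddings[::-1])
best_match = retrieve(eeg_embeddings[0], img_embeddings)
```

In this sketch no class labels appear anywhere: the only supervision is which EEG clip was recorded for which image, which is the property the abstract credits for handling classes unseen during training.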
Similar Papers
In the same crypt: Multimedia
Old Age
R.I.P.
👻
Ghosted
Viewport-Adaptive Navigable 360-Degree Video Delivery
The Cartographer
A Comprehensive Survey on Cross-modal Retrieval
The Cartographer
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
R.I.P.
👻
Ghosted
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
R.I.P.
👻
Ghosted
Video Generation From Text
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
👻
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
👻
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
👻
Ghosted