Data | PsyCoMark

Dataset Description

The PsyCoMark dataset is built from Reddit submission statements: user-written summaries that accompany media submissions. They are rich in narrative content, which makes them well suited to identifying psycholinguistic markers.

Annotation

Each comment is annotated with a conspiracy label (conspiracy, not conspiracy, or can't tell) and with span-level markers based on psychological mechanisms of conspiracy belief. Annotation was performed by native English speakers recruited via Prolific and underwent rigorous quality control.
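To make the two annotation layers concrete, here is a minimal Python sketch of one annotated record. The class names, field names, and the example sentence are all assumptions made for illustration; the released files may be structured differently. Only the three label values and the five marker types (listed under Statistics) come from this page.

```python
from dataclasses import dataclass, field
from typing import List

# Label values and marker types as named on this page.
LABELS = {"conspiracy", "not conspiracy", "can't tell"}
MARKER_TYPES = {"Actor", "Action", "Effect", "Victim", "Evidence"}


@dataclass
class MarkerSpan:
    """Hypothetical span record: a marker type plus character offsets."""
    marker_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset

    def __post_init__(self):
        if self.marker_type not in MARKER_TYPES:
            raise ValueError(f"unknown marker type: {self.marker_type}")
        if not 0 <= self.start < self.end:
            raise ValueError("span offsets must satisfy 0 <= start < end")


@dataclass
class AnnotatedStatement:
    """Hypothetical document record: text, one label, any number of spans."""
    text: str
    label: str
    markers: List[MarkerSpan] = field(default_factory=list)

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")
        for marker in self.markers:
            if marker.end > len(self.text):
                raise ValueError("span extends past the end of the text")

    def spans_of(self, marker_type: str) -> List[str]:
        """Return the text covered by every span of the given type."""
        return [
            self.text[m.start:m.end]
            for m in self.markers
            if m.marker_type == marker_type
        ]


def find_span(text: str, phrase: str, marker_type: str) -> MarkerSpan:
    """Build a span from the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return MarkerSpan(marker_type, start, start + len(phrase))


# Invented example sentence, not taken from the dataset.
sentence = "The agency hid the report from the public."
example = AnnotatedStatement(
    text=sentence,
    label="conspiracy",
    markers=[
        find_span(sentence, "The agency", "Actor"),
        find_span(sentence, "hid the report", "Action"),
        find_span(sentence, "the public", "Victim"),
    ],
)

for marker_type in ("Actor", "Action", "Victim"):
    print(marker_type, example.spans_of(marker_type))
```

Storing character offsets rather than copied strings keeps spans tied to the source text, which matters if the text has to be rehydrated separately from the annotations.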

Statistics

Trial Data (Development Phase)

~1,000 Reddit submission statements
Classes: 630 conspiracy, 655 not conspiracy, 361 can’t tell
Marker annotations:
- Actor: 1276
- Action: 1239
- Effect: 1009
- Victim: 929
- Evidence: 989
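
As a quick sanity check on the figures above, the following Python sketch totals the class and marker counts and prints each one's share. It uses only the numbers listed here. The class counts sum to more than the roughly 1,000 statements, so they may count something other than unique statements; the sketch takes no position on that.

```python
# Counts copied from the trial data statistics listed above.
class_counts = {"conspiracy": 630, "not conspiracy": 655, "can't tell": 361}
marker_counts = {
    "Actor": 1276,
    "Action": 1239,
    "Effect": 1009,
    "Victim": 929,
    "Evidence": 989,
}


def shares(counts):
    """Map each key to its fraction of the total count."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


class_total = sum(class_counts.values())
marker_total = sum(marker_counts.values())
class_shares = shares(class_counts)
marker_shares = shares(marker_counts)

print(f"Class labels: {class_total}")
for name, count in class_counts.items():
    print(f"  {name}: {count} ({class_shares[name]:.1%})")

print(f"Marker annotations: {marker_total}")
for name, count in marker_counts.items():
    print(f"  {name}: {count} ({marker_shares[name]:.1%})")
```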

Full Training Data (Final Release)

Expected: >4,000 Reddit submission statements
Balanced class and marker coverage
Includes data from over 190 subreddits, both conspiratorial and general-interest

Download

The full dataset is available via Zenodo:
https://doi.org/10.5281/zenodo.15114172

Starter Pack

Use our GitHub repository for a starter pack with baseline code and rehydration scripts:
https://github.com/hide-ous/semeval26_starter_pack

PsyCoMark @ SemEval 2026