Dataset Description
The PsyCoMark dataset is built from Reddit submission statements, the user-written summaries that accompany media submissions. These statements are rich in narrative content, which makes them well suited to identifying psycholinguistic markers.
Annotation
Submission statements are annotated with one of three conspiracy labels (conspiracy, not conspiracy, can't tell) and with span-level markers based on psychological mechanisms of conspiracy belief. Annotation was performed by native English speakers recruited via Prolific and underwent rigorous quality control.
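To make the two annotation layers concrete, here is a minimal sketch of how one annotated statement could be represented in memory. The label set and the five marker names come from this description. Everything else is an assumption for illustration: the class and field names, the character-offset convention, and the example sentence are hypothetical and do not reflect the released file format.

```python
from dataclasses import dataclass, field
from typing import List

# Label and marker names as listed in this description.
LABELS = {"conspiracy", "not conspiracy", "can't tell"}
MARKER_TYPES = {"Actor", "Action", "Effect", "Victim", "Evidence"}


@dataclass
class MarkerSpan:
    start: int   # inclusive character offset (assumed convention)
    end: int     # exclusive character offset (assumed convention)
    marker: str  # one of MARKER_TYPES


@dataclass
class AnnotatedStatement:
    # Hypothetical container; the real schema may differ.
    text: str
    label: str
    spans: List[MarkerSpan] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the label, a marker, or an offset is invalid."""
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        for span in self.spans:
            if span.marker not in MARKER_TYPES:
                raise ValueError(f"unknown marker: {span.marker!r}")
            if not (0 <= span.start < span.end <= len(self.text)):
                raise ValueError(f"span out of bounds: {span}")

    def span_text(self, span: MarkerSpan) -> str:
        """Return the substring covered by a span."""
        return self.text[span.start:span.end]


def find_span(text: str, phrase: str, marker: str) -> MarkerSpan:
    """Build a span from the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"phrase not found: {phrase!r}")
    return MarkerSpan(start, start + len(phrase), marker)


# Made-up sentence, not taken from the dataset.
text = "They hid the report from the public."
example = AnnotatedStatement(
    text=text,
    label="conspiracy",
    spans=[
        find_span(text, "They", "Actor"),
        find_span(text, "hid the report", "Action"),
        find_span(text, "the public", "Victim"),
    ],
)
example.validate()
for span in example.spans:
    print(span.marker, "->", example.span_text(span))
```

Storing offsets rather than copied substrings keeps each marker tied to its position in the text, which matters when the same phrase occurs more than once in a statement.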
Statistics
Trial Data (Development Phase)
- ~1,000 Reddit submission statements
- Classes: 630 conspiracy, 655 not conspiracy, 361 can’t tell
- Marker annotations:
  - Actor: 1,276
  - Action: 1,239
  - Effect: 1,009
  - Victim: 929
  - Evidence: 989
Full Training Data (Final Release)
- Expected: >4,000 Reddit submission statements
- Balanced class and marker coverage
- Includes data from over 190 subreddits, both conspiratorial and general-interest
Download
The full dataset is available via Zenodo:
https://doi.org/10.5281/zenodo.15114172
Starter Pack
Use our GitHub repository for a starter pack with baseline code and rehydration scripts:
https://github.com/hide-ous/semeval26_starter_pack