How we won the PAN 2024 Text Classification Challenge

Background

I participated in the CLEF-PAN 2024 shared task on opposition stance detection in social media posts as a member of the IUCL team at Indiana University. The task was to classify whether a post expresses opposition to or support for a given target.

The Competition

PAN 2024 featured teams from universities and research labs worldwide. The challenge involved detecting nuanced stance expressions across diverse social media text, where sarcasm, implicit stance, and cultural context made simple keyword approaches insufficient.

Research Process

Our approach centered on fine-tuning pretrained language models for sequence classification. Key decisions:

  • Model selection: We experimented with multiple transformer architectures and found that task-specific fine-tuning significantly outperformed zero-shot and few-shot approaches
  • Data augmentation: We used augmentation strategies to expand the training set, particularly for underrepresented stance categories
  • Ensemble methods: Our final submission combined predictions from multiple fine-tuned models
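To make the ensembling step concrete, here is a minimal sketch of one common way to combine models: soft voting, where the per-class probabilities from each fine-tuned model are averaged and the highest-scoring class is picked. This is an illustration under assumptions, not necessarily the exact combination rule in the submission; the function names `soft_vote` and `predict` and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average class probabilities across models (soft voting).

    model_probs[m][i][c] is the probability that model m assigns
    to class c for example i. All models must score the same examples.
    """
    n_models = len(model_probs)
    n_examples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = []
    for i in range(n_examples):
        row = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predict(avg_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest averaged probability per example."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg_probs]


# Toy example: three hypothetical models, two posts, two classes
# (index 0 = support, index 1 = opposition; this label order is an assumption).
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.3, 0.7]]
model_c = [[0.3, 0.7], [0.6, 0.4]]

averaged = soft_vote([model_a, model_b, model_c])
labels = predict(averaged)
```

Averaging probabilities rather than taking a majority vote over hard labels lets a confident model outweigh two uncertain ones, which can help on borderline or sarcastic posts.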

Results

We achieved an F1 score of 0.83, the top result among all participating teams, and won the competition. Our findings were published at the CLEF-PAN 2024 conference.

The key insight was that careful data augmentation combined with per-class threshold optimization gave consistent gains over vanilla fine-tuning. We later exploited this pattern more extensively in our SemEval 2026 work.
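As a rough illustration of threshold optimization, the sketch below grid-searches the decision threshold for one class on held-out data and keeps the value that maximizes F1. For a per-class setup, the same search would be run once per class on one-vs-rest labels. The helper names `f1_score` and `best_threshold`, the grid, and the toy scores are all assumptions made for the example, not the settings from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, F1) for the grid value that maximizes F1.

    Ties are resolved in favor of the lowest threshold on the grid.
    """
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # 0.05 .. 0.95 in steps of 0.01
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy dev set: the model is under-confident on two positive posts,
# so the default 0.5 cutoff misses them.
dev_labels = [1, 1, 1, 0, 0, 0]
dev_scores = [0.9, 0.45, 0.35, 0.3, 0.2, 0.1]

default_f1 = f1_score(dev_labels, [1 if s >= 0.5 else 0 for s in dev_scores])
tuned_t, tuned_f1 = best_threshold(dev_labels, dev_scores)
```

The threshold should be tuned on a development split and then frozen before scoring the test set; tuning on the test data would inflate the reported F1.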

Outro

This competition was my first shared task win, and it directly shaped my research direction toward multilingual classification and synthetic data augmentation, themes I’ve continued exploring in SemEval 2026 and EEUCA 2026.

Read the paper