Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation

Date:

Poster presentation at ACL 2026 for SemEval-2026 Task 9: Multilingual Polarization Detection.

Our system fine-tunes per-language Gemma 3 models (12B and 27B) using LoRA, augmented with synthetic data generated via three strategies (direct generation, paraphrasing, and contrastive pair creation) using GPT-4o-mini. We employ a multi-stage quality filtering pipeline including embedding-based deduplication and per-language threshold tuning.

Result: 2nd place overall across 22 languages (mean macro-F1: 0.811), with 1st place in 3 languages and top-3 in 8 languages.

Paper on arXiv