This weekend I published synthetictext, an LLM-powered Python package for generating synthetic training data for text classification.
The goal is simple: make it easier to create usable synthetic training data for classification workflows without having to wire together a one-off pipeline every time.
I had used the same basic synthetic-data workflow in several recent classification projects, including multilingual and low-resource settings, and wanted a reusable package I could point at a new task spec instead of rebuilding the pipeline from scratch.
If you try it, I’d love to hear what works well and what needs to be added next.