Introducing synthetictext
Published:
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
synthetictext is an LLM-powered Python package for generating synthetic text data for text classification tasks.
In this short tutorial, we look at the US Patent and Trademark Office (USPTO) website and scrape the patent database using a keyword search. We use Selenium WebDriver to scrape the data, then the Requests library to download the individual patent PDF documents.
Identifying the user issues worth solving is crucial but often ignored.
Generate tailored interview questions from a resume and job description using AI
RAG pipeline for answering warranty claims questions from unstructured PDF data
LLM-powered simulation of design concept sprints
Fine-tuned Llama2-7b for mental health QA, with a human evaluation platform
Selected projects from my Master’s at Indiana University
LLM agent with tool-calling for automated code investigation from Jira tickets
Full retrieval-augmented generation pipeline with hybrid retrieval and LLM-as-judge evaluation
Personal full-stack app to track national park visits
Published in IEEE International Symposium on Technology and Society (ISTAS), 2019
Video translation of English content into Indian regional languages using open innovation approaches.
Published in CLEF (Working Notes), 2024
Data augmentation techniques applied to conspiracy theory detection (PAN 2024).
Published in Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval), 2024
Comparison of traditional and neural approaches for detecting machine-authored text (SemEval Task 8).
Published in HICSS, 2025
Large-scale collection and vulnerability assessment pipeline for machine learning open-source software.
Published in arXiv preprint arXiv:2605.07201, 2026
Fine-tuning LLMs with synthetic augmentation for multi-class toxicity detection in gaming chat (arXiv preprint).
Published in arXiv preprint arXiv:2605.05159, 2026
Multilingual polarization detection using ensemble Gemma models and synthetic data augmentation (SemEval 2026 submission; arXiv preprint).
Oral presentation at ACL 2026 for the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities.
Poster presentation at ACL 2026 for SemEval-2026 Task 9: Multilingual Polarization Detection.