Categorize content themes in large datasets