Distilling Step-by-Step: Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes