Ran a systematic comparison on a sentiment classification dataset: zero-shot hit 78% accuracy, one-shot reached 85%, and three-shot examples pushed it to 91%. The examples do not need to be perfect; they just need to demonstrate the expected format and decision boundaries. Even noisy examples outperformed zero-shot. This is now my default approach for any classification task.
Leave a Reply