Knowing more than you learned is every student’s dream before a test. That sounds impossible, right?
“But it turns out, it is indeed possible,” asserted Ilia Sucholutsky, Vice President of Research at StratumAI and a PhD student at the University of Waterloo (UW). In the context of machine learning, “…it is possible to learn from less than one example per class,” Sucholutsky said.
Sucholutsky’s recent research paper is titled ‘Less Than One’-Shot Learning. The paper examines an extreme form of few-shot learning.
“There’s another regime that scientists haven’t previously noticed. They proposed the idea of few-shot learning, one-shot learning, and zero-shot learning. We’re exploring something that’s hidden between one-shot learning and zero-shot learning. We’re saying that you can design examples for machine learning models that are so efficient that you need less of the examples than the total classes,” Sucholutsky said.
How does ‘Less Than One’-shot learning work?
“Soft labels are typically used as outputs of a classification model. But now we’re saying what if the input was also to look like that,” Sucholutsky said. “We started examining whether we can create these soft label points that can train machine learning models even though there are fewer examples than classes.”
“Previously, we used hard labels to refer to a certain image or an object,” Sucholutsky said. “For example, we would [hard] label the digit ‘3’ as three.”
“But what we can also agree on is that the digit ‘3’ [visually] has more similarities with the digit ‘8’ as compared to the digit ‘7’.”
“The underlying idea of less than one-shot learning is that we can try and quantify this connection so that the machine learning algorithm can learn better from it,” Sucholutsky said. “For example, for the image of a handwritten digit ‘3’, I tell the neural network that the digit in the image is 70% the digit ‘3’ and 15% the digit ‘8’. This is a soft-label distribution over the different classes. At that point, the neural network or any machine learning algorithm can start learning the features within the image that are shared between various classes.”
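To make the difference between a hard label and a soft label concrete, here is a minimal Python sketch. It is an illustration only, not code from the paper. The 70% and 15% figures come from the quote above; placing the remaining 15% on the digit ‘5’ is an assumption made so the distribution sums to one, and the function names are hypothetical.

```python
import math

NUM_CLASSES = 10  # digits 0-9


def hard_label(digit):
    """One-hot label: all probability mass on a single class."""
    return [1.0 if c == digit else 0.0 for c in range(NUM_CLASSES)]


def soft_label(weights):
    """Build a soft label from a {class: probability} dict that sums to 1."""
    label = [weights.get(c, 0.0) for c in range(NUM_CLASSES)]
    assert abs(sum(label) - 1.0) < 1e-9, "soft label must sum to 1"
    return label


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hard label: the image is a '3' and nothing else.
hard_three = hard_label(3)

# Soft label: 70% '3' and 15% '8' as in the quote.
# ASSUMPTION: the remaining 15% is placed on '5' purely for illustration.
soft_three = soft_label({3: 0.70, 8: 0.15, 5: 0.15})

# A prediction that matches the soft label is penalised less than one that
# puts all its mass on '3' and ignores the resemblance to other digits.
loss_matching = cross_entropy(soft_three, soft_three)
loss_one_hot = cross_entropy(soft_three, hard_three)
print(loss_matching < loss_one_hot)
```

Training against the soft target pushes a model to keep some probability on visually related digits, which is the shared-feature signal Sucholutsky describes.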
“Previous authors have shown that for the MNIST handwritten digit dataset, they could train models to achieve over a 90% performance using just one synthetic example per class. They classified all 10 digits using one image per digit. What we found was that with our soft label dataset distillation, we could actually go below one image per digit. We found that we could design 5 synthetic images, give them these special soft-labels, and just with these five images, we can achieve above 90% performance for the handwritten digit classification,” Sucholutsky said.
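A toy example can show how fewer examples than classes might still separate every class. The sketch below is not the dataset distillation procedure Sucholutsky describes. It swaps in a simple distance-weighted prototype classifier on a one-dimensional line, and the prototype positions and soft-label values are invented for illustration. Two soft-labelled points are enough to carve out three classes.

```python
def classify(x, prototypes, eps=1e-9):
    """Return the class with the largest inverse-distance-weighted score.

    Each prototype is (position, soft_label). Closer prototypes contribute
    more of their soft label to the final score.
    """
    num_classes = len(prototypes[0][1])
    scores = [0.0] * num_classes
    for position, soft_label in prototypes:
        weight = 1.0 / (abs(x - position) + eps)  # eps avoids division by zero
        for c, p in enumerate(soft_label):
            scores[c] += weight * p
    return max(range(num_classes), key=lambda c: scores[c])


# ASSUMPTION: hypothetical prototypes. Two points, three classes.
# Each point mostly belongs to an "outer" class but shares mass with class 1.
prototypes = [
    (0.0, [0.6, 0.4, 0.0]),
    (1.0, [0.0, 0.4, 0.6]),
]

predictions = {x: classify(x, prototypes) for x in (0.1, 0.5, 0.9)}
print(predictions)  # {0.1: 0, 0.5: 1, 0.9: 2}
```

Near either prototype its dominant class wins, but in the middle of the line the shared 40% from both prototypes adds up and class 1 takes over, even though no prototype is labelled mainly as class 1. The five-image MNIST result in the quote relies on the same kind of overlap, applied to learned synthetic images instead of hand-picked points.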
When asked about the challenges, Sucholutsky said, “[We proved] theoretically that it is possible to have these optimal synthetic examples that allow machine learning models to learn this low-shot way that we described. But the question is how to actually design these synthetic models?” Sucholutsky is currently exploring this very question in his follow-up paper.
Sucholutsky is very positive about the real-world possibilities of ‘Less than One’-shot learning. “In general, these [applications] include areas where you’re doing some kind of classification tasks, where there’s a large number of classes, and where there are few examples per class available. Some of these applications can include character recognition, object detection, language modelling, text classification (sentiment analysis), image captioning, and even recommendation engines.”
“People talk a lot about big data. But most people don’t have big data. It’s really hard and expensive to gather big data. We need to get our machine learning models to work well with small data, as it does with big data,” he said.
“Another interesting one for me, which is less seen in the real world, but will have a big impact is the neural architecture search. Currently, neural architecture [search] is extremely computer-intensive. But with few-shot learning, one-shot learning, or less than one-shot learning, the cost is massively reduced.”