Few-shot learning assumes access to many labeled examples per class at inference time.
- True
- False
Show Answer
Correct Answer: False
Explanation:
Few-shot learning operates under the constraint of very limited labeled examples per class.
"We only have a few examples called a support set, typically we only have somewhere on the range of 1 to 5 examples per category..."
Which of the following accurately describe the goals and approaches of meta-training in few-shot learning? (Select all that apply)
- To simulate test conditions using tasks constructed from base classes
- To learn how to learn quickly from small amounts of data
- To develop models that generalize well to unseen classes
- To pre-train a classifier across all possible test classes
- To build experience with many related tasks that transfer to new tasks
- To align the training process with the inference process
Show Answer
Correct Answers: ✅ Simulate test conditions, ✅ Learn to learn quickly, ✅ Develop models for unseen classes, ✅ Build transferable experience, ✅ Align training with inference
Explanation:
Meta-training serves multiple important purposes in few-shot learning beyond just task simulation.
"We'd like to align what we do during training with what we do during testing... This is called meta-training..." "The goal is to train the model on a distribution of tasks, so it learns how to learn quickly from small amounts of data." "This way, the model builds experience across many tasks that can transfer to new, unseen classes at test time."
Which of the following are strategies used to prevent overfitting during fine-tuning in few-shot learning?
- Freezing the feature extractor
- Using cosine classifiers instead of fully connected layers
- Training on thousands of examples per class
- Applying early stopping and hyperparameter tuning
Show Answer
Correct Answers: Freezing the feature extractor, Using cosine classifiers instead of fully connected layers
Explanation:
These techniques help prevent overfitting when training with very few examples.
"...fix all the weights and just fine tune the last layer..."
"...cosine classifier... are so constrained that it's harder to overfit with them..."
Which of the following accurately describe properties and advantages of cosine similarity-based classification in few-shot learning? (Select all that apply)
- It primarily compares the angular separation (directional similarity) between vectors
- It normalizes feature vectors, making the comparison invariant to magnitude
- It helps address the scarcity of examples by focusing on direction rather than scale
- It requires computing the absolute differences between vector components
- It works well with embeddings from different domains or distributions
- It provides a natural way to measure semantic similarity in the feature space
Show Answer
Correct Answers: ✅ Compares angular separation, ✅ Normalizes vectors for magnitude invariance, ✅ Addresses scarcity by focusing on direction, ✅ Works well across domains, ✅ Measures semantic similarity
Explanation:
Cosine similarity has several important properties that make it well-suited for few-shot learning.
"You're only looking at the angles between the feature vectors rather than incorporating how long they are..." "This normalization is particularly helpful when examples are scarce and feature magnitudes might vary." "Cosine similarity provides a natural way to compare the semantic direction of embeddings across potentially different distributions."
Which of the following are core components of a prototypical network?
- Feature extractor shared across support and query sets
- One-hot encoding of query items
- Mean embeddings of each class in the support set
- Learned reward functions for each query
Show Answer
Correct Answers: Feature extractor shared across support and query sets, Mean embeddings of each class in the support set
Explanation:
Prototypical networks use class prototypes computed as the mean of embeddings.
"...you take the mean of those and then you compare each query item to that mean..."
Which of the following accurately describe the key aspects and advantages of prototypical networks in few-shot learning? (Select all that apply)
- They create class prototypes by averaging support set examples for each class
- They compare query items to all prototypes using a distance function
- They require no fine-tuning or adaptation during inference
- They can handle variable numbers of examples per class
- They rely on detailed class-level annotations beyond class labels
- They perform well even with extremely limited support examples
Show Answer
Correct Answers: ✅ Create prototypes by averaging examples, ✅ Compare queries to all prototypes, ✅ Require no fine-tuning during inference, ✅ Handle variable example counts, ✅ Perform well with limited examples
Explanation:
Prototypical networks have several advantageous properties for few-shot learning scenarios.
"...you take the mean of those and then you compare each query item to that mean..." "This approach is elegant because it can handle any number of examples per class and doesn't require explicit parameter updates during inference." "The averaging operation provides a simple yet effective way to combine the limited information from support examples."
Prototypical networks compare each query item to every example in the support set individually.
- True
- False
Show Answer
Correct Answer: False
Explanation:
Query items are compared to class prototypes (means), not individual examples.
"...we're going to compute the distance... to each prototype."
Which challenges or limitations are associated with fine-tuning-based few-shot learning?
- It lacks task-specific awareness during meta-training
- It requires large support sets to work effectively
- It is sensitive to hyperparameter choices
- It cannot generalize to new classes
Show Answer
Correct Answers: It lacks task-specific awareness during meta-training, It is sensitive to hyperparameter choices
Explanation:
Fine-tuning approaches face challenges related to meta-awareness and sensitivity.
"...not meta-aware, as we're not explicitly simulating few-shot tasks during pre-training..."
"...the hyperparameters for fine-tuning matter a lot..."
What aspect of meta-learning does MAML specifically aim to optimize? (Select all that apply)
- The initialization of parameters for rapid adaptation
- The learning algorithm itself
- The architecture of the neural network
- The meta-optimization strategy for efficient adaptation
- The representation space of the model
- The capability to quickly adapt to new tasks with minimal steps
Show Answer
Correct Answers: ✅ Initialization for rapid adaptation, ✅ Meta-optimization strategy, ✅ Capability for quick adaptation
Explanation:
MAML focuses on finding optimal initialization parameters that allow quick adaptation.
"The goal of MAML is to learn an initialization that can be fine-tuned quickly to new tasks with just a few gradient steps." "It's meta-learning because we're optimizing specifically for adaptability rather than just task performance." "MAML optimizes for the ability to learn new tasks quickly, not just performance on the training tasks."
Which of the following accurately describe the relationship between gradient descent and meta-learning? (Select all that apply)
- Gradient descent can be viewed as a differentiable computation graph suitable for meta-learning
- Meta-learning can optimize the initial parameters that gradient descent starts from
- The meta-objective includes performance after adaptation, not just initial performance
- Gradient descent must be replaced with more advanced algorithms in meta-learning
- The learning algorithm itself can be parameterized and learned
- Meta-learning can optimize for quick adaptation rather than final performance
Show Answer
Correct Answers: ✅ Gradient descent is a differentiable computation graph, ✅ Meta-learning optimizes initial parameters, ✅ Meta-objective includes post-adaptation performance, ✅ Learning algorithm can be parameterized, ✅ Can optimize for quick adaptation
Explanation:
The relationship between gradient descent and meta-learning has several important dimensions.
"Gradient descent itself is a differentiable computation, which means we can differentiate through the adaptation process." "Meta-learning optimizes for performance after adaptation, which is a fundamentally different objective than standard learning." "We can either learn good initialization points for gradient descent, or learn the update rule itself."
What does the Meta-LSTM approach learn that MAML does not?
- How to initialize parameters before adaptation
- A task-specific regularizer
- A learnable update rule conditioned on gradients
- The number of steps required for convergence
Show Answer
Correct Answer: A learnable update rule conditioned on gradients
Explanation:
Meta-LSTM learns an explicit update rule rather than just initialization.
"...it can learn to do more complex update rules that are not purely following the gradient..."
What are advantages of learning initialization via MAML compared to learning an explicit update rule?
- Simpler optimization
- Better theoretical convergence guarantees
- Higher flexibility in modeling long-range dependencies
- Less overfitting when applied to small support sets
Show Answer
Correct Answers: Simpler optimization, Less overfitting when applied to small support sets
Explanation:
Learning initialization has practical advantages in terms of simplicity and generalization.
"...simpler conceptually and more robust in practice..."
"...less prone to overfitting on small tasks..."