Quiz: Introduction to Reinforcement Learning

Question 1 (True/False)

In reinforcement learning, agents are trained with labeled data to learn a direct mapping from inputs to outputs.

  • True
  • False
Show Answer

Correct Answer: False
Explanation:
Unlike supervised learning, reinforcement learning does not rely on labeled data.

"We will not receive supervision in the form of the correct decision... instead, we will only receive evaluative feedback in the form of reward for the decision..."


Question 2 (Multi-Select)

Which of the following accurately describe key characteristics and properties of reinforcement learning? (Select all that apply)

  • Sequential decision-making in an environment with evaluative feedback
  • Learning through trial and error
  • Delayed feedback on the quality of decisions
  • Learning optimal behavior through direct supervision
  • Balancing exploration of new strategies with exploitation of known good strategies
  • Receiving only rewards/penalties rather than correct action labels
Show Answer

Correct Answers: ✅ Sequential decision-making with evaluative feedback, ✅ Learning through trial and error, ✅ Delayed feedback, ✅ Balancing exploration and exploitation, ✅ Receiving rewards/penalties not labels
Explanation:
Reinforcement learning has all of these characteristics; the only incorrect option is learning through direct supervision, which describes supervised learning rather than RL.

"Reinforcement learning can be defined in one sentence as a sequential decision making in an environment with evaluative feedback." "A key characteristic is that the agent receives delayed feedback, making it difficult to determine which actions led to rewards." "RL algorithms must balance exploration of new strategies with exploitation of known successful strategies."


Question 3 (Multi-Select)

Which of the following are key characteristics or challenges of reinforcement learning?

  • The agent receives a label for the correct action at each step
  • The reward signal may be delayed
  • The policy affects future data distribution
  • Every state-action pair is observed multiple times
Show Answer

Correct Answers: ✅ The reward signal may be delayed, ✅ The policy affects future data distribution
Explanation:
RL faces unique challenges like delayed rewards and non-stationary data caused by policy updates.

"The reward may be delayed and it can only happen at the end of the task..."
"Any updates made to the policy... will change the data distribution... making this distribution non stationary."


Question 4 (Multi-Select)

Which of the following accurately describe what an RL agent receives from the environment during the interaction loop? (Select all that apply)

  • The next observation (state) after taking an action
  • A reward signal that evaluates the action taken
  • Information about which action would have been optimal
  • The probability distribution over all possible next states
  • The environment's internal state which may differ from the observation
  • The time remaining until the episode terminates
Show Answer

Correct Answers: ✅ The next observation/state, ✅ A reward signal
Explanation:
The agent receives both the new observation and the reward after taking an action, but not the other information.

"...the agent will receive an observation... it will execute an action... and produce a new observation... as well as a reward..." "The agent does not receive information about what the optimal action would have been, nor does it typically have access to the environment's full internal state."


Question 5 (True/False)

The data distribution that a reinforcement learning agent sees remains stationary throughout training.

  • True
  • False
Show Answer

Correct Answer: False
Explanation:
The data distribution is non-stationary because the policy changes what data the agent sees.

"...will change the data distribution of states and rewards... making this distribution non stationary."


Question 6 (Multi-Select)

Which of the following accurately describe the nature and characteristics of "evaluative feedback" in reinforcement learning? (Select all that apply)

  • Feedback is received in the form of a reward signal
  • Feedback is only provided for actions actually taken, not hypothetical actions
  • Feedback may be delayed, not immediate after each action
  • Feedback provides a scalar value rather than a complete corrective instruction
  • Feedback indicates the quality of an action, not the correct action itself
  • Feedback is deterministic and consistent across all environments
Show Answer

Correct Answers: ✅ Received as reward signal, ✅ Only provided for actions taken, ✅ May be delayed, ✅ Provides scalar value, ✅ Indicates quality not correctness
Explanation:
Evaluative feedback has several key characteristics that distinguish it from the instructional feedback used in supervised learning; the only option that does not apply is determinism, since reward signals are not guaranteed to be deterministic or consistent across environments.

"Evaluative feedback means that the agent... receive rewards... only for the actions that it did take and not for the actions that did not take." "The reward signal merely evaluates actions, rather than instructing which action is correct." "The reward signal might be delayed, making it challenging to determine which action in a sequence led to a positive outcome."


Question 7 (Multi-Select)

Which are common application domains for reinforcement learning as discussed in the lecture?

  • Image segmentation
  • Robotic control
  • Atari game playing
  • Board games like Go
Show Answer

Correct Answers: ✅ Robotic control, ✅ Atari game playing, ✅ Board games like Go
Explanation:
Robotic control, Atari game playing, and board games like Go are the application domains mentioned in the lecture; image segmentation is not among them.

"Here is an instance of a robotic control task..."
"RL applied to learn how to play Atari video games..."
"RL has also been applied to games like go..."


Question 8 (True/False)

The RL agent receives rewards for all actions, including those it did not take.

  • True
  • False
Show Answer

Correct Answer: False
Explanation:
The agent receives rewards only for the actions it actually takes; untaken actions get no feedback.

"...the agent is supposed to pick actions and receive rewards... only for the actions that it did take and not for the actions that did not take."


Question 9 (Multi-Select)

Which of the following accurately describe challenges specific to reinforcement learning in real-world robotic settings? (Select all that apply)

  • Some states may be seen only once
  • Exploration can be dangerous or costly
  • Real-world dynamics are often complex and difficult to model
  • Hardware constraints limit computation speed
  • Actions have physical consequences that cannot be reversed
  • Reward signals may be sparse and difficult to design
Show Answer

Correct Answers: ✅ States seen only once, ✅ Exploration can be dangerous/costly, ✅ Complex dynamics, ✅ Hardware constraints, ✅ Physical consequences, ✅ Sparse rewards
Explanation:
Real-world robotics faces numerous challenges that make RL particularly difficult.

"...the agent will see certain states only once and never again in his lifetime, making it difficult to learn from past mistakes." "In real-world settings, exploration can lead to physical damage, and the dynamics are often complex and difficult to model accurately." "The hardware constraints and physical consequences of actions create additional challenges not present in simulated environments."


Question 10 (Multiple Choice)

What is the agent’s goal in reinforcement learning?

  • Minimize classification error
  • Memorize sequences of states
  • Maximize long-term accumulated reward
  • Discover
Show Answer

Correct Answer: Maximize long-term accumulated reward
Explanation:
The core objective of the agent is long-term reward maximization.

"The objective of this agent is to maximize the reward it will get from the environment in the long run."