Quiz: Introduction to Reinforcement Learning

Question 1 (True/False)

In reinforcement learning, agents are trained with labeled data to learn a direct mapping from inputs to outputs.

  • True
  • False
Show Answer

Correct Answer: False
Explanation:
Unlike supervised learning, reinforcement learning does not rely on labeled data.

"We will not receive supervision in the form of the correct decision... instead, we will only receive evaluative feedback in the form of reward for the decision..."


Question 2 (Multi-Select)

Which of the following accurately describe key characteristics and properties of reinforcement learning? (Select all that apply)

  • Sequential decision-making in an environment with evaluative feedback
  • Learning through trial and error
  • Delayed feedback on the quality of decisions
  • Learning optimal behavior through direct supervision
  • Balancing exploration of new strategies with exploitation of known good strategies
  • Receiving only rewards/penalties rather than correct action labels
Show Answer

Correct Answers: ✅ Sequential decision-making with evaluative feedback, ✅ Learning through trial and error, ✅ Delayed feedback, ✅ Balancing exploration and exploitation, ✅ Receiving rewards/penalties not labels
Explanation:
Reinforcement learning has all of these characteristics; the only incorrect option is learning through direct supervision, which describes supervised learning rather than RL.

"Reinforcement learning can be defined in one sentence as a sequential decision making in an environment with evaluative feedback." "A key characteristic is that the agent receives delayed feedback, making it difficult to determine which actions led to rewards." "RL algorithms must balance exploration of new strategies with exploitation of known successful strategies."


Question 3 (Multi-Select)

Which of the following are key characteristics or challenges of reinforcement learning?

  • The agent receives a label for the correct action at each step
  • The reward signal may be delayed
  • The policy affects future data distribution
  • Every state-action pair is observed multiple times
Show Answer

Correct Answers: ✅ The reward signal may be delayed, ✅ The policy affects future data distribution
Explanation:
RL faces unique challenges like delayed rewards and non-stationary data caused by policy updates.

"The reward may be delayed and it can only happen at the end of the task..."
"Any updates made to the policy... will change the data distribution... making this distribution non stationary."


Question 4 (Multi-Select)

Which of the following accurately describe what an RL agent receives from the environment during the interaction loop? (Select all that apply)

  • The next observation (state) after taking an action
  • A reward signal that evaluates the action taken
  • Information about which action would have been optimal
  • The probability distribution over all possible next states
  • The environment's internal state which may differ from the observation
  • The time remaining until the episode terminates
Show Answer

Correct Answers: ✅ The next observation/state, ✅ A reward signal
Explanation:
The agent receives both the new observation and the reward after taking an action, but not the other information.

"...the agent will receive an observation... it will execute an action... and produce a new observation... as well as a reward..." "The agent does not receive information about what the optimal action would have been, nor does it typically have access to the environment's full internal state."


Question 5 (True/False)

The data distribution that a reinforcement learning agent sees remains stationary throughout training.

  • True
  • False
Show Answer

Correct Answer: False
Explanation:
The data distribution is non-stationary because the policy changes what data the agent sees.

"...will change the data distribution of states and rewards... making this distribution non stationary."


Question 6 (Multi-Select)

Which of the following accurately describe the nature and characteristics of "evaluative feedback" in reinforcement learning? (Select all that apply)

  • Feedback is received in the form of a reward signal
  • Feedback is only provided for actions actually taken, not hypothetical actions
  • Feedback may be delayed, not immediate after each action
  • Feedback provides a scalar value rather than a complete corrective instruction
  • Feedback indicates the quality of an action, not the correct action itself
  • Feedback is deterministic and consistent across all environments
Show Answer

Correct Answers: ✅ Received as reward signal, ✅ Only provided for actions taken, ✅ May be delayed, ✅ Provides scalar value, ✅ Indicates quality not correctness
Explanation:
Evaluative feedback has several key characteristics that distinguish it from the instructional feedback used in supervised learning; the only option that does not apply is determinism, since reward signals are not guaranteed to be deterministic or consistent across environments.

"Evaluative feedback means that the agent... receive rewards... only for the actions that it did take and not for the actions that did not take." "The reward signal merely evaluates actions, rather than instructing which action is correct." "The reward signal might be delayed, making it challenging to determine which action in a sequence led to a positive outcome."


Question 7 (Multi-Select)

Which are common application domains for reinforcement learning as discussed in the lecture?

  • Image segmentation
  • Robotic control
  • Atari game playing
  • Board games like Go
Show Answer

Correct Answers: ✅ Robotic control, ✅ Atari game playing, ✅ Board games like Go
Explanation:
Robotic control, Atari game playing, and board games like Go are the application domains mentioned in the lecture; image segmentation is not among them.

"Here is an instance of a robotic control task..."
"RL applied to learn how to play Atari video games..."
"RL has also been applied to games like go..."


Question 8 (True/False)

The RL agent receives rewards for all actions, including those it did not take.

  • True
  • False
Show Answer

Correct Answer: False
Explanation:
The agent receives rewards only for the actions it actually takes; untaken actions get no feedback.

"...the agent is supposed to pick actions and receive rewards... only for the actions that it did take and not for the actions that did not take."


Question 9 (Multi-Select)

Which of the following accurately describe challenges specific to reinforcement learning in real-world robotic settings? (Select all that apply)

  • Some states may be seen only once
  • Exploration can be dangerous or costly
  • Real-world dynamics are often complex and difficult to model
  • Hardware constraints limit computation speed
  • Actions have physical consequences that cannot be reversed
  • Reward signals may be sparse and difficult to design
Show Answer

Correct Answers: ✅ States seen only once, ✅ Exploration can be dangerous/costly, ✅ Complex dynamics, ✅ Hardware constraints, ✅ Physical consequences, ✅ Sparse rewards
Explanation:
Real-world robotics faces numerous challenges that make RL particularly difficult.

"...the agent will see certain states only once and never again in his lifetime, making it difficult to learn from past mistakes." "In real-world settings, exploration can lead to physical damage, and the dynamics are often complex and difficult to model accurately." "The hardware constraints and physical consequences of actions create additional challenges not present in simulated environments."


Question 10 (Multiple Choice)

What is the agent’s goal in reinforcement learning?

  • Minimize classification error
  • Memorize sequences of states
  • Maximize long-term accumulated reward
  • Discover
Show Answer

Correct Answer: Maximize long-term accumulated reward
Explanation:
The core objective of the agent is long-term reward maximization.

"The objective of this agent is to maximize the reward it will get from the environment in the long run."