
A new approach to intelligent decision-making in reinforcement learning

08.01.23 | Intelligent Computing



A new paper published 6 July in Intelligent Computing examines the primary challenges of reinforcement learning for intelligent decision-making in complex and dynamic environments.

Reinforcement learning is a type of machine learning in which an agent learns to make decisions by interacting with an environment and receiving rewards or penalties. The agent’s goal is to maximize long-term rewards by determining the best actions to take in different situations. However, researchers Chenyang Wu and Zongzhang Zhang of Nanjing University are convinced that reinforcement learning methods that rely solely on rewards and penalties will not succeed at producing intelligent abilities such as learning, perception, social interaction, language, generalization and imitation.
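The reward-driven loop described above can be sketched with a minimal tabular Q-learning agent. This is an illustrative example, not the method discussed in the paper; the toy "chain" environment below is invented purely for demonstration:

```python
import random

# Toy environment: a 5-state chain. Moving right (action 1) eventually
# reaches the goal state, which yields a reward of +1; every other step
# yields 0. The agent must learn this purely from reward feedback.
N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]  # 0 = left, 1 = right

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Tabular Q-learning: Q[s][a] estimates the long-term reward of taking
# action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

random.seed(0)
for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Update toward the reward plus the discounted value of the next state.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, the greedy policy moves right in every non-goal state.
policy = [max(ACTIONS, key=lambda x: Q[s][x]) for s in range(N_STATES)]
```

Note how nothing here tells the agent what the goal is: the policy emerges entirely from repeated interaction and sparse reward, which is exactly the trial-and-error regime the researchers argue against relying on exclusively.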

In their paper, Wu and Zhang identified what they see as the shortcomings of current reinforcement learning methods. A major issue is the amount of information that must be collected through trial and error. Unlike humans, who can draw on past experience to reason and make better choices, current reinforcement learning methods rely heavily on agents repeatedly trying things out at large scale to learn how to perform tasks. When a problem involves many different factors that influence the outcome, an agent must sample a huge number of examples to discover the best approach, and even a slight increase in the problem's complexity makes the number of required examples grow rapidly, rendering efficient operation impractical. To make matters worse, even if the agent had all the information needed to determine the best strategy, computing that strategy would still be very hard and time-consuming. Together, these factors make the learning process slow and inefficient.
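The growth described here can be made concrete with a back-of-the-envelope illustration (not a bound from the paper): if a state is described by n binary factors, there are 2^n distinct states, so each additional factor doubles the table a tabular trial-and-error learner would have to fill in.

```python
# Each binary factor doubles the number of distinct states an agent may
# need to experience; a "slight" increase in complexity multiplies the
# work required by exhaustive trial and error.
for n_factors in (10, 20, 30):
    n_states = 2 ** n_factors
    print(f"{n_factors} factors -> {n_states:,} states")
# 10 factors -> 1,024 states
# 20 factors -> 1,048,576 states
# 30 factors -> 1,073,741,824 states
```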

Both statistical inefficiencies and computational inefficiencies hinder the practicality of achieving general reinforcement learning from scratch. Current methods lack the efficiency required to unlock the full potential of reinforcement learning in developing diverse abilities without extensive computational resources.

Wu and Zhang argue that statistical and computational challenges can be overcome by accessing high-value information in observations. Such information can enable strategy improvements through observation alone, without the need for direct interaction. Imagine how long it would take an agent to learn to play Go purely by playing it, that is, through trial and error. Then imagine how much faster it could learn by reading Go manuals, that is, by using high-value information. Clearly, the ability to learn from information-rich observations is crucial for efficiently solving complex real-world tasks.
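The contrast drawn here, learning from interaction versus learning from informative observations, can be sketched in its simplest possible form: imitating expert demonstrations (behavior cloning). The demonstration data and integer state encoding below are invented for illustration, and this is not the method proposed in the paper:

```python
from collections import Counter, defaultdict

# Hypothetical expert demonstrations: (state, action) pairs, standing in
# for the "Go manuals" analogy. Instead of trial and error, the agent
# simply imitates the most common expert action in each state.
demos = [(0, 1), (0, 1), (1, 1), (1, 0), (1, 1), (2, 1)]

counts = defaultdict(Counter)
for state, action in demos:
    counts[state][action] += 1

# One pass over the observations yields a policy with zero interaction.
policy = {s: c.most_common(1)[0][0] for s, c in counts.items()}
```

A handful of informative examples produces a usable policy immediately, whereas the trial-and-error learner needed hundreds of episodes to reach the same kind of decision rule.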

High-value information possesses two distinct characteristics that set it apart. First, it is not independent and identically distributed: it involves complex interactions and dependencies with past observations. To fully comprehend high-value information, one must therefore consider its relationship with what has been observed before and acknowledge its historical context.

The second feature of high-value information is that it matters only to computationally aware agents. An agent with unlimited computational resources could ignore high-level strategies altogether and derive optimal behavior directly from basic-level rules; such an agent has no use for higher-level abstractions, which trade some accuracy for computational efficiency. Only agents that are aware of computational trade-offs, and can therefore appreciate the value of computationally beneficial information, can effectively leverage high-value information.

In order for reinforcement learning to make efficient use of high-value information, agents must be designed in new ways. In accordance with their formalization of intelligent decision-making as “bounded optimal lifelong reinforcement learning,” Wu and Zhang identified three fundamental problems in agent design.

Article Information

Journal: Intelligent Computing
DOI: 10.34133/icomputing.0041
Original paper: "Surfing Information: The Challenge of Intelligent Decision-Making"
Published: 6 July 2023

Contact Information

Qiaochu Wang
Intelligent Computing
wangqc130@zhejianglab.com


How to Cite This Article

APA:
Intelligent Computing. (2023, August 1). A new approach to intelligent decision-making in reinforcement learning. Brightsurf News. https://www.brightsurf.com/news/LN295D91/a-new-approach-to-intelligent-decision-making-in-reinforcement-learning.html
MLA:
"A new approach to intelligent decision-making in reinforcement learning." Brightsurf News, 1 Aug. 2023, https://www.brightsurf.com/news/LN295D91/a-new-approach-to-intelligent-decision-making-in-reinforcement-learning.html.