Hierarchical Decision Making Based on Structural Information Principles (SIDM)

Submitted to the Journal of Machine Learning Research, July 2024

Xianghua Zeng¹, Hao Peng², Dingli Su¹, Angsheng Li¹
¹School of Computer Science and Engineering, Beihang University
²School of Cyber Science and Technology, Beihang University

TL;DR  We propose SIDM, an unsupervised and adaptive hierarchical decision-making framework for Reinforcement Learning. It handles highly complex environments without manual intervention and improves sample efficiency and policy effectiveness.

Paper | Code


Abstract  Reinforcement Learning (RL) algorithms often rely on specific hyperparameters to guarantee their performance, especially in highly complex environments. This paper proposes SIDM, an unsupervised and adaptive decision-making framework for RL that uses state and action abstractions to address this issue. SIDM improves policy quality, stability, and sample efficiency by up to 32.70%, 88.26%, and 64.86%, respectively.


Figure: Overview of the SIDM framework.

Approach  In this work, we propose a novel Structural Information principles-based hierarchical Decision-Making framework, called SIDM, to address these challenges:

1. Adaptive Abstraction Mechanism: Compresses high-dimensional, noisy state-action information by constructing state and action abstractions. An encoding tree groups states or actions with similar features into the same community, and an aggregation function derives an abstract representation for each community (see the first sketch after this list).

2. Directed Structural Entropy: Formally defined and optimized to overcome the limitations of undirected structural information and to capture directed state transitions. Using this directed entropy, we quantify the transition probabilities between abstract states and identify frequently transitioning abstract communities as skills, enabling unsupervised skill discovery (second sketch below).

3. Skill-Based and Role-Based Methods: Build on the abstract actions and discovered skills to develop a skill-based method for single-agent learning and a role-based method for multi-agent collaboration. Both methods operate without manual assistance and can flexibly integrate various RL algorithms to enhance their performance (third sketch below).

4. Extensive Experiments: We conduct extensive experiments and comprehensive analyses on diverse, well-established benchmarks, including visual gridworld navigation, continuous robotic control, and StarCraft II micromanagement. Comparative results show that our framework improves the effectiveness, efficiency, and stability of policy learning over state-of-the-art baselines by up to 32.70%, 64.86%, and 88.26%, respectively.
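
To make the abstraction mechanism concrete, here is a minimal sketch of the aggregation step. It assumes the encoding tree has already assigned each state (or action) to a leaf community; the array names (`features`, `community`) and the choice of mean aggregation are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

# Hypothetical inputs: 'features' holds one embedding per state, and
# 'community' assigns each state to a leaf community of the encoding tree.
features = np.random.rand(8, 4)                  # 8 states, 4-dim embeddings
community = np.array([0, 0, 1, 1, 1, 2, 2, 2])   # community id per state

def aggregate_abstract_states(features, community):
    """Average the embeddings of all states in a community to obtain one
    abstract-state representation per community (mean aggregation is one
    plausible choice of aggregation function)."""
    ids = np.unique(community)
    return np.stack([features[community == c].mean(axis=0) for c in ids])

abstract_states = aggregate_abstract_states(features, community)
print(abstract_states.shape)  # (3, 4): one representation per community
```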

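The skill-discovery step can be illustrated in the same spirit. The sketch below counts directed transitions between abstract states along a trajectory and flags frequently transitioning community pairs as candidate skills; the fixed count threshold is a simplified stand-in for the paper's directed-structural-entropy optimization.

```python
import numpy as np

def abstract_transition_matrix(trajectory, community, n_comm):
    """Count directed transitions between abstract states along a
    trajectory of concrete state indices, then row-normalize the counts
    into transition probabilities."""
    counts = np.zeros((n_comm, n_comm))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[community[s], community[s_next]] += 1
    row = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
    return counts, probs

# Illustrative trajectory over 8 states with the community assignment below.
community = np.array([0, 1, 0, 1, 0, 1, 1, 2])
trajectory = [0, 2, 0, 3, 1, 4, 2, 5, 3, 6]
counts, probs = abstract_transition_matrix(trajectory, community, 3)

# One plausible reading of "frequently transitioning" communities: directed
# pairs whose transition count exceeds a threshold become candidate skills
# (i.e., "move from community i to community j").
threshold = 2
skills = [(i, j) for i in range(3) for j in range(3)
          if i != j and counts[i, j] >= threshold]
print(skills)
```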

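Finally, a generic two-level control loop shows how a skill-based method can wrap around standard RL training. Both policies here are random placeholders standing in for learned networks, and the skill re-selection interval `k` is a hypothetical parameter, so this is only a structural sketch of hierarchical execution.

```python
import numpy as np

rng = np.random.default_rng(0)

def high_level_policy(obs, n_skills):
    """Placeholder skill selector; in practice a learned high-level policy
    would choose among the discovered skills."""
    return rng.integers(n_skills)

def low_level_policy(obs, skill):
    """Placeholder skill-conditioned controller returning a primitive
    action; a learned policy in practice."""
    return rng.standard_normal(2)

def rollout(env_step, obs, n_skills=4, horizon=20, k=5):
    """Two-level control loop: pick a skill every k steps and let the
    skill-conditioned low-level policy act in between."""
    for t in range(horizon):
        if t % k == 0:
            skill = high_level_policy(obs, n_skills)
        action = low_level_policy(obs, skill)
        obs = env_step(obs, action)
    return obs

# Toy linear dynamics standing in for a real environment.
final = rollout(lambda o, a: o + 0.1 * a, np.zeros(2))
print(final)
```
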
Videos

Qualitative videos compare SIDM with SAC, SE, HIRO, and HSD on five robotic-control tasks: Hurdles, Limbo, Pole Balance, Hurdles Limbo, and Stairs.