Permutation Invariant Agent-Specific Centralized Critic in Multi-Agent Reinforcement Learning

Conference proceedings article






Publication Details

Author list: Noppakun, Patsornchai; Akkarajitsakul, Khajonpong

Publisher: Frontiers

Publication year: 2022

Journal acronym: Front. Mar. Sci.

Start page: 15

End page: 18

Number of pages: 4

ISBN: 9781665489126

eISSN: 2296-7745

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85151635240&doi=10.1109%2fInCIT56086.2022.10067429&partnerID=40&md5=1e7c3c586811b238684b7686fcffd168

Languages: English-Great Britain (EN-GB)




Abstract

We propose a permutation-invariant, agent-specific centralized critic based on graph convolutional networks (GCNs) for multi-agent reinforcement learning. We consider a partially observable environment in which the joint observation of homogeneous agents serves as the state information during centralized training. Such a joint observation is permutation invariant: every ordering of the agents' observations describes the same state and must be treated identically. However, a conventional deep network such as a multilayer perceptron (MLP) produces different outputs for different permutations even though they represent the same data. A centralized critic that uses an MLP to represent the joint observation of homogeneous agents is therefore data inefficient, because each training sample teaches it about only a single permutation rather than all of them. Previous work has addressed this problem by using GCNs for 'agent-agnostic' centralized critics. Our work extends the use of GCNs to 'agent-specific' centralized critics, such as the critic used in the Counterfactual Multi-Agent Policy Gradients (COMA) algorithm. We introduce three GCN variants of agent-specific critic architectures. Our experimental results on the multi-agent particle environment with the COMA algorithm show that all GCN critics outperform the MLP baseline critics. Finally, we conclude that, as the number of agents increases, the critic that exploits agent homogeneity by separating global and local feature representations is the most scalable in terms of time complexity. © 2022 IEEE.
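
The permutation issue described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the functions `mlp_critic` and `gcn_critic`, the weights, and the toy observations are all hypothetical, chosen only to contrast a position-dependent critic with one that aggregates over a fully connected agent graph. The sketch assumes three homogeneous agents with two-dimensional observations and a single mean-aggregation graph layer followed by mean pooling, which corresponds to the simpler agent-agnostic case.

```python
import math
from itertools import permutations


def mlp_critic(joint_obs, weights):
    """Position-dependent critic: flattens observations in agent order."""
    flat = [x for obs in joint_obs for x in obs]
    return math.tanh(sum(w * x for w, x in zip(weights, flat)))


def gcn_critic(joint_obs, w_self, w_neigh):
    """One mean-aggregation graph layer on a fully connected agent graph,
    followed by mean pooling over agents (hypothetical toy architecture)."""
    n = len(joint_obs)
    d = len(joint_obs[0])
    hidden = []
    for i, obs in enumerate(joint_obs):
        others = [o for j, o in enumerate(joint_obs) if j != i]
        # Mean of neighbour features; fsum makes the sum order-independent.
        neigh_mean = [math.fsum(o[k] for o in others) / len(others)
                      for k in range(d)]
        # The same weights are shared by every agent (homogeneity).
        h = math.tanh(sum(w_self[k] * obs[k] + w_neigh[k] * neigh_mean[k]
                          for k in range(d)))
        hidden.append(h)
    return math.fsum(hidden) / n


# Toy data and weights (assumptions for illustration only).
joint_obs = [[1.0, 0.0], [0.0, 2.0], [-1.0, 0.5]]
mlp_weights = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6]
w_self = [0.5, -0.3]
w_neigh = [0.2, 0.1]

# Evaluate both critics on every ordering of the same three observations.
mlp_values = [mlp_critic(list(p), mlp_weights) for p in permutations(joint_obs)]
gcn_values = [gcn_critic(list(p), w_self, w_neigh) for p in permutations(joint_obs)]

print("MLP spread across permutations:", max(mlp_values) - min(mlp_values))
print("GCN spread across permutations:", max(gcn_values) - min(gcn_values))
```

In this sketch the MLP-style critic assigns different values to different orderings of the same joint observation, whereas the graph-style critic returns the same value for all of them, so it does not need to see each ordering separately during training.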


Keywords

graph convolutional network; homogeneous agents; multi-agent system; permutation invariance


Last updated on 2023-29-09 at 07:37