Permutation Invariant Agent-Specific Centralized Critic in Multi-Agent Reinforcement Learning

Conference proceedings article

Strategic Research Themes

Machine Learning (Big Data Analytics)

Publication Details

Author list: Noppakun, Patsornchai; Akkarajitsakul, Khajonpong

Publisher: Frontiers

Publication year: 2022

Journal acronym: Front. Mar. Sci.

Start page: 15

End page: 18

Number of pages: 4

ISBN: 9781665489126

eISSN: 2296-7745

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85151635240&doi=10.1109%2fInCIT56086.2022.10067429&partnerID=40&md5=1e7c3c586811b238684b7686fcffd168

Languages: English-Great Britain (EN-GB)

Abstract

We propose a permutation-invariant, agent-specific centralized critic using graph convolutional networks (GCNs) in multi-agent reinforcement learning. We consider an environment with partial observability where a joint observation of homogeneous agents is used as state information in centralized training. A joint observation of homogeneous agents is permutation invariant, meaning that different permutations must be treated as the same. However, a traditional deep network such as a multilayer perceptron (MLP) outputs different values for different permutations, even though they represent the same data. A centralized critic that uses MLPs to represent the joint observation of homogeneous agents suffers from data inefficiency because it learns only a single permutation instead of all permutations. Previous work has addressed this problem using GCNs for "agent-agnostic" centralized critics. Our work extends the use of GCNs to an "agent-specific" centralized critic, such as the critic used in the Counterfactual Multi-Agent Policy Gradients (COMA) algorithm. We introduce three GCN variants of agent-specific critic architectures. Our experimental results on the multi-agent particle environment with the COMA algorithm show that all GCN critics outperform the MLP baseline critics. Finally, we conclude that as the number of agents increases, the critic that takes advantage of agent homogeneity by separating global and local feature representations is the most scalable in terms of time complexity. © 2022 IEEE.
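The permutation-invariance argument in the abstract can be illustrated with a small NumPy sketch. This is not the authors' architecture: the layer sizes, the random weights, the fully connected agent graph, the mean pooling, and the function names (`mlp_critic`, `gcn_critic`, `agent_specific_critic`) are all assumptions made for illustration. It compares a critic that concatenates observations in agent order with one that applies a single shared graph-convolution layer, and adds one possible agent-specific readout that combines a local feature with a pooled global feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, hidden = 4, 3, 8  # illustrative sizes, not from the paper

# Hypothetical random weights standing in for trained parameters.
W_mlp = rng.normal(size=(n_agents * obs_dim, hidden))
v_mlp = rng.normal(size=hidden)
W_self = rng.normal(size=(obs_dim, hidden))
W_nbr = rng.normal(size=(obs_dim, hidden))
v_global = rng.normal(size=hidden)
v_local = rng.normal(size=hidden)


def mlp_critic(obs):
    """Concatenate observations in agent order, then apply one dense layer."""
    h = np.tanh(obs.reshape(-1) @ W_mlp)
    return float(h @ v_mlp)


def gcn_features(obs):
    """One graph-convolution layer with weights shared across agents.

    Assumes a fully connected agent graph with a row-normalised
    adjacency matrix and no self loops.
    """
    n = obs.shape[0]
    adj = (np.ones((n, n)) - np.eye(n)) / (n - 1)
    return np.tanh(obs @ W_self + adj @ obs @ W_nbr)


def gcn_critic(obs):
    """Agent-agnostic value: mean-pool node features, then a linear readout."""
    return float(gcn_features(obs).mean(axis=0) @ v_global)


def agent_specific_critic(obs, i):
    """Agent-specific value: agent i's own feature plus a pooled global feature."""
    h = gcn_features(obs)
    return float(h[i] @ v_local + h.mean(axis=0) @ v_global)


obs = rng.normal(size=(n_agents, obs_dim))

# Reorder all agents: the same joint observation, listed differently.
permuted = obs[np.array([2, 0, 3, 1])]
mlp_gap = abs(mlp_critic(obs) - mlp_critic(permuted))
gcn_gap = abs(gcn_critic(obs) - gcn_critic(permuted))

# Keep agent 0 in place and reorder only the other agents.
others_permuted = obs[np.array([0, 3, 1, 2])]
specific_gap = abs(
    agent_specific_critic(obs, 0) - agent_specific_critic(others_permuted, 0)
)

print("MLP critic changes under permutation:", mlp_gap > 1e-6)
print("GCN critic changes under permutation:", gcn_gap > 1e-9)
print("Agent-0 value changes when others are reordered:", specific_gap > 1e-9)
```

Because the graph-convolution weights are shared by every agent and the pooling is a mean, reordering the agents only reorders the rows of the feature matrix, so the pooled value is unchanged up to floating-point error. The concatenating critic ties each weight block to an agent position, so it would have to see every ordering during training to behave consistently.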

Keywords

graph convolutional network, homogeneous agents, Multi-agent system, permutation invariance