Permutation Invariant Agent-Specific Centralized Critic in Multi-Agent Reinforcement Learning

Conference proceedings article

Authors/Editors

No related information found

Strategic Research Themes

Machine Learning (Big Data Analytics)

Publication Details

Author list: Noppakun, Patsornchai; Akkarajitsakul, Khajonpong

Publisher: Frontiers

Publication year (A.D.): 2022

Journal abbreviation: Front. Mar. Sci.

Start page: 15

End page: 18

Number of pages: 4

ISBN: 9781665489126

eISSN: 2296-7745

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85151635240&doi=10.1109%2fInCIT56086.2022.10067429&partnerID=40&md5=1e7c3c586811b238684b7686fcffd168

Language: English-Great Britain (EN-GB)

Abstract

We propose a permutation-invariant, agent-specific centralized critic that uses graph convolutional networks (GCNs) for multi-agent reinforcement learning. We consider a partially observable environment in which the joint observation of homogeneous agents serves as state information during centralized training. Such a joint observation is permutation invariant: different orderings of the agents' observations must be treated as the same input. However, a conventional deep network such as a multilayer perceptron (MLP) outputs different values for different permutations even though they represent the same data. A centralized critic that uses an MLP to represent the joint observation of homogeneous agents is therefore data-inefficient, because each sample teaches it about only one permutation rather than all of them. Previous work has addressed this problem by using GCNs for "agent-agnostic" centralized critics. Our work extends the use of GCNs to "agent-specific" centralized critics, such as the critic used in the Counterfactual Multi-Agent Policy Gradients (COMA) algorithm. We introduce three GCN variants of agent-specific critic architectures. Our experimental results on the multi-agent particle environment with the COMA algorithm show that all GCN critics outperform the MLP baseline critics. Finally, we conclude that, as the number of agents increases, the critic that takes advantage of agent homogeneity by separating global and local feature representations is the most scalable in terms of time complexity. © 2022 IEEE.
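
The permutation issue described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' architecture: the network sizes, the random weights, the fully connected agent graph, and the function names `mlp_critic` and `gcn_critic` are all assumptions chosen for illustration, and the GCN-style critic shown here is the simpler agent-agnostic kind. It only shows that a critic operating on the flattened joint observation changes its output when agents are reordered, while a mean-aggregating graph layer followed by mean pooling does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, hidden = 4, 3, 8  # hypothetical sizes

# Hypothetical joint observation: one row per homogeneous agent.
obs = rng.normal(size=(n_agents, obs_dim))
perm = np.array([2, 0, 3, 1])        # a fixed, non-identity reordering of agents
obs_perm = obs[perm]

# MLP critic (assumed form): flatten the joint observation, then one hidden layer.
W1 = rng.normal(size=(n_agents * obs_dim, hidden)) / np.sqrt(n_agents * obs_dim)
w2 = rng.normal(size=hidden)

def mlp_critic(x):
    h = np.tanh(x.reshape(-1) @ W1)  # weights are tied to agent positions
    return float(h @ w2)

# GCN-style critic (assumed form): fully connected graph with self-loops,
# row-normalised adjacency, weights shared across agents, mean pooling at the end.
A = np.ones((n_agents, n_agents)) / n_agents
Wg = rng.normal(size=(obs_dim, hidden)) / np.sqrt(obs_dim)
wv = rng.normal(size=hidden)

def gcn_critic(x):
    h = np.tanh(A @ x @ Wg)            # each node averages features over all agents
    return float(h.mean(axis=0) @ wv)  # pooling removes any remaining ordering

mlp_gap = abs(mlp_critic(obs) - mlp_critic(obs_perm))
gcn_gap = abs(gcn_critic(obs) - gcn_critic(obs_perm))
print("MLP output gap under permutation:", mlp_gap)
print("GCN output gap under permutation:", gcn_gap)
```

Under these assumptions the MLP gap is clearly nonzero while the GCN gap is zero up to floating-point rounding, which is the data-efficiency argument in miniature: the graph critic sees every ordering of the same joint observation as one input.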

Keywords

graph convolutional network, homogeneous agents, Multi-agent system, permutation invariance