Decentralized Multi-Agent Multi-Task Q-Learning with Function Approximation for POMDPs

Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic

Research output: peer-reviewed

Abstract

In this paper we propose a novel distributed gradient-based two-time-scale algorithm for decentralized multi-agent multi-task learning (MTL) using a linear approximation of the optimal action-value function (Q-function) in POMDPs. The algorithm concurrently uses recursive Bayesian state belief filters to estimate the system model parameters, predict the hidden state, and define the optimal approximation parameters of the local Q-functions. The main MTL algorithm is composed of: 1) local parameter updates based on an off-policy gradient-based learning algorithm with a target policy belonging to the greedy or Gibbs classes, and 2) a linear stochastic time-varying consensus scheme for the parameters shared between the agents in order to achieve the MTL goal. It is proved, under general assumptions, that the parameter estimates generated by the proposed algorithm converge weakly to a bounded invariant set of the corresponding ordinary differential equations (ODEs). Simulation results illustrate the effectiveness of the algorithm.
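
The three ingredients named in the abstract (a recursive Bayesian belief filter, a local off-policy update of a linear Q-function, and a linear consensus step) can be illustrated with a minimal sketch. This is not the authors' algorithm: the local step below is a plain semi-gradient Q-learning update with a greedy target, swapped in for the paper's two-time-scale gradient-based scheme, and the model is assumed known rather than estimated. All function names, the use of the belief vector as the feature vector, the fixed consensus weights, and the choice to average every parameter (rather than only the shared ones) are assumptions made for illustration.

```python
# Illustrative sketch only: names, shapes, and update rules are assumptions,
# not taken from the paper.

def belief_update(belief, action, obs, T, O):
    """Recursive Bayesian filter for a discrete POMDP with a known model.

    T[a][s][s2] = P(s2 | s, a)   (transition probabilities)
    O[a][s2][o] = P(o | s2, a)   (observation probabilities)
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight by the observation likelihood, then normalize.
    unnorm = [O[action][s2][obs] * predicted[s2] for s2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


def q_value(theta, belief, action):
    """Linear Q approximation: one weight vector per action, belief as features."""
    return sum(w * b for w, b in zip(theta[action], belief))


def local_q_step(theta, belief, action, reward, next_belief, gamma, alpha):
    """One semi-gradient Q-learning step with a greedy target policy.

    A simple stand-in for the paper's gradient-based local update.
    """
    greedy_next = max(q_value(theta, next_belief, a) for a in range(len(theta)))
    td_error = reward + gamma * greedy_next - q_value(theta, belief, action)
    new_theta = [row[:] for row in theta]  # copy so the input is not mutated
    new_theta[action] = [w + alpha * td_error * b
                         for w, b in zip(theta[action], belief)]
    return new_theta


def consensus_step(thetas, weights):
    """Linear consensus: agent i takes sum_j weights[i][j] * thetas[j].

    `weights` is assumed row-stochastic (each row sums to 1).
    """
    n_agents = len(thetas)
    n_actions = len(thetas[0])
    dim = len(thetas[0][0])
    return [[[sum(weights[i][j] * thetas[j][a][k] for j in range(n_agents))
              for k in range(dim)]
             for a in range(n_actions)]
            for i in range(n_agents)]
```

In a full loop, each agent would update its belief from its own observation, apply its local step, and then mix parameters with its neighbours through `consensus_step`; the paper's scheme uses time-varying stochastic weights, whereas the sketch takes whatever weight matrix it is given.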

Original language: English
Title of host publication: 2024 IEEE 63rd Conference on Decision and Control, CDC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7680-7685
Number of pages: 6
ISBN (electronic): 9798350316339
DOIs
Publication status: Published - 2024
Event: 63rd IEEE Conference on Decision and Control, CDC 2024 - Milan
Duration: 16 Dec 2024 to 19 Dec 2024

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (print): 0743-1546
ISSN (electronic): 2576-2370

Conference

Conference: 63rd IEEE Conference on Decision and Control, CDC 2024
Country/Territory: Italy
City: Milan
Period: 16/12/24 to 19/12/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Funding

Funder: Funder number
Science Fund of the Republic of Serbia: 7502
Fundação para a Ciência e Tecnologia: 2022.07530, UIDB/04111/2020
