Abstract
In this paper, a new distributed multi-agent Actor-Critic algorithm for reinforcement learning is proposed for solving multi-agent multi-task optimization problems. The Critic takes the form of a Distributed Emphatic Temporal Difference algorithm, DETD(λ), while the Actor is a complementary consensus-based policy-gradient algorithm derived from a global objective function that plays the role of a scalarizing function in multi-objective optimization. It is demonstrated that the Feller-Markov properties hold for the newly derived Actor algorithm. Weak convergence of the algorithm to the limit set of an attached ODE is proved under mild conditions, using a specific decomposition between the Critic and the Actor algorithms together with two-time-scale stochastic approximation arguments. Experimental verification of the algorithm's properties is given, showing that it can be an efficient tool in practice.
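The paper itself contains the precise recursions; as a rough illustration only, the sketch below shows the general shape of one iteration of a consensus-based actor-critic step with an emphatic TD(λ)-style critic and a two-time-scale step-size split. All names and constants (`C`, `alpha_w`, `alpha_th`, the synthetic features and importance ratios, and the use of `phi` as a stand-in score function) are illustrative assumptions, not taken from the paper, and the actor update is a generic off-policy policy-gradient stand-in rather than the authors' exact consensus Actor recursion.

```python
import numpy as np

# Hypothetical minimal sketch, not the authors' algorithm: N agents, linear
# value features, a uniform doubly stochastic consensus matrix, and synthetic
# transitions standing in for a real multi-task environment.
rng = np.random.default_rng(0)

N, d, gamma, lam = 4, 8, 0.95, 0.7      # agents, feature dim, discount, trace decay
alpha_w, alpha_th = 0.05, 0.005         # two time scales: critic faster than actor

C = np.full((N, N), 1.0 / N)            # consensus (mixing) matrix, doubly stochastic
W = rng.normal(size=(N, d)) * 0.1       # critic weights, one row per agent
Theta = rng.normal(size=(N, d)) * 0.1   # actor parameters, one row per agent
e = np.zeros((N, d))                    # eligibility traces
F = np.ones(N)                          # emphatic follow-on traces

for t in range(200):
    phi = rng.normal(size=d)            # features of current state (synthetic)
    phi_next = rng.normal(size=d)       # features of next state (synthetic)
    reward = rng.normal(size=N)         # each agent observes its own local reward
    rho = np.clip(rng.lognormal(sigma=0.3, size=N), 0.0, 2.0)  # importance ratios

    for i in range(N):
        # Emphatic TD(lambda) critic: follow-on trace (simplified indexing),
        # emphasis weight, eligibility trace, TD error, weight update.
        F[i] = rho[i] * gamma * F[i] + 1.0
        M = lam + (1.0 - lam) * F[i]
        e[i] = rho[i] * (gamma * lam * e[i] + M * phi)
        delta = reward[i] + gamma * W[i] @ phi_next - W[i] @ phi
        W[i] += alpha_w * delta * e[i]

        # Actor: slower policy-gradient step driven by the critic's TD error;
        # phi is used as a stand-in for the policy score function.
        Theta[i] += alpha_th * rho[i] * delta * phi

    # Consensus step: agents average their parameters across the network.
    W = C @ W
    Theta = C @ Theta

print("critic disagreement:", np.linalg.norm(W - W.mean(axis=0)))
print("actor  disagreement:", np.linalg.norm(Theta - Theta.mean(axis=0)))
```

The step-size split `alpha_w >> alpha_th` mirrors the two-time-scale argument used in the convergence proof: the critic tracks its equilibrium for a quasi-static actor, while the consensus averaging drives the agents' parameters toward agreement.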
Original language | English |
---|---|
Article number | 100853 |
Journal | European Journal of Control |
Volume | 74 |
DOIs | |
Publication status | Published - Nov. 2023 |
Bibliographical note
Publisher Copyright: © 2023 European Control Association
Funding
Funders | Funder number |
---|---|
Fundação para a Ciência e a Tecnologia | UIDB/04111/2020 |
Science Fund of the Republic of Serbia | 7754287 |
Press/Media
- Reports from Singidunum University Highlight Recent Findings in Mathematics (Multi-agent Off-policy Actor-critic Algorithm for Distributed Multi-task Reinforcement Learning)
  5/01/24
  1 item of Media coverage
  Press/Media: Press