Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning

Miloš S. Stanković, Marko Beko, Nemanja Ilić, Srdjan S. Stanković

Research output: peer-reviewed

4 Citations (Scopus)

Abstract

In this paper, a new distributed multi-agent Actor-Critic algorithm for reinforcement learning is proposed for solving multi-agent multi-task optimization problems. The Critic algorithm takes the form of a Distributed Emphatic Temporal Difference (DETD(λ)) algorithm, while the Actor algorithm is proposed as a complementary consensus-based policy gradient algorithm, derived from a global objective function that plays the role of a scalarizing function in multi-objective optimization. It is demonstrated that the Feller-Markov properties hold for the newly derived Actor algorithm. A proof of the weak convergence of the algorithm to the limit set of an attached ODE is derived under mild conditions, using a specific decomposition between the Critic and Actor algorithms together with two-time-scale stochastic approximation arguments. An experimental verification of the algorithm's properties is given, showing that the algorithm can serve as an efficient tool in practice.
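The abstract describes the method's structure but not its details. The following toy sketch, which is an illustration and not the authors' algorithm, shows the general pattern the abstract outlines: a fast emphatic TD(λ) critic, a slower consensus-based off-policy actor update, and a parameter-averaging step across agents. The random MDP, linear features, uniform behaviour policy, softmax actor, step sizes, and the averaging matrix A are all assumptions made for the example.

```python
# A minimal sketch, not the paper's implementation: a consensus-based
# two-time-scale actor-critic whose critic runs an emphatic TD(lambda)
# update and whose actor takes a slower off-policy policy-gradient step.
# The toy MDP, features, step sizes, and mixing matrix A are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_STATES, N_ACTIONS, N_FEATS = 3, 5, 2, 4
GAMMA, LAM = 0.9, 0.8
ALPHA, BETA = 0.05, 0.005        # critic (fast) vs. actor (slow) step sizes

# Shared random MDP; each agent optimizes its own reward (multi-task).
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
R = rng.standard_normal((N_AGENTS, N_STATES))
PHI = rng.standard_normal((N_STATES, N_FEATS))     # linear critic features
A = np.full((N_AGENTS, N_AGENTS), 1.0 / N_AGENTS)  # doubly stochastic weights

def softmax_policy(w_i, s):
    logits = w_i[:, s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

theta = np.zeros((N_AGENTS, N_FEATS))           # critic parameters
w = np.zeros((N_AGENTS, N_ACTIONS, N_STATES))   # actor parameters
F = np.ones(N_AGENTS)                           # emphatic follow-on traces
e = np.zeros((N_AGENTS, N_FEATS))               # eligibility traces
rho_prev = np.ones(N_AGENTS)                    # previous importance ratios
s = np.zeros(N_AGENTS, dtype=int)

for t in range(5000):
    # Consensus step: each agent averages parameters with its neighbours
    # (full averaging here; a sparse A would encode a communication graph).
    theta = A @ theta
    w = np.tensordot(A, w, axes=1)
    for i in range(N_AGENTS):
        # Behaviour policy: uniform; target policy: the softmax actor.
        a = rng.integers(N_ACTIONS)
        pi = softmax_policy(w[i], s[i])
        rho = pi[a] / (1.0 / N_ACTIONS)         # importance-sampling ratio
        s_next = rng.choice(N_STATES, p=P[s[i], a])
        r = R[i, s_next]
        # Emphatic TD(lambda) critic, with interest i_t = 1 for all states.
        F[i] = GAMMA * rho_prev[i] * F[i] + 1.0
        M = LAM + (1.0 - LAM) * F[i]            # emphasis weighting
        e[i] = rho * (GAMMA * LAM * e[i] + M * PHI[s[i]])
        delta = r + GAMMA * PHI[s_next] @ theta[i] - PHI[s[i]] @ theta[i]
        theta[i] += ALPHA * delta * e[i]
        # Actor: slower emphasis-weighted off-policy gradient step.
        grad_log = -pi
        grad_log[a] += 1.0                      # d log pi(a|s) / d logits
        w[i, :, s[i]] += BETA * rho * M * delta * grad_log
        rho_prev[i], s[i] = rho, s_next
```

The separation of step sizes (ALPHA for the critic, a much smaller BETA for the actor) reflects the two-time-scale argument mentioned in the abstract: the critic is assumed to track its fixed point quickly relative to the slowly drifting actor.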

Original language: English
Article number: 100853
Journal: European Journal of Control
Volume: 74
DOIs
Publication status: Published - Nov 2023

Bibliographical note

Publisher Copyright:
© 2023 European Control Association

Funding

Funders: Funder number
Fundação para a Ciência e a Tecnologia: UIDB/04111/2020
Science Fund of the Republic of Serbia: 7754287
