Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples

Li Meng; Anis Yazidi; Morten Goodwin; Paal Engelstad

doi:10.7557/18.6237

Authors

Li Meng, University of Oslo
Anis Yazidi, Oslo Metropolitan University
Morten Goodwin, University of Agder
Paal Engelstad, University of Oslo

DOI:

https://doi.org/10.7557/18.6237

Keywords:

Reinforcement Learning, Imitation Learning, Semi-supervised Learning, Deep Learning

Abstract

In this article, we propose Expert Q-learning, a novel algorithm for deep reinforcement learning. Inspired by Dueling Q-learning, it aims to incorporate semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages. We require that an offline expert assess the value of a state coarsely, using three discrete values. In addition to the Q-network, we design an expert network, which is updated after each regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with a baseline Q-learning algorithm that combines Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is more resistant to overestimation bias than the baseline. The baseline exhibits unstable and suboptimal behavior in non-deterministic settings, whereas Expert Q-learning performs more robustly and achieves higher scores, indicating that it is well suited to integrating state values from expert examples into Q-learning.
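
The split of Q-values into a state value and action advantages can be illustrated with a minimal sketch. It uses the mean-subtracted aggregation known from Dueling Q-learning and is not the authors' implementation. The function names, the label set (-1, 0, 1), and the sign-based labelling rule are assumptions made for illustration; the abstract only states that the expert uses three discrete values.

```python
from typing import List, Sequence

# Hypothetical coarse labels; the abstract only says "three discrete values".
COARSE_VALUES = (-1.0, 0.0, 1.0)


def coarse_label(score: float) -> float:
    """Map an arbitrary expert score to one of three coarse values (assumed rule: its sign)."""
    if score > 0:
        return 1.0
    if score < 0:
        return -1.0
    return 0.0


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a scalar state value V(s) and per-action advantages A(s, a).

    Mean-subtracted aggregation from Dueling Q-learning:
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    if not advantages:
        raise ValueError("at least one action is required")
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


# A state the hypothetical expert rates positively, with three legal actions.
v = coarse_label(3.2)
q = dueling_q_values(v, [0.5, -0.5, 0.0])
```

With this aggregation the mean of the Q-values over actions equals the state value, so a coarse state value fixes the overall level of the Q-values while the advantages only rank the actions relative to one another.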
