Controlling Blood Glucose For Patients With Type 1 Diabetes Using Deep Reinforcement Learning – The Influence Of Changing The Reward Function


  • Miguel Angel Tejedor Hernandez, UiT The Arctic University of Norway
  • Jonas Nordhaug Myhre, UiT The Arctic University of Norway



Keywords: deep reinforcement learning, reward function, artificial pancreas, blood glucose control


Reinforcement learning (RL) is a promising direction in adaptive and personalized type 1 diabetes (T1D) treatment. However, the reward function, one of the most critical components in RL, is in most cases hand-designed and often overlooked. In this paper we show that different reward functions can dramatically influence the final result when using RL to treat in-silico T1D patients.

