Natural Language-Guided Reinforcement Learning for Human-Machine Collaboration in Sparse Reward Environments

Abstract

Game agents in open environments often struggle to explore effectively because environmental rewards are sparse. This study applies deep reinforcement learning to enhance agent decision-making in reward-deficient video game settings. We developed a human-machine collaboration model that uses natural language instructions to guide the reinforcement learning process through reward construction. To address the sparse feedback problem, Hindsight Experience Replay (HER) was integrated into the architecture. Experimental results show that the natural language reward model achieved 92% prediction accuracy and a game score of 9.8. Following HER optimization, target instruction accuracy reached 97.8% with a final score of 9.9. These findings demonstrate that combining linguistic guidance with experience replay significantly improves agent performance in sparse reward environments.

Keywords: Human-Machine Collaboration; Reinforcement Learning; Natural Language Instructions; Sparse Rewards; Hindsight Experience Replay
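
The abstract describes two coupled mechanisms: a reward model that converts a natural language instruction into a training signal, and Hindsight Experience Replay, which relabels failed episodes against the goals the agent actually reached. The following is a minimal sketch of how these pieces can fit together, not the paper's implementation: the language_reward proxy, the goal encoding, and the HER "final" relabeling strategy are all illustrative assumptions.

import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Transition:
    state: Tuple[float, ...]
    action: int
    next_state: Tuple[float, ...]
    goal: Tuple[float, ...]   # goal extracted from the natural language instruction
    reward: float

def language_reward(next_state: Tuple[float, ...], goal: Tuple[float, ...], tol: float = 0.5) -> float:
    """Binary reward: 1.0 when the instructed goal is reached, else 0.0.
    Geometric stand-in for a learned natural-language reward model."""
    dist = sum((s - g) ** 2 for s, g in zip(next_state, goal)) ** 0.5
    return 1.0 if dist <= tol else 0.0

class HindsightReplayBuffer:
    """Stores episodes and relabels them with achieved goals (HER 'final' strategy)."""

    def __init__(self, capacity: int = 10_000):
        self.buffer: List[Transition] = []
        self.capacity = capacity

    def add_episode(self, episode: List[Transition]) -> None:
        # 1) Store the original transitions, rewarded against the instructed goal.
        for t in episode:
            self._push(t)
        # 2) Relabel: treat the final achieved state as if it had been the goal,
        #    so even failed episodes produce positive learning signal.
        achieved_goal = episode[-1].next_state
        for t in episode:
            self._push(Transition(
                state=t.state,
                action=t.action,
                next_state=t.next_state,
                goal=achieved_goal,
                reward=language_reward(t.next_state, achieved_goal),
            ))

    def _push(self, t: Transition) -> None:
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

The relabeling step is what counters reward sparsity: even when the instructed goal is never reached, the copied transitions rewarded against the achieved state give the agent non-zero feedback to learn from.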

