Abstract
Learning from interaction is the primary way that biological agents acquire knowledge about their environment and themselves. Modern deep reinforcement learning (DRL) explores a computational approach to learning from interaction and has made significant progress in solving various tasks. However, despite its power, DRL still falls short of biological agents in terms of energy efficiency. Although the underlying mechanisms are not fully understood, we believe that the integration of spiking communication between neurons and biologically plausible synaptic plasticity plays a prominent role in achieving greater energy efficiency. Following this biological intuition, we optimized a spiking policy network (SPN) using a genetic algorithm as an energy-efficient alternative to DRL. Our SPN mimics the sensorimotor neuron pathway of insects and communicates through event-based spikes. Inspired by biological research showing that the brain forms memories by creating new synaptic connections and rewiring these connections based on new experiences, we tuned the synaptic connections instead of weights in the SPN to solve given tasks. Experimental results on several robotic control tasks demonstrate that our method can achieve the same level of performance as mainstream DRL methods while exhibiting significantly higher energy efficiency.
Funding
This work was supported by the Beijing Nova Program, China (No. 20230484369), the Strategic Priority Research Program of Chinese Academy of Sciences, China (No. XDA27010404), the Shanghai Municipal Science and Technology Major Project, China (No. 2021SHZDZX), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences, China.
Author biographies
Duzhen Zhang received the B.Sc. degree in software engineering from Shandong University, China in 2019. He is a Ph.D. degree candidate in both the Institute of Automation, Chinese Academy of Sciences and the University of Chinese Academy of Sciences, China. His research interests include theoretical research on reinforcement learning, natural language processing, and spiking neural networks. E-mail: zhangduzhen2019@ia.ac.cn, ORCID iD: 0000-0002-4280-431X
Tielin Zhang (corresponding author) received the Ph.D. degree in brain-inspired intelligence from the Institute of Automation, Chinese Academy of Sciences, China in 2016. He is an associate professor in the Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, China. His research interests include theoretical research on neural dynamics and spiking neural networks. E-mail: tielin.zhang@ia.ac.cn, ORCID iD: 0000-0002-5111-9891
Shuncheng Jia is a Ph.D. degree candidate in both the Institute of Automation, Chinese Academy of Sciences and the University of Chinese Academy of Sciences, China. His research interests include theoretical research on neural dynamics, auditory signal processing, and spiking neural networks. E-mail: jiashuncheng2020@ia.ac.cn
Qingyu Wang is a master's student in both the Institute of Automation, Chinese Academy of Sciences and the University of Chinese Academy of Sciences, China. His research interests include theoretical research on neural dynamics and spiking neural networks. E-mail: wangqingyu2022@ia.ac.cn
Bo Xu (corresponding author) received the B.Sc. degree in electrical engineering from Zhejiang University, China in 1988, and the M.Sc. and Ph.D. degrees in pattern recognition and intelligent system from the Institute of Automation, Chinese Academy of Sciences, China in 1992 and 1997, respectively. He is a professor, the director of the Institute of Automation, Chinese Academy of Sciences, and also deputy director of the Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China. His research interests include brain-inspired intelligence, brain-inspired cognitive models, natural language processing and understanding, and brain-inspired robotics. E-mail: xubo@ia.ac.cn, ORCID iD: 0000-0002-1111-1529