Abstract
To address the complexity of travel demand evolution, this paper first treats fare optimization as a learning process in which an agent obtains the optimal price through continuous exploration in a complex environment. Second, a deep reinforcement learning algorithm is introduced: a value-function neural network is used to fit the response function of travel demand (the environment) to fare setting (the action), and in the game among different transport modes the agent is trained toward its decision objective by rewarding and penalizing fare-adjustment actions. Third, to characterize the complexity of group travel decision-making, three travel demand evolution scenarios, from simple to complex, are designed on the basis of the Logit model, cumulative prospect theory, and the Bush-Mosteller model. Finally, taking the fare game between subway and bus in a real-world setting as an example, numerical simulation is used to examine the effectiveness of the method. The study finds that: (1) the deep reinforcement learning algorithm shows a good ability to characterize fare elasticity while perceiving the complexity of travel demand evolution; (2) the deep reinforcement learning algorithm can produce reasonable and stable pricing schemes for complex travel demand, and after optimizing subway (bus) fares, the profit of the subway (bus) and the overall profit of all travel modes increase significantly under the different travel demand models.
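As a concrete illustration of the agent side of this approach, the following is a minimal Python sketch, not the authors' implementation: a small value-function network maps the fare state to the values of discrete fare-adjustment actions, the reward is the operating profit of the objective mode with a penalty when the fare violates its bounds, and the demand environment is approximated by the simplest of the three scenarios, a binary Logit split between subway and bus. All numerical values (fare bounds, Logit scale, unit cost, total flow, learning rate) are assumptions chosen only for illustration.

```python
# Minimal, illustrative Deep Q-Learning sketch of the fare-setting agent (not the
# authors' implementation). The Q-network maps the current fare state to the value of
# discrete fare-adjustment actions; the reward is the operating profit of the objective
# mode, with a penalty when the proposed fare violates its bounds; the demand
# environment is the simplest of the three scenarios, a binary Logit split.
# All numerical parameters below are assumptions chosen only for illustration.
import random
import numpy as np
import torch
import torch.nn as nn

ACTIONS = np.array([-0.5, 0.0, 0.5])      # fare adjustment (lower / keep / raise) for the objective mode
FARE_MIN, FARE_MAX = 2.0, 8.0             # assumed admissible fare range
TOTAL_FLOW, UNIT_COST, THETA = 10_000, 1.5, 0.8

def logit_split(fare_obj, fare_other):
    """Binary Logit demand response: flows of (objective mode, competing mode)."""
    u = np.array([-THETA * fare_obj, -THETA * fare_other])
    return TOTAL_FLOW * np.exp(u) / np.exp(u).sum()

def step(state, action_idx):
    """Apply a fare adjustment, let demand respond, and return (next state, reward)."""
    fare_obj, fare_other = state
    proposed = fare_obj + ACTIONS[action_idx]
    penalty = 0.0 if FARE_MIN <= proposed <= FARE_MAX else -1.0   # constraint-violation penalty
    new_fare = float(np.clip(proposed, FARE_MIN, FARE_MAX))
    profit = (new_fare - UNIT_COST) * logit_split(new_fare, fare_other)[0]
    return (new_fare, fare_other), profit / 1e4 + penalty          # scaled operating profit

# Value-function neural network: state (own fare, competing fare) -> Q-value per action.
qnet = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, len(ACTIONS)))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma, eps = 0.9, 0.2

state = (4.0, 3.0)                        # initial (objective fare, competing fare), assumed
for t in range(2000):
    s = torch.tensor(state, dtype=torch.float32)
    # epsilon-greedy choice of a fare-adjustment action
    a = random.randrange(len(ACTIONS)) if random.random() < eps else int(qnet(s).argmax())
    next_state, r = step(state, a)
    with torch.no_grad():
        target = r + gamma * qnet(torch.tensor(next_state, dtype=torch.float32)).max()
    loss = nn.functional.mse_loss(qnet(s)[a], target)
    opt.zero_grad(); loss.backward(); opt.step()
    state = next_state

print("fare of the objective mode after training:", round(state[0], 2))
```

In the paper's full model, the state also includes the evolved passenger flows of all travel modes, and the environment is one of the three demand-evolution mechanisms rather than this fixed Logit split.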
Fare setting is an important means of regulating passenger flow and relieving congestion in urban transportation systems. In existing fare optimization research, travel demand is mostly described as completely rational under the theory of general equilibrium. Under complex group decision-making, however, the relationship between travel demand and ticket fares becomes a multi-dimensional, complex nonlinear feedback, so simplifying the complex decision-making process of the traveler group leads to deviations in the effectiveness of fare setting. Focusing on the complexity of travel demand evolution, this study regards fare optimization as an agent's learning process of obtaining the optimal price through continuous exploration in a complex environment. The value-function neural network of deep reinforcement learning is introduced to fit the response function between travel demand (the environment) and fare setting (the action); in the game among different travel modes, the decision objective is achieved by training fare-adjustment actions with rewards and punishments. In terms of group decision complexity, three travel demand evolution scenarios, from simple to complex, are established based on the traditional Logit model, cumulative prospect theory, and the Bush-Mosteller model. Numerical simulation of a fare game between a subway and a bus is conducted to verify the validity of the new methodology.

In the first part of this study, concerning fare decision-making, the traditional bi-level programming method is improved: travel modes are divided into an objective travel mode and the other travel modes, and the representative Deep Q-Learning methodology (DQN) is introduced to optimize the ticket fare of the objective travel mode. In the new model, the fare-adjustment strategy of the objective travel mode between an origin-destination pair is set as the action variable of Deep Q-Learning; the evolved passenger flows of the different travel modes after fare adjustment, together with their ticket fares, are set as the state variables; the operating profit of the objective travel mode is set as the reward; and a penalty mechanism is designed for the case in which the ticket fare exceeds its constraint. Moreover, a value-function neural network with a strong capability for nonlinear characterization is constructed to fit the response function between travel demand (environment) and fare setting (action). The state variables and the fare-adjustment strategy serve as the neural network input, and after iterative training the optimal fare adjustment for each passenger flow state is obtained.

In the second part, concerning the description of travelers' behavior, a small-world network is introduced to depict the social interaction environment of travelers, based on the model hypotheses and existing research. A multi-agent bounded-rationality model with different interaction mechanisms is combined, separately, with the traditional Logit model, the Logit model based on cumulative prospect theory, and the Bush-Mosteller model with group reinforcement learning. Three travel demand evolution models, from simple to complex, are thereby designed to describe changes in the complexity of travel demand (the fare decision environment).

In the third part, the section of Metro Line 6 from Dongsi to Tongzhou Beiguan, which connects the main urban area of Beijing with the Tongzhou sub-center, is taken as the practical application scenario. The subway and the ground bus are taken in turn as the objective travel mode: Deep Q-Learning is used to optimize the ticket fare of the objective travel mode, while the fare of the other travel mode is optimized by the traditional bi-level programming model. The corresponding results are obtained under the three travel demand evolution models, the changes of the optimal ticket fare under different travel demand evolution conditions are investigated, and the optimization results of the different models are compared and analyzed. The introduction of Deep Q-Learning makes the optimal fares and the travel demand of the different travel modes differ significantly.

In summary, the study finds the following. (1) Under Deep Q-Learning, different evolution mechanisms of travel demand yield fare calculation results with significant differences, whereas the fare results based on traditional bi-level programming are similar across evolution mechanisms. This shows that DQN, which uses a neural network as the travel demand perception tool, describes fare elasticity in more detail and handles complex travel demand better. (2) With the introduction of Deep Q-Learning in the optimization of subway (ground bus) fares, the profit of the subway (ground bus) and the overall profit increase significantly, and a reasonable and stable fare scheme is obtained. (3) According to the numerical example, the actual price is closer to the ticket price obtained by DQN when the bus is taken as the objective travel mode, which indicates that subway fares are currently subsidized to a large extent; the relevant government department could therefore propose a guiding price for subway fares according to the budget for financial subsidies.
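To complement the agent sketch above, the following minimal Python sketch, again not the authors' code, illustrates the demand-evolution side described in the second part: traveler agents placed on a small-world network repeatedly choose between subway and bus, and their choice probabilities are updated by the traditional Logit rule, by a Logit rule on cumulative-prospect-theory values with a neighbourhood-based reference point, or by a Bush-Mosteller reinforcement rule. The crowding term, CPT parameters, aspiration level, and learning rate are illustrative assumptions.

```python
# Minimal sketch of the demand-evolution side (not the authors' code): traveler agents
# on a small-world network repeatedly choose between subway and bus, with choice
# probabilities evolving under one of three rules - traditional Logit, a Logit rule on
# cumulative-prospect-theory (CPT) values with a neighbourhood reference point, or a
# Bush-Mosteller reinforcement rule. Fares, travel times, the crowding term, the CPT
# parameters and the learning rate are all illustrative assumptions.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
N_AGENTS, THETA = 500, 0.5
FARES, TIMES = np.array([4.0, 2.0]), np.array([30.0, 45.0])    # (subway, bus), assumed
G = nx.watts_strogatz_graph(N_AGENTS, k=6, p=0.1, seed=0)      # social interaction network

def utility(flows):
    """Generalized cost with a simple crowding penalty proportional to flow share."""
    return -(FARES + 0.2 * TIMES + 10.0 * flows / N_AGENTS)

def cpt_value(u, ref):
    """CPT value of utilities relative to a reference point (alpha=0.88, lambda=2.25)."""
    x = u - ref
    return np.where(x >= 0, 1.0, -2.25) * np.abs(x) ** 0.88

def evolve(mechanism, rounds=50):
    prob_subway = np.full(N_AGENTS, 0.5)                       # each agent's P(choose subway)
    for _ in range(rounds):
        choices = (rng.random(N_AGENTS) < prob_subway).astype(int)     # 1 = subway, 0 = bus
        flows = np.array([choices.sum(), N_AGENTS - choices.sum()])
        u = utility(flows)
        if mechanism == "logit":                               # scenario 1: homogeneous Logit
            prob_subway[:] = np.exp(THETA * u[0]) / np.exp(THETA * u).sum()
        elif mechanism == "cpt_logit":                         # scenario 2: CPT-valued Logit
            for i in G.nodes:                                  # reference = neighbours' chosen-mode utility
                nbr = list(G.neighbors(i))
                ref = u[1 - choices[nbr]].mean() if nbr else u.mean()
                v = cpt_value(u, ref)
                prob_subway[i] = np.exp(THETA * v[0]) / np.exp(THETA * v).sum()
        else:                                                  # scenario 3: Bush-Mosteller reinforcement
            payoff = u[1 - choices]                            # utility of each agent's chosen mode
            stimulus = np.tanh(payoff - payoff.mean())         # aspiration level = population average
            p_chosen = np.where(choices == 1, prob_subway, 1 - prob_subway)
            p_chosen = p_chosen + 0.2 * np.where(stimulus >= 0,
                                                 stimulus * (1 - p_chosen),
                                                 stimulus * p_chosen)
            prob_subway = np.where(choices == 1, p_chosen, 1 - p_chosen)
    return flows                                               # (subway, bus) flows after evolution

print({m: evolve(m) for m in ("logit", "cpt_logit", "bush_mosteller")})
```

In the paper's setting, an evolving demand of this kind replaces the fixed Logit split used in the agent sketch, so that the fare-setting agent must learn against a changing, boundedly rational environment.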
Authors
LI Xueyan
ZHANG Hankun
LI Jing
QIU Heting
LI Xueyan; ZHANG Hankun; LI Jing; QIU Heting (School of Management, Beijing Union University, Beijing 100101, China; School of E-commerce and Logistics, Beijing Technology and Business University, Beijing 100048, China; School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China; School of Management and Engineering, Capital University of Economics and Business, Beijing 100070, China)
Source
《管理工程学报》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2022, No. 6, pp. 144-155 (12 pages)
Journal of Industrial Engineering and Engineering Management
Funding
Young Scientists Fund of the National Natural Science Foundation of China (72103019)
Youth Fund for Humanities and Social Sciences Research of the Ministry of Education of China (20YJC630069)
Smart Beijing Key Technology Research Discipline Cluster Project of Beijing Union University (ZB10202002)
Keywords
Deep reinforcement learning
Public transportation
Fare setting
Group decision
About the Authors
Corresponding author: LI Xueyan (b. 1987), male, from Hohhot, Inner Mongolia; Lecturer, Ph.D., School of Management, Beijing Union University. Research interests: management science and complex-system decision theory, computational economics.