IDRiM2022_EvacuationLearningModel

Satoki Masuda
November 05, 2024

Presentation slides for the 12th International Conference of the International Society for Integrated Disaster Risk Management (IDRiM), entitled "Evacuation choice modeling using reinforcement learning based on the multi-armed bandit problem."

Transcript

  1. Evacuation choice modeling using reinforcement learning based on the multi-armed bandit problem
     Satoki MASUDA, Eiji HATO
     Department of Civil Engineering, The University of Tokyo
     [email protected]
     At IDRiM2022, Thu. 22 September, Young Scientist Session Group I
  2. Motivation
     • People receive various types of disaster information from various sources: SNS, TV news, neighbors, websites.
     • This information sometimes triggers evacuation and sometimes does not.
  3. Motivation
     • In addition, the response to information varies from person to person.
     • If we model the information learning process, we can predict, encourage, and control evacuation behavior through information.
     [Figure: persons A, B, C each receiving different information from a custom-made evacuation learning system that optimizes the contents and the distribution sources of information]
  4. Objective of research
     1. Develop an evacuation behavior model that incorporates the information learning process.
     2. Represent the heterogeneity of reactions to disaster information.
  5. Previous research on evacuation modeling
     • One-shot questionnaire → discrete choice model
     [Figure: choice among home, shelter A, and shelter B against time to disaster (48h, 24h, 12h, 6h)]

     $\Pr_{\text{evac}}(\boldsymbol{\theta}) = \text{logit}(\text{age},\ \text{hazard},\ \text{evacuation order},\ \ldots;\ \boldsymbol{\theta})$

     • The dynamics of the learning and forgetting process are not included.
  6. Experiment design
     [Figure: four questionnaire waves (① to ④); between waves, information such as an evacuation drill, congestion info., and a hazard map is provided, and in each wave the same choice among home, shelter A, and shelter B is observed against time to disaster (48h, 24h, 12h, 6h)]
     • Repeat the observation of evacuation choices and the provision of information.
     • Focus on the change within each person.
  7. Experiment in Koto, Tokyo
     [Figure: inundation area in the case of the embankment along the Arakawa River collapsing; photo of a past flood]
     https://www.kcf.or.jp/nakagawa/kikaku/detail/?id=70
  8. Experiment design
     Groups: with drill (n = 110) and without drill (n = 142); dropouts = 20.
     wave 1: Questionnaire ①, March 2nd ~ 4th, 2022
     wave 2: Questionnaire ②, March 11th ~ 15th, 2022
     wave 3: Questionnaire ③, March 25th ~ 29th, 2022
     wave 4: Questionnaire ④, April 14th ~ 20th, 2022
     Information provided between waves: evacuation drill (drill group only), congestion info., and congestion info. + hazard map.
  9. Experiment design – evacuation drill
     Drill sites: ① Ojima-4 public housing, ② Toyosu-4 public housing, ③ Toyosu evacuation building
     Drill contents: hazard map, videos of past disasters, sensing the disaster situation, observing the facility
  10. Modelling approach – two dynamics
      [Figure: the four-wave design of slide 6, annotated with the two model components]
      • Intra-wave: dynamic evacuation choice model
      • Inter-wave: dynamic learning model
  11. Modelling approach – intra-wave
      Departure time choice and destination choice → dynamic discrete choice model

      $$p(s_{t+1} \mid s_t) = \frac{\exp\{v(s_{t+1} \mid s_t; \boldsymbol{\theta}) + V(s_{t+1})\}}{\sum_{s'_{t+1} \in S(s_t)} \exp\{v(s'_{t+1} \mid s_t; \boldsymbol{\theta}) + V(s'_{t+1})\}}$$

      The state $s_t$ combines the departure time (48, 24, 12, or 6 hours to disaster) with the destination (home, evacuation site A, evacuation site B); the utility $v$ consists of a departure time choice term and a destination choice term.
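      A minimal sketch of this kind of dynamic discrete choice model, assuming the standard logsum/backward-induction formulation implied by the equation above. The state space, the random utilities, and the value-function term $V(s_{t+1})$ are illustrative assumptions, not the estimated model.

```python
import numpy as np

def dynamic_choice_probs(v):
    """Backward induction for a dynamic discrete choice (logit) model.

    v[t][s, s1] : instantaneous utility v(s1 | s; theta) at stage t.
    Returns probs[t][s, s1] = p(s1 | s) with the expected value V(s1)
    of the next state added to each utility, as in the slide's equation.
    """
    n_states = v[0].shape[0]
    V = np.zeros(n_states)                 # expected value after the last stage
    probs = [None] * len(v)
    for t in reversed(range(len(v))):
        u = v[t] + V[None, :]              # v(s1|s; theta) + V(s1)
        m = u.max(axis=1, keepdims=True)   # for numerical stability
        e = np.exp(u - m)
        probs[t] = e / e.sum(axis=1, keepdims=True)
        V = m[:, 0] + np.log(e.sum(axis=1))  # logsum = expected max utility
    return probs

# Hypothetical example: 4 stages (48h, 24h, 12h, 6h) and 3 states
# (home, evacuation site A, evacuation site B) with random utilities.
rng = np.random.default_rng(0)
v = [rng.normal(size=(3, 3)) for _ in range(4)]
print(dynamic_choice_probs(v)[0])          # choice probabilities at 48h
```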
  12. Modelling approach – inter-wave
      Dynamic learning model → reinforcement learning

      $$v_k(s_{t+1} \mid s_t; \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\alpha}}, \boldsymbol{\lambda}) = v_1(s_{t+1} \mid s_t; \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\alpha}}) + \sum_{k'=2}^{k} \gamma^{k-k'} g_{k'}(s_{t+1} \mid s_t; \boldsymbol{\lambda})$$

      • $v_k$: utility (reward) at wave $k$
      • $g_{k'}$: utility of the information newly acquired at wave $k'$
      • $\gamma$: memory rate, i.e., how much people forget previous information
      • $\boldsymbol{\lambda}$: how much people explore or exploit information
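      A minimal sketch of the inter-wave update, directly implementing the $\gamma$-weighted sum above; the baseline utilities and information effects are hypothetical inputs.

```python
import numpy as np

def wave_utility(v1, g, gamma, k):
    """Utility at wave k: v_k = v_1 + sum_{k'=2}^{k} gamma**(k-k') * g_{k'}.

    v1    : wave-1 utilities over the alternatives (hypothetical)
    g     : dict mapping wave index -> utility of the information newly
            acquired at that wave (same shape as v1; hypothetical)
    gamma : memory rate in [0, 1]; smaller means faster forgetting
    """
    v = np.asarray(v1, dtype=float).copy()
    for k_prime in range(2, k + 1):
        v = v + gamma ** (k - k_prime) * np.asarray(g.get(k_prime, 0.0))
    return v

# Hypothetical: alternatives (home, shelter A, shelter B); congestion
# info arrives at wave 2, hazard-map info at wave 3; gamma = 0.8.
g = {2: np.array([0.0, -0.5, 0.3]), 3: np.array([-0.4, 0.2, 0.2])}
print(wave_utility(v1=[1.0, 0.5, 0.2], g=g, gamma=0.8, k=4))
```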
  13. Estimation result of those who chose "evacuate" at wave 1
      [Figure: estimated $\boldsymbol{\lambda}$ and $\gamma$ for each latent class]
      Estimation using the EM algorithm shows that the same information can have opposite effects on different groups of people, demonstrating the heterogeneity of the response to information.
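      The slides do not spell out the EM procedure, so the following is only a generic sketch of how an EM algorithm can recover latent classes with opposite reactions to the same variable: a two-class binary logit with one hypothetical information covariate, using a single weighted gradient step as the M-step.

```python
import numpy as np

def em_two_class_logit(x, y, n_iter=200, lr=0.5, seed=0):
    """Generic EM sketch for a two-latent-class binary logit.

    Each class c has its own coefficient beta[c] on the information
    variable x, so the same information can push the two classes in
    opposite directions. Illustrative only, not the study's estimator.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=2)                 # class-specific coefficients
    pi = np.array([0.5, 0.5])                 # class shares
    for _ in range(n_iter):
        # E-step: responsibility of each class for each respondent
        p = 1.0 / (1.0 + np.exp(-np.outer(x, beta)))       # (n, 2)
        lik = np.where(y[:, None] == 1, p, 1.0 - p) * pi   # (n, 2)
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: update shares; one weighted gradient step per class
        pi = r.mean(axis=0)
        for c in range(2):
            beta[c] += lr * np.mean(r[:, c] * (y - p[:, c]) * x)
    return beta, pi

# Hypothetical data: half the population reacts positively to the
# information (beta = +2), half negatively (beta = -2).
rng = np.random.default_rng(1)
x = rng.normal(size=400)
b_true = np.where(rng.integers(0, 2, size=400) == 0, 2.0, -2.0)
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-b_true * x))).astype(float)
print(em_two_class_logit(x, y))   # coefficients of opposite sign
```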
  14. Conclusion
      ✓ We conducted a repeated experiment to observe the relation between information learning and behavioral change.
      ✓ We modeled the dynamics of the information learning process using the idea of reinforcement learning.
      ✓ Parameter estimation using the EM algorithm shows the heterogeneity of the learning process.
      ✓ The dynamic learning model will lead to predicting, encouraging, and controlling evacuation behavior through information.
      ★ Point to discuss: what kind of application can we design if we understand and predict the effect of DRR education on each person?
  15. Estimation result of those who chose "not evacuate" at wave 1 (all data)

      Variable                                                  Estimate   t-value
      -----------------------------------------------------------------------------
      Change in departure time choice utility
        congestion                                              -12.820    -0.03
      Change in destination choice utility
        whether the destination is in the hazard map              -0.222   -0.44
        participated in evacuation drill (home alternative)        1.693    3.18**
        whether the destination is congested (non-home alts.)     -2.676   -4.85**
      Memory rate                                                  0.928    5.16**
      -----------------------------------------------------------------------------
      Number of samples: 144
      Initial log-likelihood: -290.0
      Final log-likelihood: -215.9
      Likelihood ratio: 0.255
      Adjusted likelihood ratio: 0.238
      *: significant at 5%, **: significant at 1%
  16. Multi-armed bandit problem
      [Figure: three slot machines with current success rates at wave k, e.g., 20% (1/5), 60% (3/5), 0% (0/1); choosing an arm and succeeding updates its rate at wave k+1, e.g., 0% (0/1) → 50% (1/2). Next choice?]
      A gambler tries to maximize the sum of rewards earned through a sequence of lever pulls. There is a trade-off between
      • "exploitation" of the machine that has the highest expected payoff, and
      • "exploration" to get more information about the expected payoffs of the other machines.
  17. Multi-armed bandit problem
      • A fixed, limited set of resources (trial times) must be allocated between competing choices (slot machines) in a way that maximizes the expected gain (money), when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or by allocating resources to the choice.
      • In evacuation learning (see the sketch below):
        - trial times = learning cost
        - slot machines = evacuation choices (departure time, destination, …)
        - money = utility
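      As a concrete illustration of this trade-off, here is a minimal epsilon-greedy bandit in Python. Epsilon-greedy is a standard bandit heuristic chosen here for brevity, not necessarily the policy assumed in the study; the arm payoffs are hypothetical.

```python
import numpy as np

def epsilon_greedy_bandit(true_means, n_pulls=1000, eps=0.1, seed=0):
    """Epsilon-greedy play of a multi-armed bandit.

    With probability eps, explore a random arm; otherwise exploit the
    arm with the highest estimated payoff. Arms stand in for evacuation
    choices and rewards for utility; the numbers are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    estimates = np.zeros(n_arms)          # running mean reward per arm
    total = 0.0
    for _ in range(n_pulls):
        if rng.random() < eps:
            arm = int(rng.integers(n_arms))        # explore
        else:
            arm = int(np.argmax(estimates))        # exploit
        reward = rng.normal(true_means[arm], 1.0)  # noisy payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, total

print(epsilon_greedy_bandit([0.2, 0.6, 0.5]))
```

      With a small eps, most pulls go to the arm currently believed best, while occasional exploration keeps the estimates of the other arms from going stale.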
  18. Multi-armed bandit problem
      [Figure: the same diagram with evacuation choices; current utilities 20, 60, 0 at wave k become 20, 60, 50 at wave k+1 after information provision. Next choice?]
      An evacuee tries to maximize the sum of utility earned through a sequence of evacuation choices. There is a trade-off between
      • "exploitation" of the choice that has the highest expected payoff, and
      • "exploration" to get more information about the expected payoffs of the other choices.
  19. Multi-armed bandit problem
      • The multi-armed bandit problem explores the best policy under partial knowledge of the reward.
      • Our study explores the reward system under the policy of utility maximization (inverse reinforcement learning).
  20. Modelling approach – inter-wave
      Update of the utility function (reward):

      $$v_k(s_{t+1} \mid s_t; \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\alpha}}, \boldsymbol{\lambda}) = v_1(s_{t+1} \mid s_t; \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\alpha}}) + \sum_{k'=2}^{k} \gamma^{k-k'} g_{k'}(s_{t+1} \mid s_t; \boldsymbol{\lambda})$$

      • $v_k$: utility at wave $k$ (explanatory variables include the risk of the residence area and socio-demographic variables)
      • $g_{k'}$: utility of the information newly acquired at wave $k'$ (e.g., congestion, hazard map)
      • $\gamma^{k-k'}$: memory rate
      • The utility of an evacuation choice is updated when one receives disaster information, but the information is forgotten at the memory rate $\gamma$, as the numeric example below illustrates.
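      A worked numeric illustration of the memory rate, with hypothetical numbers:

```python
# Information acquired at wave 2 with effect g = 1.0 decays as
# gamma**(k - 2); here gamma = 0.9 (hypothetical values).
gamma, g = 0.9, 1.0
for k in range(2, 5):
    print(f"wave {k}: remaining effect = {gamma ** (k - 2) * g:.2f}")
# wave 2: 1.00, wave 3: 0.90, wave 4: 0.81
```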