
Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in Wireless LANs --- Part A

Tutorial at IEEE ICC 2019

Koji Yamamoto

May 24, 2019

Transcript

  1. Machine Learning and Stochastic Geometry: Statistical Frameworks Against Uncertainty in

    Wireless LANs Koji Yamamoto and Takayuki Nishio Graduate School of Informatics, Kyoto University 2019-05-24 — IEEE ICC 2019 Tutorial
  2. Contents of this tutorial

    Part A — Deep Reinforcement Learning and Stochastic Geometry for microwave WLANs (Koji Yamamoto, 90 min)
    I. Wireless LANs — IEEE 802.11ax and 11be
    II. Deep Reinforcement Learning — Channel Allocation in WLANs
    III. Stochastic Geometry — Modeling and Analysis of WLANs
    Break
    Part B — Deep Learning for/in mmWave WLANs (Takayuki Nishio, 90 min)
    2 / 88
  3. Radio parameters and states (transmission power, frequency channel, topology, fading)

    x → (Model f) → y = f(x): communication quality (throughput, delay)
    We would like to
    ▶ maximize the communication quality f(x)
    ▶ find the optimal parameter arg max_x f(x)
    3 / 88
  4. Model f is given?

    ▶ yes → Optimization / game theory (Part A-III)
    ▶ no → Data (xi, yi) with yi = f(xi) + noise is given?
      ▶ yes → Supervised learning; deep learning (CNN) when xi is an image (Part B)
      ▶ no → Reinforcement learning; deep RL when xi is an image (Part A-II)
    4 / 88
  9. Goal of this tutorial: a deep understanding of the important ideas

    ▶ Each of ML, RL, SG, and WLANs could fill a tutorial of its own
    ▶ A deep understanding of the important techniques helps you grasp the related techniques on your own
    Machine learning and stochastic geometry analysis
    ▶ Mainly for resource management and system design
    5 / 88
  10. Maximum data rate in IEEE 802.11

    [Figure: maximum data rate of IEEE 802.11, 11a, 11b, 11g, 11n, 11ac, 11ax, and 11be (bit/s) versus year (1995-2020), growing from about 1 Mbit/s toward 10 Gbit/s and beyond]
    7 / 88
  11. PHY (cited from [Khorov+2019, Table II])

                                 Legacy            802.11ax
    OFDM constellation order     256-QAM (11ac)    1024-QAM
    OFDM symbol duration         3.2 µs            12.8 µs
    OFDM guard interval          0.4 or 0.8 µs     0.8, 1.6, or 3.2 µs
    Maximum data rate            ≈ 7 Gbit/s        ≈ 9.6 Gbit/s
    8 / 88
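The 9.6 Gbit/s entry in the table can be reproduced with a back-of-envelope calculation. The PHY parameters below (8 spatial streams, 1960 data subcarriers at 160 MHz, 1024-QAM giving 10 bits per subcarrier, rate-5/6 coding, 0.8 µs guard interval) are assumed values for this sketch, not figures stated on the slide:

```python
# Back-of-envelope check of the 802.11ax peak rate (assumed PHY
# parameters, not taken from the slide).
streams, data_subcarriers, bits_per_symbol, coding_rate = 8, 1960, 10, 5 / 6
symbol_duration = 12.8e-6 + 0.8e-6  # OFDM symbol + guard interval (s)
rate = streams * data_subcarriers * bits_per_symbol * coding_rate / symbol_duration
print(round(rate / 1e9, 2), "Gbit/s")  # → 9.61 Gbit/s
```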
  12. Probability of collisions

    [Figure: probability of collisions (0 to 1) versus number of transmitters (1 to 10000), Bianchi model (saturated traffic) [Bianchi2000]; IEEE 802.11a, frame size 1500 B, data rate 6 Mbit/s]
    11 / 88
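The curve in this figure comes from Bianchi's fixed-point model. A minimal sketch of that fixed point follows; CWmin = 16 and m = 6 backoff stages are assumed values for illustration, not parameters stated on the slide:

```python
# Damped fixed-point iteration for the Bianchi model [Bianchi2000]:
# tau = 2(1-2p) / ((1-2p)(W+1) + p W (1-(2p)^m)),  p = 1-(1-tau)^(n-1)
# (W = CWmin and m = number of backoff stages are assumed values).
def bianchi_collision(n, W=16, m=6, iters=2000):
    p = 0.1
    for _ in range(iters):
        tau = 2 * (1 - 2 * p) / ((1 - 2 * p) * (W + 1)
                                 + p * W * (1 - (2 * p) ** m))
        # damping the update keeps the iteration stable
        p = 0.5 * p + 0.5 * (1 - (1 - tau) ** (n - 1))
    return p

for n in (2, 10, 50):
    print(n, round(bianchi_collision(n), 3))
```

As in the figure, the collision probability grows with the number of saturated transmitters.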
  13. Channel Access and OBSS Management (cited from [Khorov+2019, Table II])

    Legacy (802.11ac and earlier) → 802.11ax:
    Basic channel access:        CSMA/CA → OFDMA on top of CSMA/CA
    Random channel access:       DCF, EDCA → UL OFDMA random access on top of CSMA/CA
    Contention-free access:      PCF, HCCA (not implemented in real devices), RAW (11ah) → trigger-based UL OFDMA
    MU technology:               MU-MIMO (11ac) → MU-MIMO, OFDMA
    MU transmission direction:   DL (11ac) → DL and UL
    Fragmentation:               static → flexible
    Interference mitigation:     NAV, RTS/CTS, HCCA TXOP negotiation → two NAVs, quiet period
    Spatial reuse:               sectorization (11ah) → adaptive power and sensitivity thresholds, color
    12 / 88
  14. Spatial Reuse Using BSS "Color"

    Flow: start → channel idle? → transmit. If busy: legacy (11ac) defers; in 11ax, the "Color" header is checked — if the frame comes from an OBSS and its received power is below the OBSS PD threshold, the channel is treated as idle and transmission may start.
    ▶ "Color" header
    ▶ Parameter: carrier sense threshold for OBSS (OBSS PD)
    ▶ As a result, the channel is spatially reused (spatial reuse).
    OBSS: Overlapping Basic Service Set, i.e., the other adjacent cells
    13 / 88
  15. IEEE 802.11ax and IEEE 802.11be

    IEEE 802.11ax: to be finalized in 2020 (on schedule); High Efficiency WLAN (HEW) Study Group; Wi-Fi 6; http://www.ieee802.org/11/Reports/tgax_update.htm
    IEEE 802.11be: started in 2018; Extremely High Throughput (EHT) Study Group; Wi-Fi 7 (?); http://www.ieee802.org/11/Reports/ehtsg_update.htm
    [Khorov+2019] E. Khorov, A. Kiryanov, A. Lyakhov, and G. Bianchi, "A tutorial on IEEE 802.11ax high efficiency WLANs," IEEE Commun. Surveys Tuts., vol. 21, no. 1, 2019
    802.11be targets:
    ▶ At least 30 Gbit/s
    ▶ 320 MHz bandwidth and non-contiguous spectrum
    ▶ Multi-band/multi-channel aggregation
    ▶ 16 spatial streams MIMO
    ▶ Multi-AP coordination
    ▶ Enhanced link adaptation and retransmission protocol
    14 / 88
  16. Throughput Starvation

    [Figure: five links A-E and their busy periods over time t]
    ▶ First-order starvation [Mhatre+2007]
    ▶ Flow-in-the-middle starvation [Garetto+2008]
    ▶ Second-order starvation [Mhatre+2007]
    Under saturated traffic, A and D always detect signals, and thus they cannot start transmissions
    15 / 88
  17. Goal of performance modeling

    ▶ Points represent TXs or TX-RX pairs.
    ▶ Circles represent carrier sense areas (contention domains).
    We would like to evaluate the following performance metrics without Monte Carlo simulations, to ensure reproducibility:
    1. Individual throughput in a specific topology — graph theory (Part I)
    2. Average throughput over possible topologies — SG (Part III)
    3. Distributions of interference and SIR — SG (Part III)
    16 / 88
  18. Contention graph

    TX-RX pair → vertex; pairs that detect each other's signals → vertices connected by an edge
    Implicit assumption: the TX and RX forming a given link have the same carrier sense relationship (the situation where one of them detects the carrier but the other does not is not allowed)
    Carrier detection → carrier sensing relationship → contention graph
    17 / 88
  19. Back-of-Envelope (BoE) throughput [Liew+2010]

    [Figure: a contention graph and the resulting BoE throughput, i.e., the normalized throughput (medium access probability) of each link, with per-link values such as 0.5, 0, and 1]
    18 / 88
  20. Model of simultaneous transmissions according to CSMA

    Maximum independent sets (MISs)
    ▶ Induced subgraphs without edges that have the maximum number of vertices
    ▶ Possible patterns of simultaneous TXs according to CSMA
    (Vertex-)induced subgraph
    ▶ A subset of the vertex set of the original graph together with the related edge set
    Wolfram|Alpha: Subgraph, Independent Set
    19 / 88
  21. Example: n1 = 2 (n1/n = 2/2 = 1), n2 = 0 (n2/n = 0/2 = 0), n3 = 1 (n3/n = 1/2 = 0.5)

    1. Find all MISs of the contention graph (the possible simultaneous transmission patterns)
    2. The normalized throughput of link (vertex) i is determined as ni/n, where
    ▶ n is the number of MISs and
    ▶ ni is the number of MISs containing i.
    We assume that all possible patterns occur with equal probability
    20 / 88
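The two-step procedure above can be sketched in a few lines. The brute-force MIS enumeration below is an illustrative implementation (exponential in the number of links, so only practical for small graphs), not the authors' code:

```python
from itertools import combinations

def boe_throughput(n_vertices, edges):
    """BoE throughput of each link from a contention graph (edge list)."""
    edge_set = {frozenset(e) for e in edges}

    def independent(subset):
        return all(frozenset(p) not in edge_set
                   for p in combinations(subset, 2))

    # Step 1: scan subset sizes from largest down; the first size that
    # admits an independent set yields all MISs.
    for size in range(n_vertices, 0, -1):
        miss = [set(s) for s in combinations(range(n_vertices), size)
                if independent(s)]
        if miss:
            break
    # Step 2: normalized throughput of link i is n_i / n.
    return [sum(v in m for m in miss) / len(miss)
            for v in range(n_vertices)]

# Flow-in-the-middle chain 0-1-2: the middle link is starved.
print(boe_throughput(3, [(0, 1), (1, 2)]))  # → [1.0, 0.0, 1.0]
```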
  22. Example of BoE throughput

    [Figure: a random topology of links labeled with their BoE throughput; values range from 0.0 (starved links) through 0.2-0.8 to 1.0 (non-contending links)]
    21 / 88
  23. BoE throughput evaluation in R (packages igraph, spatstat, and plotrix):

    require("igraph")
    require("spatstat")
    require("plotrix")
    set.seed(9)

    LAMBDA <- 2e-5
    RADIUS <- 200
    W.LENGTH <- 500
    W <- owin(xrange = c(-W.LENGTH, W.LENGTH), yrange = c(-W.LENGTH, W.LENGTH))

    # Poisson point process
    P <- rpoispp(lambda = LAMBDA, win = W)

    # Adjacency matrix
    D <- pairdist(P)
    adjacency.matrix <- (D < RADIUS)
    diag(adjacency.matrix) <- 0

    # Contention graph
    adjacency.graph <- graph.adjacency(adjacency.matrix, mode = "undirected")

    # Maximum independent sets
    MISs <- largest_ivs(adjacency.graph)

    # Evaluate throughput
    ni <- 0
    for (i in 1:(P$n)) {
      N.MIS <- 0
      for (j in 1:length(MISs)) {
        N.MIS <- N.MIS + sum(MISs[[j]] == i)
      }
      ni[i] <- N.MIS
    }
    22 / 88
  24. Plotting the topology and the BoE throughput (continued):

    # Plotting figures
    quartz(type = "pdf", file = "160408_BoE_simple_position.pdf")
    par(pty = "s", mar = c(0, 0, 0, 0) + 0.1)
    plot(P$x, P$y, xlim = c(-W.LENGTH, W.LENGTH), ylim = c(-W.LENGTH, W.LENGTH))
    for (i in 1:(P$n)) {
      draw.circle(P[i]$x, P[i]$y, 200, border = NA, col = adjustcolor("blue", alpha.f = 0.1))
    }
    dev.off()

    quartz(type = "pdf", file = "160408_BoE_simple.pdf")
    par(pty = "s", mar = c(0, 0, 0, 0) + 0.1)
    plot(P$x, P$y, xlim = c(-W.LENGTH, W.LENGTH), ylim = c(-W.LENGTH, W.LENGTH))
    for (i in 1:(P$n - 1)) {
      for (j in (i + 1):(P$n)) {
        if (adjacency.matrix[i, j] == 1) {
          segments(P[i]$x, P[i]$y, P[j]$x, P[j]$y, col = "blue", lwd = 2)
        }
      }
    }
    text(P$x, P$y, sprintf("%.1f", ni / length(MISs)), pos = 1)
    dev.off()
    23 / 88
  25. Potential game-based channel allocation [Yin+2017]

    [Figure: AP locations and channel assignments over a 1200 m x 1200 m area for the proposed scheme and a compared scheme]
    ▶ Cost function taking into account the number of three-node chains that result in flow-in-the-middle starvation
    ▶ The cost function is designed so that the game is a potential game [Yamamoto2015], i.e., the channel selection dynamics provably converge to a Nash equilibrium
    B. Yin, S. Kamiya, K. Yamamoto, T. Nishio, M. Morikura, and H. Abeysekera, "Mitigating throughput starvation in dense WLANs through potential game-based channel selection," IEICE Trans. Fundamentals, vol. E100-A, no. 11, Nov. 2017
    24 / 88
  26. Summary (Part I) ▶ Spatial reuse using “color” is a

    promising technology in IEEE 802.11ax ▶ The next standard — IEEE 802.11be ▶ Throughput starvation problem ▶ Back-of-envelope throughput evaluation 25 / 88
  27. BoE throughput evaluation takes into account

    ▶ Adjacency (carrier sensing relationships) among stations
    ▶ Saturated traffic
    But there are many other factors:
    ▶ Unsaturated traffic and its fluctuations
    ▶ The random backoff algorithm in CSMA/CA
    ▶ Fading and its impact on adjacency
    ▶ Other factors
    Optimization based on observations → machine learning
    27 / 88
  28. Which framework can we utilize for performance improvement in WLANs? From the viewpoint of radio resource management

    Machine Learning (ML): Unsupervised Learning; Supervised Learning; Deep Learning; Reinforcement Learning (RL) — Multi-armed Bandit, Q-learning, Deep RL, Multi-agent RL, Adversarial RL
    28 / 88
  29. Reinforcement Learning (RL): Multi-armed Bandit, Q-learning, Deep RL, Multi-agent RL, Adversarial RL

    "Deep RL with graph convolution" — my opinion at this point
    Multi-agent RL is also attractive because WLANs are decentralized networks. However, at this point,
    ▶ multi-agent RL has mainly been developed for cooperative control systems, and
    ▶ it is hard to combine it with the recent progress in deep learning
    29 / 88
  30. Deep RL with graph convolution

    = RL (Q-learning) + deep learning (function approximation with deep neural networks) + graph convolution
    30 / 88
  31. Motivation — Optimal Channel Allocation

    Optimal channel allocation? Criterion: aggregated throughput; number of channels: 2
    [Figure: five APs (1-5) with candidate channel assignments and the resulting per-AP BoE throughput; the best assignment achieves (1, 1, 1, 1, 1)]
    Framework: combinatorial optimization (NP-hard)
    BoE throughput is shown for ease of explanation.
    32 / 88
  33. Motivation — Optimal Sequence

    A different problem setting:
    ▶ Find the shortest sequence
    ▶ Only one AP can change its channel at a given time
    ▶ Throughput can be observed only after each channel allocation
    [Figure: initial state with per-AP throughput (1, 0, 1, 0, 1) and optimal states with throughput (1, 1, 1, 1, 1)]
    We would like to find the shortest sequence from the initial state to an optimal state.
    This part is a simplified version of [Nakashima+2019]
    33 / 88
  34. Channel Allocation Sequence

    [Figure: two channel allocation sequences from the initial state (1, 0, 1, 0, 1) to the optimal state (1, 1, 1, 1, 1): a two-step sequence via (1, 1, 1, 0, 1), and a longer sequence via (0.5, 0.5, 1, 0.5, 0.5) and (1, 1, 1, 0.5, 0.5) at t = 0, 1, 2, 3]
    ← We want to acquire this sequence based on observed throughput. Criterion?
    34 / 88
  35. Automated driving system — self-driving cars

    Markov decision process (MDP): the agent interacts with the environment through actions, states (here, positions), and rewards
    https://www.metacar-project.com/ — Metacar: "A reinforcement learning environment for self-driving cars in the browser"
    35 / 88
  36. Problem

    At time t, the agent has a sequence of observations (state, action, reward, next state):
    (S0, A0, R1, S1), (S1, A1, R2, S2), ..., (St, At, Rt+1, St+1)
    Based on the observations, what action At+1 should the agent take to obtain a higher reward?
    The agent (decision maker) needs a criterion
    36 / 88
  37. Total reward and optimal action-value function

    Criterion of the agent: the expected total discounted reward with discount factor 0 ≤ γ ≤ 1,
    E[ Σ_{t=0}^∞ γ^t R_{t+1} ]
    By using the optimal action-value function
    Q*(s, a) := max_π E_π[ Σ_{t=0}^∞ γ^t R_{t+1} | S0 = s, A0 = a ],  s ∈ S, a ∈ A,
    the next action should be determined as
    A_{t+1} = arg max_{a∈A} Q*(S_{t+1}, a)
    ":=" denotes an "equality relationship that is true by definition" [Sutton+2018]
    E_π[·] denotes the expected value of a random variable given that the agent follows policy π [Sutton+2018]
    37 / 88
  38. For the following initial state S0, the sequence starting with A0 = "AP 4 uses CH 2" is optimal because the sequence after t = 1 is optimal

    [Table: at t = 0 the state has per-AP throughput (1, 1, 1, 0, 1); the action A0 = "AP 4 uses CH 2" yields (1, 1, 1, 1, 1) from t = 1 on, with rewards R1 = 5, R2 = 5, ...]
    Q*(S0, A0 = "AP 4 uses CH 2") = R1 + γ R2 + γ² R3 + ... = 5 + 5γ + 5γ² + ... = 5 / (1 − γ) = 50  (γ = 0.9)
    38 / 88
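The geometric series above is easy to check numerically (a trivial sketch; the truncation length is an arbitrary choice):

```python
# Numeric check of the discounted return above: reward 5 per step,
# discount gamma = 0.9, so the return is 5 / (1 - gamma) = 50.
gamma = 0.9
ret = sum(5 * gamma ** t for t in range(1000))  # truncated at 1000 terms
print(round(ret, 6))  # → 50.0
```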
  39. Part of the Optimal Q-table

    For the initial state with per-AP throughput (1, 1, 1, 0, 1), with γ = 0.9:
    AP    CH 1                          CH 2
    1     4 + 5γ + 5γ² + ... = 49       3 + 4γ + 5γ² + ... = 47.1
    2     3 + 4γ + 5γ² + ... = 47.1     4 + 5γ + 5γ² + ... = 49
    3     4 + 5γ + 5γ² + ... = 49       3 + 4γ + 5γ² + ... = 47.1
    4     4 + 5γ + 5γ² + ... = 49       5 + 5γ + 5γ² + ... = 50
    5     4 + 5γ + 5γ² + ... = 49       4 + 4γ + 5γ² + ... = 48.1
    ▶ The maximum total reward is achieved starting with "AP 4 uses CH 2".
    ▶ If we have the optimal Q-table, we can take the optimal action.
    39 / 88
  40. ▶ The optimal action-value function Q* is unknown

    ▶ We need to estimate Q* from observations
    New problem: based on observations, how can we estimate Q*?
    ▶ Qt(s, a): estimate of Q*(s, a) at time t
    40 / 88
  41. (Tabular) Q-learning [Watkins+1992]

    Updated estimate from the observation:
    Q_{t+1}(St, At) := Q_t(St, At) + α_t δ_{t+1}(Q_t)
    δ_{t+1}(Q_t) := R_{t+1} + γ max_{a′∈A} Q_t(S_{t+1}, a′) − Q_t(St, At)
    ▶ (St, At, R_{t+1}, S_{t+1}): observation
    ▶ 0 ≤ α_t ≤ 1: (small) step-size parameter — α_t = 0 keeps the estimate before the observation, Q_t(St, At); α_t = 1 jumps to the target after the observation, R_{t+1} + γ max_{a′∈A} Q_t(S_{t+1}, a′)
    δ_{t+1}(Q_t): temporal-difference error (if Q_t = Q*, its expectation is 0 by the Bellman equation)
    41 / 88
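A minimal sketch of this tabular update on a hypothetical two-state MDP (an illustrative toy, not the WLAN environment of the slides) could look like:

```python
import random

# Toy MDP: action 1 ("move") from state 1 pays reward 1 and returns to
# state 0; everything else pays 0. Q-learning should learn to "move".
random.seed(0)
gamma, alpha, eps = 0.9, 0.1, 0.1
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def step(s, a):
    """Environment: return (reward R, next state S')."""
    if a == 0:
        return 0.0, s
    return (1.0, 0) if s == 1 else (0.0, 1)

s = 0
for _ in range(20_000):
    # epsilon-greedy behavior policy so every pair keeps being visited
    if random.random() < eps:
        a = random.choice((0, 1))
    else:
        a = max((0, 1), key=lambda a: Q[(s, a)])
    r, s2 = step(s, a)
    # Q-learning update: move Q(S_t, A_t) toward the TD target
    target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    s = s2

# the learned greedy policy: always "move" (action 1)
print(Q[(1, 1)] > Q[(1, 0)], Q[(0, 1)] > Q[(0, 0)])  # → True True
```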
  42. Feature of Q-learning [Szepesvári2010, 3.3.1]

    "Under the assumption that every state-action pair is visited infinitely often, the sequence (Qt; t ≥ 0) converges to Q*."
    42 / 88
  43. Markov decision process

    ▶ Agent: centralized controller of all APs
    ▶ State: channels and contention graph of the APs
    ▶ Action: channel selection of one AP
    ▶ Reward: throughput
    Example: AP i on CH 1 (current state) → action "AP i changes its channel to 2" → AP i on CH 2 (next state)
    [Nakashima+2019]
    43 / 88
  44. State Space

    To estimate Q*(s, a), every state-action pair should be visited. When the number of states is huge, training is infeasible — state explosion
    Number of APs    Number of channels    Number of states
    4                2                     1024
    10               3                     2 · 10^18
    44 / 88
  45. Tabular Q-learning (p. 46)

    Q_{t+1}(St, At) := Q_t(St, At) + α_t δ_{t+1}(Q_t)
    δ_{t+1}(Q_t) := R_{t+1} + γ max_{a′∈A} Q_t(S_{t+1}, a′) − Q_t(St, At)
    Q-learning with function approximation
    Extension to function approximation with a parameterized function Q_θ, θ ∈ R^d:
    θ_{t+1} := θ_t + α_t δ_{t+1}(Q_{θ_t}) ∇_θ Q_{θ_t}(St, At)
    δ_{t+1}(Q_{θ_t}) := R_{t+1} + γ max_{a′∈A} Q_{θ_t}(S_{t+1}, a′) − Q_{θ_t}(St, At)
    Deep reinforcement learning
    Q-learning with function approximation using deep neural networks
    45 / 88
  46. Regression in supervised learning

    ▶ Model f is not given
    ▶ Samples {(xi, yi)}_{i∈{1,...,n}} are available, where yi = f(xi) + noise
    ▶ Estimate the output for a test input x ∉ {xi}_{i∈{1,...,n}}
    x ∈ R^d → (f) → y ∈ R
    Approach
    ▶ Consider a parameterized function f_θ to approximate f, e.g., the linear basis function model f_θ(x) := Σ_{j=1}^b θ_j φ_j(x) = θ⊤φ(x), with basis functions φ_j
    ▶ θ is trained (estimated) using the observed samples {(xi, yi)}_{i∈{1,...,n}}, e.g., by minimizing a sum-of-squares loss function: θ̂ := arg min_θ Σ_{i=1}^n (f_θ(xi) − yi)² =: J(θ)
    ▶ Estimation for a test input x: ŷ := f_θ̂(x)
    46 / 88
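A compact sketch of this approach, with an assumed polynomial basis φ_j(x) = x^j and a toy target f(x) = sin(2πx) — neither choice comes from the slides:

```python
import numpy as np

# Toy data: y_i = f(x_i) + noise, with f(x) = sin(2*pi*x) (assumed).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 30)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(30)

def design(x, b=6):
    """Design matrix whose columns are the basis functions phi_j(x) = x**j."""
    return np.stack([x ** j for j in range(b)], axis=1)

# theta-hat minimizes the sum-of-squares loss J(theta)
theta, *_ = np.linalg.lstsq(design(x_train), y_train, rcond=None)

# Estimation for a test input x = 0.25 (true value sin(pi/2) = 1)
y_hat = (design(np.array([0.25])) @ theta)[0]
print(round(y_hat, 2))
```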
  47. Training and regression pipeline

    Training: a training set of n observations (xi, yi), i = 1, ..., n → learning algorithm θ̂ := arg min_θ J(θ) → model f_θ̂(·)
    Regression: new input x → estimated output f_θ̂(x)
    47 / 88
  48. Models (parameterized functions) f_θ(x) and their learning algorithms

    Linear basis function model: f_θ(x) := Σ_{j=1}^b θ_j φ_j(x) = θ⊤φ(x) — stochastic gradient descent
    Neural networks: f_θ(x) := Σ_{j=1}^b α_j φ(x; β_j), with φ(x; β) := 1 / (1 + exp(−x⊤w − γ)), θ := (α_j, β_j)_j, β := (w⊤, γ)⊤ — backpropagation
    Deep neural networks (CNN, LSTM): offer better function approximation capabilities than shallow networks — backpropagation
    48 / 88
  49. State Design and Feature Extraction [Nakashima+2019]

    State: adjacency of APs and their selected channels
    Physical expression (five APs on CH 1 or CH 2) → graph expression → tabular expression: a 5x7 matrix consisting of a 5x5 adjacency matrix and a 5x2 channel (one-hot) matrix
    Feature extraction (pre-processing): graph convolutional networks — feature extraction for graphs, analogous to convolutional neural networks (CNNs) for images
    49 / 88
  50. Network architecture (state: 5x7 matrix)

    InputLayer (None, 1, 5, 7)
    → Lambda: adjacency matrix (None, 1, 5, 5) and Lambda: channel matrix (None, 1, 5, 2)
    → SpectralGraphConvolution (None, 1, 5, 32) → BatchNormalization
    → SpectralGraphConvolution (None, 1, 5, 16) → BatchNormalization
    → Flatten (None, 80)
    → Dense (None, 10): Q-values for the 10 actions (5 APs x 2 CHs)
    50 / 88
  51. Libraries and model

    import numpy as np
    import tensorflow as tf
    from keras.models import Model
    from keras.layers import Dense, Activation, Flatten, Input, Reshape, Lambda, Concatenate, BatchNormalization
    from rl.policy import EpsGreedyQPolicy  # keras-rl

    def get_model_functional(env, nb_actions, ap_number, ch_number):
        inputs = Input(shape=(1,) + env.observation_space.shape)
        adj = Lambda(lambda x: x[:, :, :, :ap_number],
                     output_shape=(1, ap_number, ap_number))(inputs)  # adjacency matrix
        ch = Lambda(lambda x: x[:, :, :, ap_number:],
                    output_shape=(1, ap_number, ch_number))(inputs)  # channel matrix
        # GraphConvolution is not provided in libraries
        H = GraphConvolution(units=32, activation='relu',
                             kernel_initializer='he_normal')([ch, adj])
        H = BatchNormalization()(H)
        H = GraphConvolution(units=16, activation='relu',
                             kernel_initializer='he_normal')([H, adj])
        H = BatchNormalization()(H)
        H = Flatten()(H)
        predictions = Dense(nb_actions, activation='linear',
                            kernel_initializer='he_normal')(H)
        model = Model(inputs=inputs, outputs=predictions)
        return model
    51 / 88
  52. Training result

    [Figure: reward versus training step (0 to 800000); the reward rises from about 0.55 to about 0.80]
    ▶ Five APs are located uniformly at random
    ▶ Reward: the minimum individual BoE throughput of the five APs
    Simplified version of [Nakashima+2019]
    52 / 88
  53. End-to-End Learning of Proactive Handover Policy for Camera-Assisted mmWave Networks Using Deep Reinforcement Learning

    Yusuke Koda, Kota Nakashima, Koji Yamamoto, Takayuki Nishio, and Masahiro Morikura
    Graduate School of Informatics, Kyoto University
    arXiv:1904.04585
  54. Image-to-decision proactive handover

    ▶ Determine one BS from two candidate BSs
    ▶ The output of the NN is the Q-value for the input images
    NN [16] (combination of CNN and LSTM); inputs: consecutive camera images, the index of the associated BS, and the remaining time steps until the handover process is completed; the output layer yields the action
    54 / 88
  55. Q-values for selecting BSs 1 and 2 without camera images

    [Figure: data rate of BS 1 (Mbit/s) over time (12.2-14.0 s), showing a sharp data rate degradation; action values for selecting BSs 1 and 2 over the same interval, with a handover to BS 2 and a handover back to BS 1]
    ▶ Because the received power degrades sharply, the handover decision is generally made only after the degradation
    55 / 88
  56. Q-values for selecting BSs 1 and 2 with camera images

    [Figure: action values without camera images (handover after the degradation) versus with camera images, where the action value degrades ahead of the blockage and the handover occurs proactively]
    ▶ By extending the state space with camera images, proactive handover is enabled
    56 / 88
  57. Our related works Deep reinforcement learning with graph convolution ▶

    Channel allocation for WLANs [Nakashima+2019] We utilize ▶ Deep Q network (DQN) [Mnih+2015] ▶ Double Deep Q-network [vHasselt+2016] ▶ Dueling network [Wang+2015] ▶ Prioritized experience replay [Schaul+2015] Deep reinforcement learning with CNN ▶ Image-based proactive handover for mmWave WLANs [Koda+2018], [Koda+2019] 57 / 88
  58. References

    ▶ R. S. Sutton and A. G. Barto, Reinforcement Learning, 2nd ed. MIT Press, 2018
    ▶ C. Szepesvári, Algorithms for Reinforcement Learning. Morgan & Claypool Publishers, 2010
    ▶ K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep reinforcement learning: A brief survey," IEEE Signal Process. Mag., vol. 34, no. 6, Nov. 2017
    58 / 88
  59. Summary (Part II) ▶ Reinforcement learning is used to acquire

    the optimal sequence to maximize the total reward. ▶ Q-learning can estimate the optimal action-value function Q∗. ▶ Deep neural networks are used for function approximation of Q function (deep RL). 59 / 88
  60. Goal of performance modeling

    ▶ Points represent TXs or TX-RX pairs.
    ▶ Circles represent carrier sense areas (contention domains).
    We would like to evaluate the following performance metrics without Monte Carlo simulations, to ensure reproducibility:
    1. Individual throughput in a specific topology — graph theory (Part I)
    2. Average throughput over possible topologies — SG
    3. Distributions of interference and SIR — SG
    61 / 88
  61. Modeling CSMA with point processes

    [Figure: points with uniform backoff counters (0.09-0.69) and carrier sense circles]
    All points — potential TXs — Poisson point process (PPP) Φ
    Numbers — backoff counters — uniform distribution
    Circles — contention domains
    Points with disks — transmitting TXs — Matérn hardcore point process (MHCPP) ΦM
    62 / 88
  62. Poisson point process (PPP) Φ

    Φ := {x}: a set of locations of random points (a point process)
    Φ(B): an r.v. representing the number of points of Φ in region B, i.e., Φ(B) := #{Φ ∩ B}
    Φ is a set of independent random points with homogeneous density λ
    ⇔ Φ is a PPP with intensity (density) λ
    ⇔ Φ(B) ~ Pois(λ|B|), i.e., a Poisson distribution with mean λ|B|
    ⇔ P(Φ(B) = n) = e^{−λ|B|} (λ|B|)^n / n!,  n = 0, 1, ...
    [Figure: a realization with λ = 1; each square grid cell has area 1, total area 40]
    63 / 88
  63. Probabilities derived assuming a PPP

    ▶ Probability that Φ has no point in B: P(Φ(B) = 0) = e^{−λ|B|}
    ▶ Probability that Φ has at least one point in B: P(Φ(B) ≥ 1) = 1 − e^{−λ|B|}
    Φ is a PPP with intensity λ ⇔ Φ(B) ~ Pois(λ|B|) ⇔ P(Φ(B) = n) = e^{−λ|B|} (λ|B|)^n / n!,  n = 0, 1, ...
    64 / 88
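The void probability P(Φ(B) = 0) = e^{−λ|B|} can be checked by Monte Carlo. This is a sketch with λ|B| = 1 (so the target is e^{−1} ≈ 0.368); Knuth's Poisson sampler is an implementation choice, not something from the slides:

```python
import math, random

random.seed(1)
mu = 1.0  # lambda * |B|

def rpois(mu):
    """Draw one Poisson(mu) sample (Knuth's multiplication method)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials = 100_000
empty = sum(rpois(mu) == 0 for _ in range(trials))
print(round(empty / trials, 3), round(math.exp(-mu), 3))  # both ≈ 0.368
```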
  64. Generating Realizations of a PPP

    Both code snippets generate the figure on p. 68 for B := {(x, y) | 0 ≤ x ≤ 10, 0 ≤ y ≤ 4}, i.e., |B| = 40, and λ = 1
    General way in R or other languages:
    n <- rpois(1, lambda=40)
    x <- runif(n, min=0, max=10)
    y <- runif(n, min=0, max=4)
    plot(x, y)
    1. For region B, generate a realization n of a Poisson random variable with expectation λ|B| (= lambda in the code)
    2. Generate n uniformly distributed points in region B
    Package "spatstat" [Baddeley+2015] in R:
    P <- rpoispp(lambda=1, win=owin(xrange=c(0,10), yrange=c(0,4)))
    plot(P)
    For more complicated point processes, this package is useful.
    65 / 88
  65. CSMA

    [Figure: points with uniform backoff counters and carrier sense circles]
    Points — potential TXs — Poisson point process Φ
    Numbers — backoff counters m(x) — uniformly distributed r.v.s
    Circles — contention domain b(x, r) — disc with radius r
    Contended TXs — Φ ∩ b(x, r)
    Simultaneous TXs — Matérn hardcore point process ΦM → interference
    RXs are omitted in the figures.
    66 / 88
  66. Contention domain b(x, r)

    ▶ Disc with radius r centered at x
    ▶ Model of the contention domain of x ∈ Φ, i.e., the area where the received power is greater than the carrier sense threshold
    67 / 88
  67. Contended TXs

    Locations of the TXs contending with TX x: Φ ∩ b(x, r)
    Number of TXs contending with TX x: Φ(b(x, r)) ~ Pois(λ|b(x, r)|) = Pois(λπr²)
    68 / 88
  68. Backoff and simultaneous TXs

    m(x)
    ▶ "Mark": a uniformly distributed r.v. on [0, 1] for each point x ∈ Φ
    ▶ Model of the backoff counter
    MHCPP ΦM := { x ∈ Φ : m(x) < m(y) for all y ∈ Φ ∩ b(x, r) \ {x} }
    ▶ TXs with the smallest backoff counter in their contention domains
    ▶ Model of the set of simultaneous TXs
    69 / 88
  69. Expected normalized throughput (medium access probability)

    P(x ∈ Φ : x ∈ ΦM) = (1 − e^{−λπr²}) / (λπr²)
    ▶ Probability that a point of the PPP is retained in the MHCPP
    ▶ Probability that a potential TX has the smallest backoff counter in its contention domain (no neighboring TX has a smaller backoff counter) and thus can transmit:
    P(x ∈ Φ : x ∈ ΦM) = P(#{ z ∈ Φ ∩ b(x, r) | m(z) < m(x) } = 0)
    Conditioning on m(x) = t and considering Φ̃ := { z ∈ Φ | m(z) < t }:
    ▶ the TXs with backoff counter less than t
    ▶ the intensity of Φ̃ is tλ
    ▶ Φ̃(b(x, r)) ~ Pois(tλ|b(x, r)|) = Pois(tλπr²)
    P(#{ z ∈ Φ ∩ b(x, r) | m(z) < t } = 0) = P(Φ̃(b(x, r)) = 0) = e^{−tλπr²}
    P(#{ z ∈ Φ ∩ b(x, r) | m(z) < m(x) } = 0) = ∫₀¹ e^{−tλπr²} dt = (1 − e^{−λπr²}) / (λπr²)
    [Baccelli+2009, 18] derived the medium access probability taking fading into account
    70 / 88
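The integral above can be verified by simulating the typical point directly: draw its uniform mark t, draw a Pois(λπr²) number of contenders, and retain the point iff no contender drew a smaller mark. The density and radius values below are arbitrary assumptions for the sketch:

```python
import math, random

random.seed(2)
lam, r = 1e-3, 20.0            # assumed TX density and CS radius
mu = lam * math.pi * r * r     # mean number of contenders in b(x, r)

def rpois(mu):
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials, kept = 200_000, 0
for _ in range(trials):
    t = random.random()  # own backoff mark m(x)
    if all(random.random() >= t for _ in range(rpois(mu))):
        kept += 1

exact = (1 - math.exp(-mu)) / mu
print(round(kept / trials, 3), round(exact, 3))  # both ≈ 0.569
```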
  70. Intensity of the MHCPP

    ▶ Intensity of the MHCPP: λM := λ · (1 − e^{−λπr²}) / (λπr²) = (1 − e^{−λπr²}) / (πr²), i.e., the intensity of potential TXs times the medium access probability
    ▶ In ultra dense networks, lim_{λ→∞} λM = 1 / (πr²): the intensity of simultaneous transmissions depends only on the area of the contention domain.
    71 / 88
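A quick numeric illustration of this saturation, using r = 200 m as in the earlier R example (the λ values are arbitrary):

```python
import math

# lambda_M = (1 - exp(-lambda*pi*r^2)) / (pi*r^2) approaches the cap
# 1 / (pi*r^2) as the density of potential TXs grows.
r = 200.0
cap = 1 / (math.pi * r * r)
for lam in (1e-6, 1e-5, 1e-4, 1e-3):
    lam_m = (1 - math.exp(-lam * math.pi * r * r)) / (math.pi * r * r)
    print(f"lambda = {lam:.0e}: lambda_M / cap = {lam_m / cap:.3f}")
```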
  71. Analysis of Inversely Proportional Carrier Sense Threshold and Transmission Power Setting

    Koji Yamamoto¹, Xuedan Yang¹, Takayuki Nishio¹, Masahiro Morikura¹, and Hirantha Abeysekera²
    ¹Graduate School of Informatics, Kyoto University
    ²NTT Access Network Service Systems Laboratories, NTT Corporation
    IEEE CCNC 2017, Jan. 2017
  72. Carrier Sense Threshold Setting for Spatial Reuse

    ▶ Contention domain: RxPower > CST
    ▶ Contended TXs
    ▶ Simultaneous TXs → interference
    Increasing the carrier sense threshold (CST)
    ▶ reduces the number of contended TXs, but
    ▶ increases interference
    In densely deployed WLANs, CST adjustment is crucial. However, CST setting can cause throughput starvation
    RXs are omitted in the figures.
    73 / 88
  73. Two APs time-sharing the channel

    [Figure: received power (dBm) versus position (m) for AP1 and AP2 placed about 15 m apart; p1 = p2 = 13 dBm, θ1 = θ2 = −82 dBm]
    ▶ Received signal power > threshold θx, i.e., the two APs detect signals from each other.
    ▶ The two APs time-share the channel.
    74 / 88
  74. Asymmetric carrier sensing

    [Figure: the same setup, but AP2 raises its threshold θ2 by +34 dB]
    ▶ AP2 increases θ2 so as not to detect signals from AP1; AP2 then transmits independently of AP1
    ▶ AP1 must defer its transmissions during the transmissions of AP2 → starvation due to the asymmetric carrier sensing relationship [Mhatre+2007]
    75 / 88
  75. Inversely proportional setting (IPS) of CST and transmission power

    [Figure: the same setup; AP2 raises θ2 by +34 dB and reduces its transmission power by 34 dB]
    ▶ Transmission power is reduced in step with the increase in CST [Fuemmeler+2006; Mhatre+2007]
    ▶ The symmetric carrier sensing relationship is maintained, and the APs spatially reuse the channel
    76 / 88
  76. Numerical example: expected throughput

    ▶ The carrier sense range is a function of the CST and the transmission power
    [Figure: throughput (bit/s/Hz) versus ax = θx/Θ (dB) for n = 4 and n = 10, showing the throughput, the asymptotic throughput, and the approximation; SIR = 30 dB, α = 3.5]
    ▶ The optimal value of the CST ax obtained under the approximation is close to that without the approximation
    ▶ [Iwata+2019] extends the work to nearest-AP association
    M. Iwata, K. Yamamoto, B. Yin, T. Nishio, M. Morikura, and H. Abeysekera, "Analysis of inversely proportional carrier sense threshold and transmission power setting based on received power for IEEE 802.11ax," in Proc. IEEE Consum. Commun. Netw. Conf. (CCNC), Las Vegas, NV, USA, Jan. 2019
    77 / 88
  77. Goal of performance modeling

    ▶ Points represent TXs or TX-RX pairs.
    ▶ Circles represent carrier sense areas (contention domains).
    We would like to evaluate the following performance metrics without Monte Carlo simulations, to ensure reproducibility:
    1. Individual throughput in a specific topology — graph theory (Part I)
    2. Average throughput over possible topologies — SG
    3. Distributions of interference and SIR — SG
    78 / 88
  78. SG Analysis of CSMA

    ▶ [Nguyen+2007; Baccelli+2009] MHCPP type II for CSMA modeling
    ▶ [Alfano+2011; Alfano+2014] Distribution of throughput
    ▶ [Busson+2009] Intensity underestimation of the MHCPP
    ▶ [Kaynia+2011] Optimization of the carrier sense threshold
    ▶ [ElSawy+2012; ElSawy+2013] Modified hard-core point process
    ▶ [Venkataraman+2006] Interference distribution from a PPP outside the contention domain
    Approximating the MHCPP by a PPP with intensity λ_MHCPP [ElSawy+2012]
    79 / 88
  79. SIR distribution in Poisson cellular networks [Andrews+2011]

    ▶ Positions of base stations (BSs) follow a Poisson point process (PPP)
    ▶ Nearest-BS association (Poisson-Voronoi cells)
    ▶ Rayleigh fading
    ▶ Power-law path loss with exponent α > 2
    The spatial distribution of the downlink SIR can be derived (with and without fading):
    P(SIR > θ) = 1 / ( 1 + (2θ / (α − 2)) · ₂F₁(1, 1 − 2/α; 2 − 2/α; −θ) )
               = 1 / ( 1 + √θ arctan √θ )  (closed form when α = 4)
    [Figure: cdf P(SIR ≤ θ) versus θ (dB) from −20 to 30, with and without fading]
    80 / 88
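The α = 4 closed form is easy to evaluate directly (a small sketch of the formula above, nothing more):

```python
import math

# P(SIR > theta) = 1 / (1 + sqrt(theta) * arctan(sqrt(theta))) for
# alpha = 4 [Andrews+2011], with theta given in dB.
def coverage(theta_db):
    theta = 10 ** (theta_db / 10)
    root = math.sqrt(theta)
    return 1 / (1 + root * math.atan(root))

for theta_db in (-10, 0, 10):
    print(theta_db, round(coverage(theta_db), 3))
# at theta = 0 dB the value is 1 / (1 + pi/4) ≈ 0.56
```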
  80. Expected sum — Campbell's theorem for sums [Haenggi2012, §4.2]

    Let Φ be a PPP on R² with intensity λ, and let f : R² → R be a measurable function. Then
    E_Φ[ Σ_{x∈Φ} f(x) ] = λ ∫_{R²} f(x) dx
    (e.g., f(x) is the interference from x at the origin, and the sum is the total interference from Φ)
    ▶ The expected sum can be evaluated as an area integral
    ▶ Only applicable to PPPs; thus, PPPs are often assumed
    81 / 88
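Campbell's theorem can be sanity-checked by Monte Carlo on a bounded window. The disk window and the radial function f(x) = exp(−|x|²) below are hypothetical choices for the sketch; for this f the area integral is ≈ λπ:

```python
import math, random

random.seed(3)
lam, R = 2.0, 5.0  # assumed intensity and disk radius

def rpois(mu):
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials, total = 5_000, 0.0
mean_points = lam * math.pi * R * R
for _ in range(trials):
    for _ in range(rpois(mean_points)):
        rad = R * math.sqrt(random.random())  # radius of a uniform point on the disk
        total += math.exp(-rad * rad)         # f(x) = exp(-|x|^2) is radial
print(round(total / trials, 2), round(lam * math.pi, 2))  # both ≈ 6.28
```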
  81. Stochastic Geometry Analysis of Normalized SNR Scheduling in Downlink Cellular Networks

    Takuya Ohto†, Koji Yamamoto†, Seong-Lyun Kim‡, Takayuki Nishio†, and Masahiro Morikura†
    †Kyoto University, ‡Yonsei University
    IEEE Wireless Communications Letters, Aug. 2017
82. 1) Round-robin scheduler — a user is selected in rotation, independent of the channel state.
2) Proportional fair (PF) scheduler — the user with the largest instantaneous rate normalized by its short-term average is selected:
\[
\arg\max_i \frac{\mathrm{rate}_i}{\mathbb{E}[\mathrm{rate}_i \mid r_i]}
\]
[Figure: distribution of the normalized data rate]
3) Normalized SNR scheduler — the user with the largest instantaneous SNR normalized by its short-term average is selected:
\[
\arg\max_i \frac{\mathrm{SNR}_i}{\mathbb{E}[\mathrm{SNR}_i \mid r_i]} = \arg\max_i h_i
\]
[Figure: distribution of the normalized SNR]
where \(\mathrm{SNR}_i \doteq h_i r_i^{-\alpha}/\sigma^2\) and the short-term average SNR is \(\mathbb{E}[\mathrm{SNR}_i \mid r_i] = r_i^{-\alpha}/\sigma^2\).
83 / 88
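Under Rayleigh fading the normalized SNR scheduler reduces to arg max_i h_i with the h_i i.i.d. exponential, so each user is selected with equal probability regardless of its distance r_i, while the selected user still rides a fading peak. A quick simulation (illustrative only; the user and slot counts are arbitrary) confirms the equal-selection property:

```python
import random

random.seed(1)

N_USERS = 4
SLOTS = 40_000

# Distances may differ wildly, but the normalized SNR
# SNR_i / E[SNR_i | r_i] = h_i depends only on the fading gain,
# with h_i ~ Exp(1) i.i.d. under Rayleigh fading.
counts = [0] * N_USERS
for _ in range(SLOTS):
    h = [random.expovariate(1.0) for _ in range(N_USERS)]
    counts[h.index(max(h))] += 1  # scheduler picks arg max h_i

shares = [c / SLOTS for c in counts]
print(shares)  # each share close to 1/N_USERS = 0.25
```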
83. SIR distributions (success prob. / coverage prob.)
[Andrews+2011] — SIR ccdf under a Rayleigh fading channel in Poisson cellular networks:
\[
P(\mathrm{SIR} > \theta) = \frac{1}{1 + \frac{2\theta}{\alpha-2}\,{}_2F_1\!\left(1,\, 1-\tfrac{2}{\alpha};\, 2-\tfrac{2}{\alpha};\, -\theta\right)}
= \frac{1}{1 + \sqrt{\theta}\arctan\sqrt{\theta}} \quad (\text{when } \alpha = 4)
\]
α: path loss exponent. The user is selected randomly, independent of the channel state; this performance is achieved under round-robin scheduling.
Normalized SNR scheduling [Ohto+2017]:
\[
P(\mathrm{SIR} > \theta) \approx \sum_{n=0}^{\infty} \frac{(\lambda_u/(c\lambda_b))^n}{c\,B(n+1,c)\,(\lambda_u/(c\lambda_b)+1)^{n+c+1}}
\sum_{k=1}^{n+1} \binom{n+1}{k} \frac{(-1)^{k+1}}{1 + \frac{2k\theta}{\alpha-2}\,{}_2F_1\!\left(1,\, 1-\tfrac{2}{\alpha};\, 2-\tfrac{2}{\alpha};\, -k\theta\right)}
\]
where each hypergeometric factor reduces to \(\sqrt{k\theta}\arctan\sqrt{k\theta}\) when α = 4.
λ_b: intensity of BSs, λ_u: intensity of users, c = 3.5
84 / 88
84. SIR ccdf (interference-limited, Rayleigh fading, α = 4) [Ohto+2017]
[Figure: P(SIR > θ) versus θ (dB) for λu/λb = 4 and 8 — normalized SNR (analysis and simulation) vs. round-robin (analysis)]
Round-robin [Andrews+2011]:
\[
P(\mathrm{SIR} > \theta) = \frac{1}{1 + \sqrt{\theta}\arctan\sqrt{\theta}}
\]
Normalized SNR:
\[
P(\mathrm{SIR}^{\star} > \theta) \approx \sum_{n=0}^{\infty} f_N(n) \sum_{k=1}^{n+1} \binom{n+1}{k} \frac{(-1)^{k+1}}{1 + \sqrt{k\theta}\arctan\sqrt{k\theta}}
\]
85 / 88
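Both expressions are straightforward to evaluate numerically. The sketch below is an illustrative check, not code from the tutorial; it assumes the cell-load weight f_N(n) with c = 3.5 shown on the previous slide, and it truncates the sum over n (the alternating binomial sum limits how large n can be taken in double precision anyway):

```python
import math

C = 3.5  # gamma-approximation shape parameter for Poisson-Voronoi cell areas

def g(x):
    """alpha = 4 building block: 1 / (1 + sqrt(x) * arctan(sqrt(x)))."""
    return 1.0 / (1.0 + math.sqrt(x) * math.atan(math.sqrt(x)))

def p_round_robin(theta):
    return g(theta)

def f_N(n, ratio):
    """Cell-load weight for lambda_u / lambda_b = ratio (a negative-binomial pmf)."""
    mu = ratio / C
    log_beta = math.lgamma(n + 1) + math.lgamma(C) - math.lgamma(n + 1 + C)
    return mu ** n / (C * math.exp(log_beta) * (mu + 1.0) ** (n + C + 1))

def p_norm_snr(theta, ratio, n_max=40):
    # n_max ~ 40: the f_N tail beyond it is negligible for moderate ratios,
    # and much larger n would suffer cancellation in the alternating sum.
    total = 0.0
    for n in range(n_max + 1):
        inner = sum(math.comb(n + 1, k) * (-1) ** (k + 1) * g(k * theta)
                    for k in range(1, n + 2))
        total += f_N(n, ratio) * inner
    return total

theta = 1.0  # 0 dB
print(p_round_robin(theta))    # about 0.56
print(p_norm_snr(theta, 4.0))  # larger, reflecting the multiuser diversity gain
```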
85. Related works
Stochastic geometry analysis of channel-adaptive user scheduling in cellular networks
Downlink
▶ [Ohto+2017] T. Ohto, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Stochastic geometry analysis of normalized SNR-based scheduling in downlink cellular networks,” IEEE Wireless Commun. Lett., vol. 6, no. 4, Aug. 2017
▶ [Yamamoto2018] K. Yamamoto, “SIR distribution and scheduling gain of normalized SNR scheduling in Poisson networks,” in Proc. Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw. (WiOpt), Shanghai, China, May 2018
Uplink
▶ [Kamiya+2018] S. Kamiya, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Asymptotic analysis of normalized SNR-based scheduling in uplink cellular networks with truncated channel inversion power control,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, May 2018
86 / 88
86. References
▶ M. Haenggi, Stochastic Geometry for Wireless Networks. Cambridge, U.K.: Cambridge Univ. Press, 2012
▶ F. Baccelli and B. Blaszczyszyn, “Stochastic geometry and wireless networks: Volume II — Applications,” Found. Trends Netw., vol. 4, no. 1-2, 2009
87 / 88
87. Summary (Part III)
Stochastic geometry modeling
▶ Poisson point process — potential TXs
▶ Matérn CSMA point process — simultaneous TXs
▶ Distribution of interference and SIR
Applications
▶ Joint carrier sense threshold and transmission power setting
▶ User scheduling
88 / 88
88. References I
[Alfano+2011] G. Alfano, M. Garetto, and E. Leonardi, “New insights into the stochastic geometry analysis of dense CSMA networks,” in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), Shanghai, China, Apr. 2011.
[Alfano+2014] ——, “New directions into the stochastic geometry analysis of dense CSMA networks,” IEEE Trans. Mobile Comput., vol. 13, no. 2, Feb. 2014.
[Andrews+2011] J. G. Andrews, F. Baccelli, and R. K. Ganti, “A tractable approach to coverage and rate in cellular networks,” IEEE Trans. Commun., vol. 59, no. 11, Nov. 2011.
[Arulkumaran+2017] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Process. Mag., vol. 34, no. 6, Nov. 2017.
[Baccelli+2009] F. Baccelli and B. Blaszczyszyn, “Stochastic geometry and wireless networks: Volume II — Applications,” Found. Trends Netw., vol. 4, no. 1-2, 2009.
89 / 88
89. References II
[Baddeley+2015] A. Baddeley, E. Rubak, and R. Turner, Spatial Point Patterns: Methodology and Applications with R. CRC Pr., 2015.
[Bianchi2000] G. Bianchi, “Performance analysis of the IEEE 802.11 distributed coordination function,” IEEE J. Sel. Areas Commun., vol. 18, no. 3, Mar. 2000.
[Busson+2009] A. Busson and G. Chelius, “Point processes for interference modeling in CSMA/CA ad-hoc networks,” in Proc. 6th ACM Symp. Perform. Eval. Wireless Ad Hoc, Sensor, Ubiquitous Netw. (PE-WASUN), Canary Islands, Spain, Oct. 2009.
[ElSawy+2012] H. ElSawy, E. Hossain, and S. Camorlinga, “Characterizing random CSMA wireless networks: A stochastic geometry approach,” in Proc. IEEE Int. Conf. Commun. (ICC), Ottawa, ON, Canada, Jun. 2012.
90 / 88
90. References III
[ElSawy+2013] H. ElSawy and E. Hossain, “A modified hard core point process for analysis of random CSMA wireless networks in general fading environments,” IEEE Trans. Commun., vol. 61, no. 4, Apr. 2013.
[Fuemmeler+2006] J. A. Fuemmeler, N. H. Vaidya, and V. V. Veeravalli, “Selecting transmit powers and carrier sense thresholds in CSMA protocols for wireless ad hoc networks,” in Proc. 2nd Annual Int. Workshop Wireless Internet (WICON), Aug. 2006.
[Garetto+2008] M. Garetto, T. Salonidis, and E. W. Knightly, “Modeling per-flow throughput and capturing starvation in CSMA multi-hop wireless networks,” IEEE/ACM Trans. Netw., vol. 16, no. 4, Aug. 2008.
[Haenggi2012] M. Haenggi, Stochastic Geometry for Wireless Networks. Cambridge, U.K.: Cambridge Univ. Press, 2012.
91 / 88
91. References IV
[Iwata+2019] M. Iwata, K. Yamamoto, B. Yin, T. Nishio, M. Morikura, and H. Abeysekera, “Analysis of inversely proportional carrier sense threshold and transmission power setting based on received power for IEEE 802.11ax,” in Proc. IEEE Consum. Commun. Netw. Conf. (CCNC), Las Vegas, NV, USA, Jan. 2019.
[Kamiya+2018] S. Kamiya, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Asymptotic analysis of normalized SNR-based scheduling in uplink cellular networks with truncated channel inversion power control,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, May 2018.
[Kaynia+2011] M. Kaynia, N. Jindal, and G. E. Oien, “Improving the performance of wireless ad hoc networks through MAC layer design,” IEEE Trans. Wireless Commun., vol. 10, no. 1, Jan. 2011.
[Khorov+2019] E. Khorov, A. Kiryanov, A. Lyakhov, and G. Bianchi, “A tutorial on IEEE 802.11ax high efficiency WLANs,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, 2019.
92 / 88
92. References V
[Koda+2018] Y. Koda, K. Yamamoto, T. Nishio, and M. Morikura, “Reinforcement learning based predictive handover for pedestrian-aware mmWave networks,” in Proc. IEEE Int. Conf. Comput. Commun. Workshops (INFOCOM Workshops), Honolulu, HI, USA, Apr. 2018.
[Koda+2019] Y. Koda, K. Nakashima, K. Yamamoto, T. Nishio, and M. Morikura, “End-to-end learning of proactive handover policy for camera-assisted mmWave networks using deep reinforcement learning,” arXiv preprint arXiv:1904.04585, Apr. 2019.
[Liew+2010] S. C. Liew, C. H. Kai, H. C. Leung, and P. Wong, “Back-of-the-envelope computation of throughput distributions in CSMA wireless networks,” IEEE Trans. Mobile Comput., vol. 9, no. 9, Sep. 2010.
[Mhatre+2007] V. P. Mhatre, K. Papagiannaki, and F. Baccelli, “Interference mitigation through power control in high density 802.11 WLANs,” in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), Anchorage, AK, May 2007.
93 / 88
93. References VI
[Mnih+2015] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, Feb. 2015.
[Nakashima+2019] K. Nakashima, S. Kamiya, K. Ohtsu, K. Yamamoto, T. Nishio, and M. Morikura, “Deep reinforcement learning-based channel allocation for wireless LANs with graph convolutional networks,” arXiv preprint arXiv:1905.07144, May 2019.
[Nguyen+2007] H. Q. Nguyen, F. Baccelli, and D. Kofman, “A stochastic geometry analysis of dense IEEE 802.11 networks,” in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), Anchorage, AK, May 2007.
94 / 88
94. References VII
[Ohto+2017] T. Ohto, K. Yamamoto, S.-L. Kim, T. Nishio, and M. Morikura, “Stochastic geometry analysis of normalized SNR-based scheduling in downlink cellular networks,” IEEE Wireless Commun. Lett., vol. 6, no. 4, Aug. 2017.
[Schaul+2015] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” in Proc. ICLR, San Juan, Puerto Rico, May 2015.
[Sutton+2018] R. S. Sutton and A. G. Barto, Reinforcement Learning, 2nd ed. MIT Pr., 2018.
[Szepesvári2010] C. Szepesvári, Algorithms for Reinforcement Learning. Morgan and Claypool Pub., 2010.
[Venkataraman+2006] J. Venkataraman, M. Haenggi, and O. Collins, “Shot noise models for outage and throughput analyses in wireless ad hoc networks,” in Proc. IEEE Military Commun. Conf. (MILCOM), Oct. 2006.
95 / 88
95. References VIII
[vHasselt+2016] H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. 30th AAAI Conf. Artificial Intelligence, Mar. 2016.
[Wang+2015] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, “Dueling network architectures for deep reinforcement learning,” arXiv preprint arXiv:1511.06581, Nov. 2015.
[Watkins+1992] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, no. 3-4, May 1992.
[Yamamoto2015] K. Yamamoto, “A comprehensive survey of potential game approaches to wireless networks,” IEICE Trans. Commun., vol. E98-B, no. 9, Sep. 2015.
[Yamamoto2018] ——, “SIR distribution and scheduling gain of normalized SNR scheduling in Poisson networks,” in Proc. Int. Symp. Model. Optim. Mobile Ad Hoc Wireless Netw. (WiOpt), Shanghai, China, May 2018.
96 / 88
96. References IX
[Yin+2017] B. Yin, S. Kamiya, K. Yamamoto, T. Nishio, M. Morikura, and H. Abeysekera, “Mitigating throughput starvation in dense WLANs through potential game-based channel selection,” IEICE Trans. Fundamentals, vol. E100-A, no. 11, Nov. 2017.
Some figures in this slide deck are licensed to speakers only.
97 / 88