Upgrade to Pro — share decks privately, control downloads, hide ads and more …

[ICML'25] Policy Design for Two-sided Platforms...

[ICML'25] Policy Design for Two-sided Platforms with Participation Dynamics

Avatar for Haruka Kiyohara

Haruka Kiyohara

May 02, 2025
Tweet

More Decks by Haruka Kiyohara

Other Decks in Research

Transcript

  1. Policy Design for Two-sided Platforms with Participation Dynamics Haruka Kiyohara1,

    Fan Yao2, Sarah Dean1 July 2025 Participation Dynamics in Two-sided Platforms @ ICML 1 1 2
  2. Two-sided platforms are ubiquitous Both viewers and providers are essential

    for the success of two-sided platforms. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 2 … … viewer provider
  3. Two-sided platforms are ubiquitous Both viewers and providers are essential

    for the success of two-sided platforms. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 3 … … viewer provider Viewers receive content recommendation. If viewers are happy/dissatisfied with contents, they may increase/decrease participation. Providers supply contents to the platform. If providers recieve high/inadequate exposure, they may increase/decrease production.
  4. Two-sided platforms are ubiquitous Both viewers and providers are essential

    for the success of two-sided platforms. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 4 … … viewer provider How should we design content allocation to pursue the long-term success? Applications include video streaming, online ads, job matching, SNS, and more!
  5. Topics this slides cover are.. • Modeling two-sided platforms and

    population effects • Is the myopic-greedy policy sufficient? - showcasing the failure mode • Game- and control-theoretic insights about the deficiency of myopic-greedy • Our proposal: Look-ahead policy for optimizing the long-term objectives • Synthetic and real-data experiments July 2025 Participation Dynamics in Two-sided Platforms @ ICML 5
  6. “Viewers” and “Providers” in two-sided platforms Providers • 𝐿 different

    subgroups ( ). (by genre, e.g., sports or econ category) • Population: July 2025 Participation Dynamics in Two-sided Platforms @ ICML 6 Viewers • 𝐾 different subgroups ( ). (by age, gender, nationality, etc) • Population: Platform • Joint population:
  7. “Viewers” and “Providers” in two-sided platforms Providers • 𝐿 different

    subgroups ( ). (by genre, e.g., sports or econ category) • Population: • Exposure: July 2025 Participation Dynamics in Two-sided Platforms @ ICML 7 Viewers • 𝐾 different subgroups ( ). (by age, gender, nationality, etc) • Population: • Satisfaction: Platform • Joint population: • Allocation policy: (where is the prob. of allocating provider-group 𝑙 to viewer-group 𝑘) (where is the utility from provider 𝑙)
  8. Key assumptions about viewers and providers (1/2) July 2025 Participation

    Dynamics in Two-sided Platforms @ ICML 8 Assumption 1 (Existence of “reference” population) Viewer and provider subgroups have their own “reference” population given the level of satisfaction and exposure, i.e., and . We assume ҧ 𝜆 is a monotonistic increasing function. satisfaction / exposure “ref.” population
  9. Key assumptions about viewers and providers (1/2) July 2025 Participation

    Dynamics in Two-sided Platforms @ ICML 9 Assumption 1 (Existence of “reference” population) Viewer and provider subgroups have their own “reference” population given the level of satisfaction and exposure, i.e., and . We assume ҧ 𝜆 is a monotonistic increasing function. satisfaction / exposure “ref.” population This model is also supported by a game-theoretic formulation of utility-seeking (selfish) agents, e.g., the utility of participant 𝑖 in group 𝑘 is, , where 𝑐 is the cost of participation. Then we have .
  10. Key assumptions about viewers and providers (2/2) July 2025 Participation

    Dynamics in Two-sided Platforms @ ICML 10 Assumption 2 (“Population effects” on product quality) Viewers enjoy improved quality of products as provider population of the target group increases, i.e., , where 𝑏 is base value and 𝑓 is a monotonistic increasing function. provider population quality
  11. Key assumptions about viewers and providers (2/2) July 2025 Participation

    Dynamics in Two-sided Platforms @ ICML 11 Assumption 2 (“Population effects” on product quality) Viewers enjoy improved quality of products as provider population of the target group increases, i.e., , where 𝑏 is base value and 𝑓 is a monotonistic increasing function. provider population quality Example: two-stage recommender system Consider our high-level policy 𝜋 decides group-wise alloc. and a lower-level policy chooses ind. product within group. For a reasonable lower-level policy, we can except increasing population (# of products) help identify high-quality products among each provider group. job matching, video streaming, etc..
  12. Participation dynamics and “population effects” Consider the following dynamics in

    the system (𝑡 denotes a timestep): • Reactive participation dynamics – viewers and providers gradually move toward the reference. • Population effects in the payoff functions – population matters in the payoff functions. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 12 𝜂 ∈ [0, 1] is the reactiveness hyperparam.
  13. Participation dynamics and “population effects” Consider the following dynamics in

    the system (𝑡 denotes a timestep): • Reactive participation dynamics – viewers and providers gradually move toward the reference. • Population effects in the payoff functions – population matters in the payoff functions. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 13 𝜂 ∈ [0, 1] is the reactiveness hyperparam. viewer provider viewer provider viewer provider … (𝑡 − 1) (𝑡) (𝑡 + 1)
  14. Challenges in policy design under population effects July 2025 Participation

    Dynamics in Two-sided Platforms @ ICML 14 Theorem 1 (Sub-optimality of the “myopic-greedy” policy) The myopic-greedy policy, which reactively chooses the best action w.r.t. immediate outcome ( ) is guaranteed optimal only when the population effect (𝑓) is linear and homogeneous across multiple product groups.
  15. Challenges in policy design under population effects July 2025 Participation

    Dynamics in Two-sided Platforms @ ICML 15 Theorem 1 (Sub-optimality of the “myopic-greedy” policy) The myopic-greedy policy, which reactively chooses the best action w.r.t. immediate outcome ( ) is guaranteed optimal only when the population effect (𝑓) is linear and homogeneous across multiple product groups. w/ heterogeneity (cross-over) and satuation of 𝑓, myopic-greedy can be worse than even uniform-random.
  16. Challenges in policy design under population effects July 2025 Participation

    Dynamics in Two-sided Platforms @ ICML 16 Theorem 1 (Sub-optimality of the “myopic-greedy” policy) The myopic-greedy policy, which reactively chooses the best action w.r.t. immediate outcome ( ) is guaranteed optimal only when the population effect (𝑓) is linear and homogeneous across multiple product groups. w/ heterogeneity (cross-over) and satuation of 𝑓, myopic-greedy can be worse than even uniform-random.
  17. Investigating the theory behind the observation We now study the

    backgroud reason in 2 type of analyses: • Equilibrium analysis • Regret analysis July 2025 Participation Dynamics in Two-sided Platforms @ ICML 17 w/ heterogeneity (cross-over) and satuation of 𝑓, myopic-greedy can be worse than even uniform-random.
  18. Equilibrium analysis (1/2) July 2025 Participation Dynamics in Two-sided Platforms

    @ ICML 18 Theorem 2 (Existence of “Nash equlibrium (NE)”) The two-sided dynamics has a corresponding game instance, and NE of the game always exists (but not necessarily unique). The dynamics converges to one of NE when the reactiveness hyper- param (𝜂) is adequately small for the given game instance.
  19. More specifically, the corresponding game instance (inner utility model) is

    expressed as: Gradient ascent (GA) of the above utility model results in the population dynamics of viewers and providers. An “adequately small” value of 𝜂 avoids the divergence of GA. Equilibrium analysis (1/2) July 2025 Participation Dynamics in Two-sided Platforms @ ICML 19 Theorem 2 (Existence of “Nash equlibrium (NE)”) The two-sided dynamics has a corresponding game instance, and NE of the game always exists (but not necessarily unique). The dynamics converges to one of NE when the reactiveness hyper- param (𝜂) is adequately small for the given game instance.
  20. Equilibrium analysis (2/2) July 2025 Participation Dynamics in Two-sided Platforms

    @ ICML 20 Theorem 3 (Sufficient condition for a stable equilibrium, i.e., NE*) Suppose the first-order derivative of dynamics are bounded as , , at some fixed point . Also suppose . Then, is stable when * Every stable fixed point corresponds to NE.
  21. Equilibrium analysis (2/2) July 2025 Participation Dynamics in Two-sided Platforms

    @ ICML 21 Theorem 3 (Sufficient condition for a stable equilibrium, i.e., NE*) Suppose the first-order derivative of dynamics are bounded as , , at some fixed point . Also suppose . Then, is stable when provider population quality The RHS of ineq. becomes more restrictive when the population is small when 𝑓, ҧ 𝜆 follows concave function (i.e., diminishing increase). Exposure-concentrated policy (e.g., greedy) allows loss of viewer/provider populations. myopic uniform
  22. To discuss the sub-optimality of an arbitrary policy, we introduce

    the following regret: where • : (long-term) optimal policy • : population at the timestep 𝑡 under the policy ( is similarly for ) • : total timestep Regret analysis July 2025 Participation Dynamics in Two-sided Platforms @ ICML 22
  23. Regret analysis xxx July 2025 Participation Dynamics in Two-sided Platforms

    @ ICML 23 Theorem 4 (Decomposition of the regret) The total regret is decomposed into two terms: where dynamics regret is caused by the population diff. policy regret is caused by the immediate policy diff. : one-step optimal policy given 𝜆𝑡
  24. Regret analysis xxx July 2025 Participation Dynamics in Two-sided Platforms

    @ ICML 24 Theorem 4 (Decomposition of the regret) The total regret is decomposed into two terms: where dynamics regret is caused by the population diff. policy regret is caused by the immediate policy diff. The myopic-greedy policy only minimizes the policy regret. Thus, when the dynamics regret becomes large, myopic-greedy falls short.
  25. A simple algorithm for minimizing the dynamics regret To minimize

    the dynamics regret, we proposed the following Look-ahead policy: , which optimizes the policy to maximize the expected reward at the reference point. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 25
  26. A simple algorithm for minimizing the dynamics regret To minimize

    the dynamics regret, we proposed the following Look-ahead policy: , which optimizes the policy to maximize the expected reward at the reference point. To make the objective function differential, we use the softmax 1-step optimal policy: July 2025 Participation Dynamics in Two-sided Platforms @ ICML 26
  27. A simple algorithm for minimizing the dynamics regret Then, we

    can optimize the policy at the timestep 𝑡 via gradient ascent: where July 2025 Participation Dynamics in Two-sided Platforms @ ICML 27
  28. Synthetic experiment • We compare uniform-random and interpolation of look-ahead

    and myopic-greedy: July 2025 Participation Dynamics in Two-sided Platforms @ ICML 28
  29. Synthetic experiment • We compare uniform-random and interpolation of look-ahead

    and myopic-greedy: • The dynamics follows concave function (upper half of the sigmoid): • (We assume we know the dynamics fn. and there is no noise.) July 2025 Participation Dynamics in Two-sided Platforms @ ICML 29 (quality can be multi-dimensional)
  30. Synthetic experiment With small and large initial population, look-ahread policy

    increases both total welfare and provider population, while myopic-greedy falls short. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 30
  31. Synthetic experiment With small and large initial population, look-ahread policy

    increases both total welfare and provider population, while myopic-greedy falls short. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 31 Taking a close look at the viewer-provider rewards, we observe that balanced allocation of exposure by look-ahead policy help improve the population effect.
  32. Real-data experiment • We use the KuaiRec dataset (Gao et

    al., 2022) for semi-synthetic simulation. • We cluster viewers and providers based on embeddings of collaborative filtering. • We define the population effect as follows. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 32
  33. Real-data experiment • In this experiement, we also estimate the

    dynamics function using interaction data. • We show the estimation result at the final timestep as follows. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 33
  34. Real-data experiment • In this experiment, look ahead and myopic-greedy

    achieve a high total welfare. • Look-ahead polict achives a high provider population while maximizing the utility. • Together with the synthetic result, look-ahead can adaptively maximize the utility. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 34
  35. Key takeaways • We discuss the policy design in two-sided

    platforms with population dynamics. • In such cases, myopic-greedy may fall short especially when the quality increase of products are heterogeneous and concave (i.e., have some satuation). • We also found that maintaining both high viewer satisfaction and provider exposure is crucial, and proposed a simple look-ahead algorithm for long-term objectives. • Experiments show that the look-ahead policy achieves a high total welfare adaptively to the situation, by maintaining both high user satisfaction and provider exposure. July 2025 Participation Dynamics in Two-sided Platforms @ ICML 35