Generative Emergent Communication: Is a Large Language Model a Collective World Model?

Tadahiro Taniguchi (Kyoto Univ. / Ritsumeikan Univ.), Ryo Ueda (Univ. of Tokyo), Tomoaki Nakamura (UEC),
Masahiro Suzuki (Univ. of Tokyo), Akira Taniguchi (Ritsumeikan Univ.)
The 2025 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI)
27th May 2025 (English translation)

[Reference]
Taniguchi, T., Ueda, R., Nakamura, T., Suzuki, M., & Taniguchi, A. (2024). Generative Emergent Communication: Large Language Model is a Collective World Model. arXiv preprint arXiv:2501.00226.

https://arxiv.org/abs/2501.00226

Transcript

  1. Generative Emergent Communication: Is a Large Language Model a Collective

    World Model? Tadahiro Taniguchi (Kyoto Univ. / Ritsumeikan Univ.), Ryo Ueda (Univ. of Tokyo), Tomoaki Nakamura (UEC), Masahiro Suzuki (Univ. of Tokyo), Akira Taniguchi (Ritsumeikan Univ.) The 2025 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI) 27th May 2025 (English translation) 1
  2. Generative Emergent Communication: LLM is a Collective World Model [Taniguchi+

    2024 (arXiv)] Taniguchi, T., Ueda, R., Nakamura, T., Suzuki, M., & Taniguchi, A. (2024). Generative Emergent Communication: Large Language Model is a Collective World Model. arXiv preprint arXiv:2501.00226.
  3. Does an LLM Have a “World Model”? Gurnee, W., &

    Tegmark, M. (2023). Language models represent space and time. arXiv preprint arXiv:2310.02207. Yoshida, T., Masumori, A., & Ikegami, T. (2023). From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3". arXiv preprint arXiv:2312.06571. Hao, S., Gu, Y., Ma, H., Hong, J. J., Wang, Z., Wang, D. Z., & Hu, Z. (2023). Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992. (EMNLP 2023) Osada, M., Garcia Ricardez, G. A., Suzuki, Y., & Taniguchi, T. (2024). Reflectance estimation for proximity sensing by vision-language models: Utilizing distributional semantics for low-level cognition in robotics. Advanced Robotics, 38(18), 1287-1306.
  4. Two Types of “World Models” ① World(/Umwelt) Model: An internal

    model that dynamically captures perception-action relations from the agent's subjective view. ② Model of the World: A model possessing objective world knowledge. Taniguchi, T., Murata, S., Suzuki, M., Ognibene, D., Lanillos, P., Ugur, E., Jamone, L., Nakamura, T., Ciria, A., Lara, B., & Pezzulo, G. (2023). World models and predictive coding for cognitive and developmental robotics: frontiers and challenges. Advanced Robotics, 37(13), 780-806. The statement “LLM has a world model!” usually refers to type ②, but it may also be computationally/logically linked to type ①!
  5. Symbol emergence systems [Taniguchi+ 2016] Tadahiro Taniguchi, Takayuki Nagai, Tomoaki

    Nakamura, Naoto Iwahashi, Tetsuya Ogata, and Hideki Asoh, Symbol Emergence in Robotics: A Survey, Advanced Robotics, 30(11-12), pp.706-728, 2016. DOI:10.1080/01691864.2016.1164622 (Figure: individual level: representation learning; group level: symbol emergence)
  6. Generative EmCom: e.g., Metropolis-Hastings Naming Game 1. Perception: Speaker and

    listener observe an object and infer internal representations. 2. Communication: The speaker samples a name probabilistically; the listener probabilistically accepts or rejects it based on its own belief. 3. Learning: The listener updates its internal parameters. 4. Turn-taking: The roles switch. The acceptance probability reflects the degree to which the partner's proposed name aligns with the name expected under one's own belief. (Figure: semiotic communication between Agent A and Agent B; each agent forms internal representations via representation learning from its observation o of the object, the speaker utters a sign u by sampling, and the listener judges whether to accept the sign.) Yoshinobu Hagiwara, Hiroyoshi Kobayashi, Akira Taniguchi and Tadahiro Taniguchi, Symbol Emergence as an Interpersonal Multimodal Categorization, Frontiers in Robotics and AI, 6(134), pp.1-17, 2019. Yoshinobu Hagiwara, Kazuma Furukawa, Akira Taniguchi & Tadahiro Taniguchi, Multiagent multimodal categorization for symbol emergence: emergent communication via interpersonal cross-modal inference, Advanced Robotics, 2022.
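
To make the four steps above concrete, here is a minimal, hypothetical Python sketch of the Metropolis-Hastings naming game. The categorical beliefs, Dirichlet initialization, and simple belief update are illustrative assumptions rather than the models used in the cited papers; only the loop structure (speak, judge with an MH acceptance ratio evaluated under the listener's own belief, learn, switch roles) follows the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

class Agent:
    """Toy agent with a categorical belief P(name | object).

    Real MHNG agents infer internal representations from multimodal
    observations; a belief over a small name vocabulary stands in here.
    """
    def __init__(self, n_objects, n_names, concentration=1.0):
        # Hypothetical initialization: one Dirichlet-sampled belief per object.
        self.belief = rng.dirichlet(np.full(n_names, concentration), size=n_objects)

    def speak(self, obj):
        """Steps 1-2: perceive the object and sample a name from the belief."""
        return rng.choice(len(self.belief[obj]), p=self.belief[obj])

    def judge(self, obj, proposed, current):
        """Step 3 (judgement): accept with prob min(1, P(proposed|obj)/P(current|obj))."""
        ratio = self.belief[obj, proposed] / self.belief[obj, current]
        return proposed if rng.random() < min(1.0, ratio) else current

    def learn(self, obj, accepted_name, lr=0.1):
        """Step 3 (learning): nudge the belief toward the accepted name."""
        target = np.zeros_like(self.belief[obj])
        target[accepted_name] = 1.0
        self.belief[obj] = (1 - lr) * self.belief[obj] + lr * target

def naming_game(agent_a, agent_b, obj, current_name, n_turns=10):
    """Step 4: turn-taking loop in which speaker and listener switch roles."""
    speaker, listener = agent_a, agent_b
    for _ in range(n_turns):
        proposal = speaker.speak(obj)
        current_name = listener.judge(obj, proposal, current_name)
        listener.learn(obj, current_name)
        speaker, listener = listener, speaker  # roles switch
    return current_name

a, b = Agent(n_objects=3, n_names=5), Agent(n_objects=3, n_names=5)
print("agreed name for object 0:", naming_game(a, b, obj=0, current_name=0))
```

Passing the current name into `judge` keeps the update a proper accept/reject step rather than unconditional adoption of the speaker's proposal.
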
  7. Probabilistic Graphical Model of the Symbol Emergence System (Figure: the joint

    graphical model over Agent A and Agent B, shown together with its decomposition into per-agent models and its composition back into the shared model.) Taniguchi, T., Yoshida, Y., Matsui, Y., Le Hoang, N., Taniguchi, A., & Hagiwara, Y. (2023). Emergent communication through Metropolis-Hastings naming game with deep generative models. Advanced Robotics, 37(19), 1266-1282. The MH naming game is distributed MCMC Bayesian inference: by generating language, we can integrate recognition just like connecting brains.
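
As a sketch of why the listener-side accept/reject rule realizes distributed MCMC, the following uses the tail-to-tail factorization suggested by the figure (shared sign W, per-agent internal representations Z_A, Z_B, observations X_A, X_B); the notation is assumed here rather than copied from the slide.

```latex
% Composition: joint model shared by the two agents (assumed factorization).
\begin{align}
  P(W, Z_A, Z_B, X_A, X_B)
    &= P(W) \prod_{i \in \{A, B\}} P(Z_i \mid W)\, P(X_i \mid Z_i) \\
  % Decomposition: the speaker proposes a sign from its own posterior.
  W' &\sim q(W') = P(W' \mid Z_A) \\
  % MH acceptance for the target P(W | Z_A, Z_B): the speaker-side terms
  % cancel, leaving only quantities the listener can evaluate locally.
  r &= \min\!\left(1,\;
        \frac{P(W' \mid Z_A, Z_B)\, q(W)}
             {P(W \mid Z_A, Z_B)\, q(W')}\right)
     = \min\!\left(1,\; \frac{P(Z_B \mid W')}{P(Z_B \mid W)}\right)
\end{align}
```

Neither agent ever needs the other's internal representation: sampling the sign and judging it with this ratio is a Metropolis-Hastings step on the shared posterior over the sign, which is the sense in which generating language integrates recognition.
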
  8. Why Do LLMs Appear to “Understand” the World? Collective Predictive

    Coding Hypothesis [Taniguchi '24] • The emergence of language can be situated as a continuation of the theoretical framework of the free energy principle and world models. • Since language is generated through collective predictive coding, information about the external world becomes embedded within its distributional semantics. • Hence, large language models may appear to comprehend the world as if they possessed embodied experience. (Figure: overview of collective predictive coding [Taniguchi '24])
  9. Predictive Coding (Learning World Models) (Figure: an agent acts on and

    perceives the environment (world), inferring internal representations (world models) $P(Z_i \mid X_i)$ from observations $X_i$, i.e., multimodal sensorimotor information.) • Individuals learn internal world models through multimodal perception and interaction with the environment. $i \in I$: agents (humans)
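
For reference, a standard way to write the per-agent objective implied here: the variational free energy minimized when learning a world model $P(X_i \mid Z_i)$ with an approximate posterior $q_i(Z_i)$. This particular form is a textbook assumption, not taken from the slide.

```latex
% Per-agent variational free energy (standard form; notation assumed):
\begin{equation}
  \mathcal{F}_i
  = \mathbb{E}_{q_i(Z_i)}\!\left[-\log P(X_i \mid Z_i)\right]
  + D_{\mathrm{KL}}\!\bigl(q_i(Z_i)\,\|\,P(Z_i)\bigr)
  \;\ge\; -\log P(X_i)
\end{equation}
% Minimizing F_i jointly improves the generative (world) model and the
% recognition of internal representations, i.e., predictive coding.
```
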
  10. A Collective Performing Predictive Coding (Figure: a population of agents,

    each acting on and perceiving the shared environment (world) and inferring its own internal representations (world models), $\{P(Z_i \mid X_i)\}_{i \in I}$.) $i \in I$: agents. • Each agent (human) performs predictive coding.
  11. Collective Predictive Coding [Taniguchi '24] (Figure: agents act on and

    perceive the environment (world) and form internal representations (world models); through utterance and interpretation they organize, and are constrained by, external representations, i.e., language as an emergent symbol system. Collective predictive coding infers $P(W, \{Z_i\}_{i \in I} \mid \{X_i\}_{i \in I})$; this is symbol emergence.) • Symbol emergence is formulated as distributed Bayesian inference among multiple agents. • Experimental evidence suggests that humans behave similarly [Okumura+ 2023]. $i \in I$: agents (humans)
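
The figure's quantity $P(W, \{Z_i\}_{i \in I} \mid \{X_i\}_{i \in I})$ can be read against a generative model that generalizes the two-agent factorization of slide 7 to a whole population; the conditional forms below are assumptions used for illustration.

```latex
% Assumed CPC generative model: external representation W (language),
% internal representations Z_i, observations X_i for agents i in I.
\begin{align}
  P\bigl(W, \{Z_i\}_{i \in I}, \{X_i\}_{i \in I}\bigr)
    &= P(W) \prod_{i \in I} P(Z_i \mid W)\, P(X_i \mid Z_i) \\
  % Symbol emergence = inferring the collective posterior:
  P\bigl(W, \{Z_i\}_{i \in I} \mid \{X_i\}_{i \in I}\bigr)
    &\propto P(W) \prod_{i \in I} P(Z_i \mid W)\, P(X_i \mid Z_i)
\end{align}
% No single agent can evaluate this posterior on its own; the naming game
% samples W while each agent samples its own Z_i, i.e., distributed
% Bayesian inference.
```
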
  12. Formulation of CPC Based on the Free Energy Principle [Taniguchi+ 2024]

    The CPC objective decomposes into 1. the ordinary variational free energy × the number of agents (representation learning, predictive coding, world-model learning), and 2. a collective regularization term (alignment of the external representation w conditioned on the internal representations z; symbol emergence). Taniguchi, T., Takagi, S., Otsuka, J., Hayashi, Y., & Hamada, H. T. (2024). Collective Predictive Coding as Model of Science: Formalizing Scientific Activities Towards Generative Science. arXiv preprint arXiv:2409.00102. Hayashi, Y., AI Alignment Network: https://x.com/hayashiyus/status/1831309992638759210
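
A schematic of the two-term decomposition stated on the slide; the exact functional form of each term in [Taniguchi+ 2024] may differ, so this is only meant to show how the pieces fit together. Here $\mathcal{F}_i$ is the per-agent free energy from slide 9.

```latex
% Schematic two-term decomposition (forms assumed, not quoted):
\begin{equation}
  \mathcal{F}_{\mathrm{CPC}}
  \;=\;
  \underbrace{\sum_{i \in I} \mathcal{F}_i}_{\substack{\text{1. ordinary variational free energy} \\ \times\ \text{number of agents}}}
  \;+\;
  \underbrace{\mathcal{R}\bigl(W;\, \{Z_i\}_{i \in I}\bigr)}_{\substack{\text{2. collective regularization:} \\ \text{alignment of } W \text{ given the } Z_i}}
\end{equation}
% One natural (assumed) choice for the regularizer is
% \mathcal{R} = \mathbb{E}_{\{q_i\}}\bigl[-\log P(W \mid \{Z_i\}_{i \in I})\bigr].
```
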
  13. Bayesian Integration of Multiple VLMs via Captioning Game [Matsui+ 2025]

    Matsui, Y., Yamaki, R., Ueda, R., Shinagawa, S., & Taniguchi, T. (2025). Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference. arXiv preprint arXiv:2504.09620. (Figure: two VLMs, a speaker and a listener, play a captioning game over their observations of an image: 1. Perception, 2. Proposal of a caption by the speaker, 3. Judgement by the listener via an acceptance probability (accept or reject) against the previous caption, 4. Learning from the updated caption; the cycle repeats, approximating an otherwise intractable posterior over captions.) • A language model (GPT-2) was introduced for the utterance component. • The Metropolis-Hastings Naming Game (MHNG) based on Vision-Language Models (VLMs) was extended into a caption generation game. • The framework demonstrates how agents with different prior knowledge (e.g., pretrained on COCO and CC3M) align their representations through communication. • It enables mutual learning, with agents learning from each other, while mitigating catastrophic forgetting.
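
A minimal sketch of one round of the captioning game, mirroring the acceptance rule of the naming game on slide 6. The helpers `generate_caption`, `caption_log_prob`, and `finetune` are hypothetical hooks standing in for whatever VLM interface is used; they are not the API of the cited implementation.

```python
import math
import random

def mh_judgement(listener_vlm, image, proposed, current, caption_log_prob):
    """Listener-side judgement (step 3): accept the speaker's proposal with
    probability min(1, p_listener(proposed | image) / p_listener(current | image))."""
    log_ratio = (caption_log_prob(listener_vlm, image, proposed)
                 - caption_log_prob(listener_vlm, image, current))
    accept_prob = math.exp(min(0.0, log_ratio))
    return proposed if random.random() < accept_prob else current

def captioning_game_round(speaker_vlm, listener_vlm, image, current_caption,
                          generate_caption, caption_log_prob, finetune):
    """One round of the game: proposal -> judgement -> learning (steps 2-4)."""
    proposed = generate_caption(speaker_vlm, image)                 # 2. Proposal
    accepted = mh_judgement(listener_vlm, image, proposed,
                            current_caption, caption_log_prob)      # 3. Judgement
    finetune(listener_vlm, image, accepted)                         # 4. Learning
    return accepted                                                 # repeat
```

Because the listener only fine-tunes on captions it has itself accepted, each model keeps training near its own distribution, which is one intuition for why this style of mutual learning can mitigate catastrophic forgetting.
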
  14. (Figure) VLM EmCom: image captioning & generation; VLM EmCom: video captioning

    & generation; VLA EmCom: action-dependent video captioning, and video and action prediction ≒ world models
  15. LLM: Modeling the Distribution of Language as a “Collective World

    Model” (Figure: learning representations/world models toward collectively intelligent language expression; symbol emergence = collective predictive coding; the LLM models the resulting collective world model.)
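
An interpretive sketch of this slide's claim, reusing the CPC notation assumed for slide 11: the language produced by collective predictive coding is distributed as the marginal of the collective posterior over external representations, and an LLM is trained to fit exactly that distribution.

```latex
% Interpretive sketch (notation assumed, not quoted from the slide):
\begin{align}
  P_{\mathrm{CPC}}(W)
    &\triangleq P\bigl(W \mid \{X_i\}_{i \in I}\bigr)
     = \int P\bigl(W, \{Z_i\}_{i \in I} \mid \{X_i\}_{i \in I}\bigr)\,
        \mathrm{d}\{Z_i\}_{i \in I} \\
  \theta^{\ast}
    &= \operatorname*{arg\,max}_{\theta}\;
        \mathbb{E}_{W \sim P_{\mathrm{CPC}}(W)}\bigl[\log P_{\mathrm{LLM}}(W; \theta)\bigr]
\end{align}
% If training succeeds, P_LLM approximates P_CPC(W): the LLM models the
% distribution that collective predictive coding generated, i.e., it acts
% as a collective world model.
```
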
  16. Conclusion • LLMs ≈ Collective World Models: external expressions

    integrating human experiences via language; they mediate the learning/representation of internal (world) models. • CPC Perspective: language = the result of distributed Bayesian inference in human society; it enables representation learning & generation. • Connecting Symbol Emergence and World Models: multi-agent symbol emergence via generative communication and world models. • Why Do LLMs Seem to “Understand” the World?: distributional semantics encode world knowledge; consistency with the free energy principle. • Future Work: robotic/vision/action-based empirical validation; evaluation against human language & generalizability.