thatsdone
February 27, 2025

Virtual Vehicle Fleet in your Lab - A Use Case of AGL as a High-Level R&D Testbed Component

I gave this talk at the AGL AMM Spring 2025.
The primary distribution site of this deck is at:
https://sched.co/1vePR

Transcript

  1. Virtual Vehicle Fleet in your Lab - A Use Case of AGL as a High-Level R&D Testbed Component

    Masanori Itoh, Toyota
  2. About Me

    Masanori Itoh
    - Affiliation
      - InfoTech-IS, Information and Communication Planning Div., Open Source Program Group (Toyota OSPO), TOYOTA MOTOR CORPORATION
    - Works
      - R&D for Connected Vehicle Infrastructure
        - E2E Observability, Standardization, …
      - Toyota OSPO - Co-Lead
    - Keywords
      - Operating System, Cloud Infrastructure, etc.
    - https://github.com/thatsdone
    - https://www.linkedin.com/in/masanori-itoh-6401603/
  3. Agenda

    Copyright © 2025 TOYOTA MOTOR CORPORATION All rights reserved.

    - Background - A High-Level R&D POC
    - (Low-Level) Motivation and Ideas
    - System Design
    - Evaluation
    - Summary - Takeaways
    - Possible Future Works
    - Appendix: TIPs
  4. Background - A High-Level R&D POC

    - High-Level Motivation
      - To prove the concept of "Edge Computing": use cases, benefits and feasibility
    - Playgrounds
      - Public & Standardization Organization: AECC (Automotive Edge Computing Consortium)
        - https://aecc.org/
      - Collaborative Works with Partners
        - "Initiatives towards a Connected Mobility Society" (KDDI Corp. press release) - https://newsroom.kddi.com/english/news/detail/kddi_pr-1140.html
        - "E2E Observability for Connected Vehicle Service via Distributed Tracing" (KubeCon NA '23) - https://sched.co/1R2oh
        - "E2E Observability for Connected Vehicle Services Including 5G Cellular Network U-Plane Troubles" (OSS Japan '23) - https://sched.co/1Tyrm
    - View Points
      - Functional/Performance/Operational… Feasibility
      - E2E Observability (my focus)
  5. Background - A High-Level R&D POC

    A POC System of Connected Vehicles Service Data Processing

    [Architecture diagram. Labels in the figure: Location #1 (on-premise), Location #2 (on-premise), Pseudo Vehicle (generator), UE#1, gNB#1, gNB#2, Slice#1 (Lat.), Slice#2 (B/W), UPF#1, UPF#2, Edge#1 (Region #2), Edge#2 (Region #2), Auth GW, Dispatcher, Offload Process, Public Cloud #1, LB, Auth GW (for fallback), Data Accumulation (CAN, Camera, DynamicMap, others…), Ops Subsystem, Orchestration, Public Cloud #2 (OCI etc.), WAN, Dedicated Line, Dedicated Line/VPN, Fail Over.]
  6. Background - A High-Level R&D POC

    A POC System of Connected Vehicles Service Data Processing

    [Same architecture diagram as on the previous slide, with the Pseudo Vehicle subsystems highlighted.]

    Boxes with a red dashed line are the subsystems of interest now.
  7. (Low-Level) Motivation and the Idea

    - (A bit Low-Level) Motivation
      - To have fleets of virtual connected vehicles in my lab
    - Idea
      - Run AGL/aarch64 instances using Linux/KVM via libvirt on ARM servers
      - libvirt is widely used in virtualization/private cloud infrastructure
        - Plenty of know-how is available on the internet
  8. System Design

    - Design Points
      - Use libvirt with KVM acceleration on an ARM server (NO emulation!)
        - Instead of using 'runqemu'
        - libvirt also enables centralized control/management
      - Use the 'backing_file' feature to reduce disk space consumption
    - Additional Software Components on top of AGL
      - Install UERANSIM to equip the AGL instances with a 5G UE
        - https://github.com/aligungr/UERANSIM
        - UERANSIM communicates with an OSS-based 5GC (built using Free5GC)
      - Add observability-related components
        - Tracing: OpenTelemetry (otelcol-contrib)
        - Metrics: Prometheus Node Exporter
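As a rough illustration of the centralized control that libvirt enables, the sketch below prints one virsh command per VM rather than executing anything. The function name fleet_cmds, the domain naming scheme agl-vmN, and the host in the example URI are assumptions made for illustration; they are not taken from the talk.

```shell
#!/bin/bash
# fleet_cmds ACTION START END [URI]
# Prints one "virsh" command per VM (dry run, nothing is executed).
# ACTION is a virsh domain subcommand such as start, shutdown or destroy.
# The agl-vmN naming scheme is an assumption.
fleet_cmds() {
  local action=$1 start=$2 end=$3 uri=${4:-qemu:///system} n
  for n in $(seq "$start" "$end"); do
    echo "virsh -c ${uri} ${action} agl-vm${n}"
  done
}

# Hypothetical remote ARM server reached over SSH.
fleet_cmds start 1 3 qemu+ssh://admin@arm-server.example/system
```

Piping the output to sh would run the commands one by one. GNU parallel, when invoked without a command, treats each input line as a command, so it could run them concurrently.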
  9. Evaluation - View Points

    - (1) Single-Node Performance Evaluation
    - (2) Multi-Nodes Performance Evaluation
    - (3) Functionality Evaluation
      - With emulated cellular connection via UERANSIM + Free5GC
      - Distributed Tracing via OpenTelemetry
  10. Evaluation - Environments (HW/SW (Host/Guest))

    - Host Hardware
      - Supermicro ARS-110M-NR
        - Ampere Altra Max, 1P128C, 3.0GHz, memory 128GB
        - https://www.supermicro.com/ja/products/system/megadc/1u/ars-110m-nr
      - Ubuntu 22.04.3 (aarch64)
        - qemu 1:6.2+dfsg-2ubuntu6.15 (Ubuntu 22.04 bundle)
        - kernel 5.15.0-78-generic (aarch64)
    - Guest
      - AGL 16.0.3
        - https://download.automotivelinux.org/AGL/release/pike/16.0.3
        - kernel: 5.15.124-yocto-standard (aarch64)
        - Kernel configuration is default (page size 4KB, THP disabled)
    - Benchmark tool
      - UnixBench 5.1.3
  11. Single-Node Performance Evaluation Results

    - UnixBench Scores

    | # | Configuration | Host CPU | Host OS | Guest OS | Score (1 proc.) | RasPi ratio | Score (4 proc.) | RasPi ratio | Note |
    |---|---|---|---|---|---|---|---|---|---|
    | 1 | RaspberryPi4 | ARM (Cortex-A72) | Raspbian 10 (armv7l, 32bit) | - | 263.1 | 1.0 | 742.3 | 1.0 | |
    | 2 | qemu w/ kvm | ARM (Graviton3) | Ubuntu 22.04 (arm64) | AGL 15.91.1 | 1464.1 | 5.56 | 3603.4 | 4.85 | kvm acceleration ratio 150.93 |
    | 3 | qemu w/o kvm | ARM (Graviton3) | Ubuntu 22.04 (arm64) | AGL 15.91.1 | 9.7 | 0.037 | 39.5 | 0.053 | |
    | 4 | No virtualization | ARM (Graviton3) | Ubuntu 22.04 (arm64) | - | 1713.2 | 6.51 | - | - | |
    | 5 | qemu w/o kvm | (Intel) Xeon Platinum 8160 2.1GHz | Ubuntu 23.04 (amd64) | Ubuntu 22.04 (arm64) | 7.2 | 0.027 | 24.3 | 0.033 | Emulation of ARM on Intel |
    | 6 | No virtualization | ARM (Altra Max) | Ubuntu 22.04 (arm64) | - | 1477.5 | 5.61 | 4214.9 | 5.67 | |
    | 7 | No virtualization | ARM (Graviton2) | Ubuntu 22.04 (arm64) | | 1103.7 | 4.195 | | | |
    | 8 | qemu w/ kvm | ARM (Altra Max) | Ubuntu 22.04 (arm64) | Ubuntu 22.04 (arm64) | 1443.1 | 5.485 | 3856.8 | 5.195 | |
    | 9 | qemu w/ kvm | ARM (Altra Max) | Ubuntu 22.04 (arm64) | AGL 16.0.3 (arm64) | 1433.9 | 5.450 | - | - | AGL w/ KVM acceleration |
  12. Single-Node Performance Evaluation Results

    - Notes
      - Scores are the "System Benchmarks Index Score" of UnixBench
        - Geometric mean of the measured results of each test (such as Dhrystone) relative to the base hardware (SUN SPARCstation 20 SM61)
      - #2~4 are results on the same physical server (AWS EC2 Graviton3 bare-metal instance (c7g.metal))
      - OSes of #2~5 are aarch64; the RasPi of #1 is 32bit (armhf)
      - The ARM server of #8~9 is a SuperMicro ARS-110M-NR (Ampere Altra Max, 128 cores/3.0GHz)
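To make the scoring note concrete, here is a minimal sketch of how a geometric-mean index can be computed. It assumes the usual UnixBench convention that each test's index is its result divided by the baseline result, scaled so the baseline machine scores 10. The function names and the sample numbers are made up for illustration and are not measurements from this talk.

```shell
#!/bin/bash
# unixbench_index RESULT BASELINE
# Per-test index, scaled so that the baseline machine scores 10
# (assumed convention; check the UnixBench Run script for details).
unixbench_index() {
  awk -v r="$1" -v b="$2" 'BEGIN { printf "%.1f\n", r / b * 10 }'
}

# geomean V1 V2 ...
# Geometric mean of the arguments, computed as exp(mean(log(v))).
geomean() {
  printf '%s\n' "$@" |
    awk '{ s += log($1); n++ } END { if (n) printf "%.1f\n", exp(s / n) }'
}

# Made-up example: a test that is 10x faster than the baseline,
# then the overall index of two per-test indices.
unixbench_index 500 50
geomean 100 400
```

Because the overall score is a geometric mean, a single very slow test (for example an I/O-bound one) pulls the index down less than it would with an arithmetic mean of raw times, but it still affects the total.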
  13. Single-Node Performance Evaluation Results

    - Conclusion
      - We can run AGL with practically sufficient performance using VMs (qemu) with KVM acceleration on physical ARM CPU servers
        - With KVM acceleration, the overhead is about 15% compared with bare-metal servers
        - KVM acceleration gives a score about 150 times better than emulation
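The two headline numbers can be rechecked from the 1-process scores in the results table. The sketch below only redoes that arithmetic; the helper names ratio and overhead_pct are illustrative, and the row pairing (rows 2/3 and 4/2 on Graviton3, rows 6/9 on Altra Max) is my reading of the table.

```shell
#!/bin/bash
# ratio A B: prints A / B with one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# overhead_pct BAREMETAL VIRT: score lost by VIRT relative to BAREMETAL, in percent.
overhead_pct() {
  awk -v bare="$1" -v virt="$2" 'BEGIN { printf "%.1f\n", (bare - virt) / bare * 100 }'
}

ratio 1464.1 9.7            # row 2 vs row 3: KVM vs emulation on Graviton3
overhead_pct 1713.2 1464.1  # row 4 vs row 2: bare metal vs AGL on KVM, Graviton3
overhead_pct 1477.5 1433.9  # row 6 vs row 9: bare metal vs AGL on KVM, Altra Max
```

The first two lines reproduce the roughly 150x and roughly 15% figures. The third suggests the gap on the Altra Max server is smaller, with the caveat that host and guest run different distributions in every pairing.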
  14. Multi-Nodes Performance Evaluation - Setup

    - Motivation
      - What if we run many VMs on a single machine?
    - Test Setup
      - Run one UnixBench process in each VM, with up to 100 VMs concurrently
      - CPU affinity configured 1:1 to physical cores
        - Physical cores are assigned with appropriate spacing
          - E.g., core #11 for VM#1 and #60 for VM#2 in the 2-VM measurement
        - The first 10 cores (out of 128) are reserved for the host OS
      - Run UnixBench (almost) simultaneously using GNU parallel from the host side
        - Note: somewhat unnatural as a generic workload
      - Used the 'backing_file' feature of the qcow2 virtual disk image format
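One way to generate a spaced 1:1 pinning plan is sketched below. The talk does not state the exact spacing formula, so the even-spacing rule here is an assumption and its output does not reproduce the #11/#60 example; the function name pin_plan and the agl-vmN domain names are also illustrative. The commands are only printed, not executed.

```shell
#!/bin/bash
# pin_plan N [TOTAL] [RESERVED]
# Prints one "virsh vcpupin" command per VM, pinning vCPU 0 of each VM to
# its own physical core. Cores 0..RESERVED-1 are left to the host OS and
# the remaining cores are spread evenly (assumed rule, zero-based core IDs).
# N must not exceed TOTAL - RESERVED.
pin_plan() {
  local n=$1 total=${2:-128} reserved=${3:-10}
  local usable=$(( total - reserved ))
  local i core
  for i in $(seq 1 "$n"); do
    core=$(( reserved + (i - 1) * usable / n ))
    echo "virsh vcpupin agl-vm${i} 0 ${core}"
  done
}

pin_plan 2
```

With integer division and N no larger than the number of usable cores, consecutive VMs always land on distinct cores, so the plan stays 1:1 up to the 100-VM case.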
  15. Multi-Nodes Performance Evaluation - Setup

    - Consolidated 100 AGL VMs on a single physical ARM server

    [Setup diagram. Labels in the figure: Central Controller (Ubuntu, KVM/x86_64) with GNU Parallel; dnsmasq (DHCP); ARM server SuperMicro ARS-110M-NR, Ampere Altra Max (128 cores), 128GB memory, running Ubuntu 22.04.3 (aarch64), kernel 5.15.0-78-generic (aarch64), qemu 1:6.2+dfsg-2ubuntu6.15; libvirt; AGL VM #1 ... AGL VM #100 on qemu/kvm, each running UnixBench (run script); up to 100 virtual vehicles running AGL (qemuarm64 image); Boot Image (VM#1) ... Boot Image (VM#100), each referring to the Base Image via backing_file, on NVMe local storage.]
  16. Multi-Nodes Performance Evaluation - Results

    - "System Benchmark Index Score" and its breakdown into each micro-benchmark

    [Charts of scores versus number of concurrent VMs. Annotations in the figure:]
    - Dhrystone/Whetstone/Pipe/System Call: no performance impact
    - Tests with I/O are generally affected
    - Performance degradation (per VM) is roughly 50% at 100 VMs
  17. Multi-Nodes Performance Evaluation - Analysis

    - Quick Analysis
      - Per-VM (single-node) performance starts to degrade at around 10 concurrent VMs. At the maximum concurrency (100 VMs), the per-VM UnixBench score is roughly 50% of the 1-VM score.
        - CPU-intensive tests (e.g., Dhrystone) are not affected by concurrency.
        - I/O-intensive tests (e.g., File Copy) are noticeably affected. The majority of I/O-intensive tests degrade linearly.
      - Deeper analysis is required, but the likely root cause is I/O and mutual-exclusion contention related to the qcow2 backing_file feature.
      - When using qemu/kvm for CI/CD infrastructure, it may be better to avoid the qcow2 backing_file feature or to use disaggregated storage (e.g., Ceph).
  18. Functionality Evaluation

    - Effectiveness of AGL as a (High-Level) POC Testbed
      - E2E connectivity through the (emulated) 5G cellular communication by UERANSIM/Free5GC
        - Worked without any trouble
        - Building UERANSIM is straightforward: just 'cmake' and 'make', following the UERANSIM documentation
      - E2E Observability
        - Showed the effectiveness of Distributed Tracing (by OpenTelemetry (OTEL)) for E2E (Vehicle ~ Mobile NW ~ Backbone NW ~ Cloud)

    [Figure excerpted from a previous talk (OSSJ2023, https://sched.co/1R2oh)]
  19. Conclusion / Takeaways

    - AGL can be used as a testbed for high-level R&D POCs, not only for IVI/IC
      - In this sense, it is better to create a recipe with a reduced feature set
        - E.g., in my case, graphics are not necessary.
    - libvirt is another way to run AGL in virtualized environments
      - Good for centralized management/control
    - qemuarm64 with KVM acceleration gives us enough performance
      - But be careful when using the qcow2 backing_file feature for I/O workloads
  20. Future Works

    - Random Thoughts
      - Update host/guest side software
        - Build a custom AGL image based on an appropriate profile (gateway, IMHO)
        - Add some components (otelcol-contrib, UERANSIM, eBPF-related tools, etc.)
      - Linux kernel (new) features (including host/qemu side)
        - Real Time Kernel
        - Extensible Scheduler Class
      - Functional Safety - Working with ELISA?
      - Standardization - e.g., SOVD/OBD
      - Edge Computing - Working with AECC (https://aecc.org/)
      - Other
        - SBOM improvement
        - Etc.
  21. TIP#1 - Define and Run VMs using libvirt

    - Yet another method to run qemu/kvm, other than 'runqemu' of Yocto
      - Commonly used in virtualization/private cloud infrastructure
      - Convenient for centralized management

        virt-install \
          --name agl-vm1 \
          --arch aarch64 \
          --machine virt \
          --cpu max \
          --vcpus 1 \
          --memory 1024 \
          --import \
          --disk /opt/16.0.3-qemuarm64/agl-demo-platform-crosssdk-qemuarm64-20231219162613.rootfs.ext4,format=raw,bus=virtio \
          --network bridge=br1,model=virtio \
          --video vga \
          --graphics vnc \
          --serial pty \
          --rng device=/dev/urandom,model=virtio \
          --boot 'kernel=/opt/16.0.3-qemuarm64/Image--5.15.124+git0+f484a7f175_f0e7afd594-r0.26-qemuarm64-20231219162613.bin,kernel_args=root=/dev/vda panic=1 rootfstype=ext4 rw console=ttyAMA0' \
          --noacpi \
          --events on_reboot=restart \
          --osinfo detect=on,require=off \
          --noautoconsole

    - Notes on the options
      - --cpu max: ask for the full CPU feature set
      - --memory 1024: 1GB
      - --noacpi: this was necessary
      - --disk ...,format=raw: flat image case. For the qcow2 format, specify 'qcow2' instead of 'raw'
  22. TIP#2 - Reduction of virtual disk size

    - When running a massive number of VMs from the same AGL image, there is room to reduce the total size of the virtual disks because most of their contents are identical. Otherwise, you may run into disk-full trouble (like me).
    - Use the 'backing_file' feature of qcow2
      - Copy-on-Write type size reduction
      - Pros: can reduce (physical) disk usage dramatically
      - Cons: can cause a performance bottleneck for I/O-intensive workloads
    - Procedure
      - Create the base OS image (with your favorite components)
      - Create many OS images that specify the base OS image using 'backing_file' (-b)

        #!/bin/bash
        # Range of VM numbers; override via the START/END environment variables.
        START=${START:-1}
        END=${END:-100}
        for n in $(seq "${START}" "${END}")
        do
            # Create a 20G qcow2 overlay backed by the shared base image.
            qemu-img create -b agl-19.0.0-base.img -f qcow2 -F qcow2 "agl-19.0.0-${n}.img" 20G
        done
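If the I/O bottleneck listed under Cons matters more than disk space, an overlay can be turned back into a standalone image. The sketch below prints the commands for inspecting the backing chain and for flattening each overlay; `qemu-img convert` writes a standalone image unless a backing file is requested with -B. The function name flatten_cmds and the -flat output suffix are assumptions; the input file names follow the script above.

```shell
#!/bin/bash
# flatten_cmds [START] [END]
# Prints (does not run) two commands per overlay image:
#   1. show the backing chain of the overlay
#   2. convert it into a standalone qcow2 image (hypothetical -flat suffix)
flatten_cmds() {
  local start=${1:-1} end=${2:-100} n
  for n in $(seq "$start" "$end"); do
    echo "qemu-img info --backing-chain agl-19.0.0-${n}.img"
    echo "qemu-img convert -f qcow2 -O qcow2 agl-19.0.0-${n}.img agl-19.0.0-${n}-flat.img"
  done
}

flatten_cmds 1 2
```

Each flattened image holds a full copy of the base image contents, so this trades the disk savings back for independent I/O paths. The VM should be shut down while its image is converted.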
  23. TIP#3 - DHCP/Connectivity Stability Issue

    - AGL instances failed to acquire an IP address via DHCP
      - connman has a limited set of DHCP functionalities
      - Switched to systemd-networkd
    - AGL instances eventually lost their emulated 5G connections
      - The systemd-networkd default behavior on DHCP lease expiry was to release the IP address once and acquire it again
      - The heartbeat feature of the UERANSIM 5G emulation protocol (RLS) detected this.
      - Changed the systemd-networkd configuration

        # cat /etc/systemd/network/50-agl.network
        [Match]
        Name=enp3s0

        [Network]
        DHCP=yes
        UseDNS=yes
        KeepConfiguration=dhcp