Calling CloudStack: Building a Phone Company One Zone at a Time
This talk will dive into using CloudStack as the IaaS foundation for building a world-class wireless telecommunications architecture. We'll cover the unique requirements of telephony and how CloudStack lets us handle them in new and wonderful ways.
offering wireless service in late 2012 • Running production on CloudStack 3.0.2 since Day 1 • Marrying the webscale with the telescale • Machine time vs. human time.
wireless company that doesn’t own its own spectrum or wireless infrastructure • Multiple flavors of MVNO depending on technical desire • Not necessarily tied to a particular wholesale system; can usually work with multiple partners • Requires higher startup capital and greater amounts of technical knowledge [Diagram: MNO, MVNO, Billing, CRM, Accounts, Marketing, Service, Inventory, Sales, Pricing]
with major network operators to enable wireless services for 3rd parties. • Takes care of user account management, provisioning, billing, taxes, etc. • Can provide service to an aggregated pool of MNOs or partner more tightly with one • Frees the MVNO to focus on core marketing/branding while the MVNE maintains focus on backend logistics. [Diagram: MNO, MVNO, Billing, CRM, Accounts, Marketing, Service, Inventory, Sales, Pricing, MVNE]
Resurgence in MVNOs due to local/niche markets • Opportunities to bring Internet services to the carrier level • Commodity hardware + CloudStack = huge savings
• Simple initial install and use - Ease of Convincing • Open-Source, Top-Level Apache Project • “No one ever got fired for using Apache.” - Internet • Active development / support network • Good documentation
(for simplicity) • Each component isolated on a guest network • Identical stacks deployed to Production, Staging, and Dev [Diagram: Core, API, Web, and SQL tiers; guest networks 10.1.1/24, 10.1.2/24, 10.1.3/24, 10.1.4/24; public IPs 108.50.50.5, 108.50.51.5, 108.50.52.5, 108.50.53.5; connected to the Internet and the MNO]
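The per-component isolation above can be sketched as a small lookup table. This is only an illustration: pairing each tier with a guest network and public IP by the order of the labels in the slide's diagram is an assumption, and the names `TIERS`, `networks_are_disjoint`, and `tier_for_address` are hypothetical.

```python
import ipaddress
from itertools import combinations

# Tier -> (guest network, public IP). The pairing follows the order of the
# labels in the diagram and is an assumption, not a confirmed mapping.
TIERS = {
    "core": ("10.1.1.0/24", "108.50.50.5"),
    "api": ("10.1.2.0/24", "108.50.51.5"),
    "web": ("10.1.3.0/24", "108.50.52.5"),
    "sql": ("10.1.4.0/24", "108.50.53.5"),
}


def guest_networks():
    """Parse each tier's guest CIDR into an ip_network object."""
    return {tier: ipaddress.ip_network(cidr) for tier, (cidr, _ip) in TIERS.items()}


def networks_are_disjoint():
    """True when no two guest networks share any address range."""
    nets = list(guest_networks().values())
    return not any(a.overlaps(b) for a, b in combinations(nets, 2))


def tier_for_address(address):
    """Return the tier whose guest network contains the address, or None."""
    addr = ipaddress.ip_address(address)
    for tier, net in guest_networks().items():
        if addr in net:
            return tier
    return None
```

Because Production, Staging, and Dev run identical stacks, a table like this could be kept once per environment and checked before each deployment.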
• Latency, jitter, and round-trip times are important • Hosts on CentOS 6.4 using KVM and Virtual Routers • Looking seriously at vSwitch • VMs are primarily Debian (with a few others for spice)
symmetric • Saturate a gigabit NIC at ~8,000-10,000 simultaneous calls • Transcoding audio/video requires high CPU • Why do we proxy the media? • Constantly monitoring the state of the calls for in-call app functionality • XMPP gateways for future WebRTC 3PCC
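The saturation figure implies a rough per-call bandwidth budget, which can be worked out as a back-of-the-envelope sketch. It assumes a nominal 1,000,000,000 bit/s link and treats each call as a single aggregate share of that link; the slide does not say how call legs or directions are counted, and the function name is hypothetical.

```python
GIGABIT_BPS = 1_000_000_000  # nominal line rate, ignoring framing details


def per_call_budget_kbps(simultaneous_calls, link_bps=GIGABIT_BPS):
    """Average share of the link available to each call, in kbit/s."""
    return link_bps / simultaneous_calls / 1000


if __name__ == "__main__":
    for calls in (8_000, 10_000):
        print(f"{calls} calls -> {per_call_budget_kbps(calls):.0f} kbit/s per call")
```

Under these assumptions, the 8,000-10,000 call range corresponds to roughly 125 down to 100 kbit/s per call.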
API and 3rd Party API • Each VM carries a copy of the full API • Utilizes VR load balancing to scale across instances • Monitor CPU and memory; spin up more instances if necessary • Works fantastically well!
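The "monitor, then spin up more" rule can be sketched as a pure decision function. The thresholds, the averaging across instances, and the names `InstanceLoad` and `instances_to_add` are all assumptions made for illustration; the talk does not state the actual policy.

```python
from dataclasses import dataclass


@dataclass
class InstanceLoad:
    cpu_pct: float  # CPU utilisation, 0-100
    mem_pct: float  # memory utilisation, 0-100


def instances_to_add(loads, cpu_limit=75.0, mem_limit=80.0, step=1):
    """Return how many API instances to add behind the load balancer.

    Hypothetical policy: add `step` instances when the average CPU or
    average memory utilisation exceeds its limit, otherwise add none.
    An empty pool always gets `step` instances.
    """
    if not loads:
        return step
    avg_cpu = sum(load.cpu_pct for load in loads) / len(loads)
    avg_mem = sum(load.mem_pct for load in loads) / len(loads)
    return step if (avg_cpu > cpu_limit or avg_mem > mem_limit) else 0
```

Keeping the decision separate from the provisioning call makes it easy to test without touching a real cloud.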
9’s • Fast performance - but on a human scale • A different scale than most web services; processes that take up to several seconds are acceptable (if undesired) • Possibly only 100,000-1MM transactions an hour, or 8MM at one time. • Viral growth isn’t likely - although I’d be happy to be wrong!
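To put the hourly figures on the same footing as typical web-service metrics, they can be converted to transactions per second. This is plain arithmetic on the numbers quoted above, assuming the load is spread evenly across the hour; the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def per_second(transactions_per_hour):
    """Average transactions per second for an evenly spread hourly load."""
    return transactions_per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    for hourly in (100_000, 1_000_000):
        print(f"{hourly} per hour is about {per_second(hourly):.1f} per second")
```

Even the upper figure averages out to a few hundred transactions per second, which fits the "human scale" framing.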
- 95% unit testing, CI with Travis • Deploy code to web-based services ASAP, usually as soon as it is committed and has passed tests. • Some of our core softswitch source can take a long time to build, so ignore that and only update repo code • VMs are updated by Chef for the selected roles
update master images • Roles defined in the Chef cookbooks • Some of our software can take a long time to build, so ignore that and only update repo code
update existing images? • Depends on component: API, Web -- Yes! • Core... sometimes... • When updating softswitch components, update the master image, then recreate the stack deployment. • Destroy the old VMs as the new ones are provisioned
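The "destroy the old VMs as the new ones are provisioned" step can be sketched as a rolling replacement. The `deploy` and `destroy` callables are hypothetical stand-ins for whatever tooling actually provisions and removes VMs; the one-at-a-time ordering is an assumption.

```python
def rolling_replace(old_vms, deploy, destroy):
    """Replace each old VM with a freshly provisioned one, one at a time.

    `deploy()` is assumed to return a new VM built from the updated master
    image; `destroy(vm)` is assumed to remove an old VM. Each old VM is
    destroyed only after its replacement has been provisioned.
    """
    new_vms = []
    for old in old_vms:
        new_vms.append(deploy())  # bring up the replacement first
        destroy(old)              # then retire the VM it replaces
    return new_vms
```

A usage sketch with fake callables:

```python
counter = iter(range(100))
removed = []
fresh = rolling_replace(["old-a", "old-b"], lambda: f"new-{next(counter)}", removed.append)
# fresh == ["new-0", "new-1"], removed == ["old-a", "old-b"]
```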
heavily using CloudMonkey • Currently used to auto-snapshot and templatize the images created by Chef • Either CloudMonkey or knife-cloudstack is then used by internal tools to deploy template VMs to the appropriate clusters • Building CloudMonkey into more and more internal deploy tools - keep an eye out!
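The snapshot, templatize, and deploy sequence maps onto three CloudStack API commands: `createSnapshot`, `createTemplate`, and `deployVirtualMachine`. The sketch below only builds the ordered list of commands and parameters; it does not call CloudStack, and it is not taken from the speaker's tooling. The function name, the placeholder strings, and every ID passed in are hypothetical, and cluster placement is not modeled.

```python
def image_pipeline(volume_id, template_name, os_type_id, zone_id, offering_id):
    """Return the ordered (command, params) steps for one image rollout.

    Snapshot and template IDs are only known once the earlier step has
    finished, so they appear here as placeholder strings.
    """
    return [
        ("createSnapshot", {"volumeid": volume_id}),
        ("createTemplate", {
            "snapshotid": "<id returned by createSnapshot>",
            "name": template_name,
            "displaytext": template_name,
            "ostypeid": os_type_id,
        }),
        ("deployVirtualMachine", {
            "templateid": "<id returned by createTemplate>",
            "zoneid": zone_id,
            "serviceofferingid": offering_id,
        }),
    ]
```

A wrapper around CloudMonkey or knife-cloudstack could walk this list, filling in each placeholder from the previous step's result.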
CPU / Memory / Storage / Network IO depending on the section • Zabbix - agentless monitoring • We usually put an agent on anyway. • Internal tools scan logs for irregularities and start the VM setup/teardown process • Scale to public cloud in an emergency
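A log scan that feeds a setup/teardown decision could look roughly like the sketch below. The talk does not describe what counts as an irregularity or how the decision is made, so the `ERROR`/`CRITICAL` pattern, the thresholds, and the function names are all assumptions.

```python
import re

# Hypothetical definition of an "irregular" log line.
IRREGULAR = re.compile(r"\b(ERROR|CRITICAL)\b")


def count_irregular_lines(lines, pattern=IRREGULAR):
    """Count log lines that match the irregularity pattern."""
    return sum(1 for line in lines if pattern.search(line))


def scaling_action(lines, setup_at=5, teardown_at=0):
    """Map a window of log lines to 'setup', 'teardown', or 'hold'.

    Assumed policy: many irregular lines trigger VM setup, a clean window
    allows teardown of spare capacity, and anything in between holds.
    """
    hits = count_irregular_lines(lines)
    if hits >= setup_at:
        return "setup"
    if hits <= teardown_at:
        return "teardown"
    return "hold"
```

In practice the window of lines would come from whatever log collection is already in place, with Zabbix metrics as a second input.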
Useful for keeping the average path length of a call short • Slightly limited by MNO ingress/egress points • MNO is the ultimate backup: will process all calls. • Billing is batch processed after connectivity is restored
up off-site • Possibility of using S3 for non-critical/anonymous data