Calling CloudStack: Building a Phone Company One Zone at a Time

This talk will dive into using CloudStack as the IaaS foundation for building a world-class wireless telecommunications architecture. We'll cover the unique requirements of telephony and how CloudStack lets us handle them in new and wonderful ways.

Evan McGee

June 24, 2013

Transcript

  1. A LITTLE BACKGROUND...
     • In operation since 2004
     • Began offering wireless service in late 2012
     • Running production on CloudStack 3.0.2 since Day 1
     • Marrying the webscale with the telescale
     • Machine time vs. human time
  2. WHAT IS AN MVNO? Mobile Virtual Network Operator
     • A wireless company that doesn't own its own spectrum or wireless infrastructure
     • Multiple flavors of MVNO, depending on technical desire
     • Not necessarily tied to a particular wholesale system; can usually work with multiple partners
     • Requires higher startup capital and greater amounts of technical knowledge
     [Diagram: MNO / MVNO split across Billing, CRM, Accounts, Marketing, Service, Inventory, Sales, Pricing]
  3. WHAT IS AN MVNE? Mobile Virtual Network Enabler
     • Works with major network operators to enable wireless services for third parties
     • Takes care of user account management, provisioning, billing, taxes, etc.
     • Can provide service to an aggregated pool of MNOs or partner more tightly with one
     • Frees the MVNO to focus on core marketing/branding while the MVNE handles backend logistics
     [Diagram: MNO / MVNE / MVNO split across Billing, CRM, Accounts, Marketing, Service, Inventory, Sales, Pricing]
  4. THE WAY: Historical Perspective
     • Equipment available from only a few vendors at very high $$$
     • MVNOs limited to simply reselling service
     • Unique branding/customer service was the only differentiator
  5. THE WAY: Today
     • Wireless is moving to all-packet-switched networks
     • Resurgence in MVNOs due to local/niche markets
     • Opportunities to bring Internet services to the carrier level
     • Commodity hardware + CloudStack = huge savings
  6. WHY CLOUDSTACK
     • High code maturity and stability in 2012
     • Simple initial install and use: ease of convincing
     • Open source, top-level Apache project
     • “No one ever got fired for using Apache.” - Internet
     • Active development / support network
     • Good documentation
  7. OUR
     • Custom-built MVNE & MVNO system
     • Four main components (for simplicity)
     • Each component isolated on a guest network
     • Identical stacks deployed to Production, Staging, and Dev
     [Diagram: Core, API, Web and SQL components on guest networks 10.1.1/24, 10.1.2/24, 10.1.3/24 and 10.1.4/24, with public addresses 108.50.50.5, 108.50.51.5, 108.50.52.5 and 108.50.53.5, connecting to the Internet and the MNO]
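Each component lives on its own guest network. So a simple sanity check when templating identical stacks is that those networks never collide. The sketch below is illustrative only. It assumes the slide's shorthand 10.1.1/24 means 10.1.1.0/24, and that the networks pair with Core, API, Web and SQL in the order listed. The function name is hypothetical.

```python
"""Check that per-component guest networks are valid and do not overlap.

The component-to-CIDR pairing follows the order on the slide and is an assumption.
"""
from __future__ import annotations

import ipaddress
from itertools import combinations

# Hypothetical mapping of stack components to their guest networks.
GUEST_NETWORKS = {
    "core": "10.1.1.0/24",
    "api": "10.1.2.0/24",
    "web": "10.1.3.0/24",
    "sql": "10.1.4.0/24",
}


def overlapping(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of components whose guest CIDRs overlap."""
    # ip_network raises ValueError on malformed CIDRs or host bits set.
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [(a, b) for a, b in combinations(parsed, 2) if parsed[a].overlaps(parsed[b])]
```

An empty result means every component can keep its own isolated segment.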
  8. OUR
     • Datacenters are located in Manhattan near Tier-1 providers
     • Latency, jitter, and round-trip times are important
     • Hosts run CentOS 6.4 using KVM and virtual routers
     • Looking seriously at vSwitch
     • VMs are primarily Debian (with a few others for spice)
  9. OUR Core
     • Very heavy network IO: each call is 87 kbps symmetric
     • Saturate a gigabit NIC at ~8,000-10,000 simultaneous calls
     • Transcoding audio/video requires high CPU
     • Why do we proxy the media?
       • Constantly monitoring the state of the calls for in-call app functionality
       • XMPP gateways for future WebRTC
       • 3PCC (third-party call control)
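The 87 kbps figure matches the usual per-direction Ethernet bandwidth of a G.711 call with 20 ms packetization. The slide does not name the codec, so the sketch below makes these assumptions:

- G.711 over IPv4/UDP/RTP.
- The Ethernet header and FCS (18 bytes) are counted.
- No VLAN tag, preamble or inter-frame gap is counted.

The function names are illustrative.

```python
"""Back-of-envelope media bandwidth for one call leg (assumes G.711, 20 ms)."""

# Header sizes in bytes: RTP, UDP, IPv4 without options, Ethernet header + FCS
# (no VLAN tag; preamble and inter-frame gap are not counted).
RTP, UDP, IPV4, ETHERNET = 12, 8, 20, 18


def call_bandwidth_bps(codec_bps: int = 64_000, packet_ms: int = 20) -> float:
    """Per-direction bits/s on the wire for one RTP audio stream."""
    packets_per_s = 1000 / packet_ms
    payload = codec_bps / 8 * packet_ms / 1000  # bytes of audio per packet
    frame = payload + RTP + UDP + IPV4 + ETHERNET
    return frame * 8 * packets_per_s


def calls_per_link(link_bps: float = 1e9, **kw) -> int:
    """Max simultaneous streams a full-duplex link carries in each direction."""
    return int(link_bps // call_bandwidth_bps(**kw))


if __name__ == "__main__":
    print(call_bandwidth_bps())  # 87200.0
    print(calls_per_link())      # 11467
```

The theoretical ceiling of roughly 11,400 streams per direction sits above the slide's ~8,000-10,000 figure. That gap is consistent with leaving headroom on the NIC.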
  10. OUR SQL
      • Very heavy disk IO
      • Lots of writes: ACID compliance
        • Call data, billing records, account management, web-side services, etc.
      • Lots of reads
        • Inbound call account lookups, least-cost routing, etc.
      • Experimenting with bare metal
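Least-cost routing is commonly implemented in two steps:

1. Match the dialed number against a rate table by longest prefix.
2. Pick the cheapest carrier for that prefix.

In the stack described here, those lookups hit the SQL tier. This minimal sketch keeps the table in a dict for clarity. The rates and carrier names are made up.

```python
"""Least-cost routing sketch: longest-prefix match, then cheapest rate.

The rate table below is made-up example data, not real carrier pricing.
"""
from __future__ import annotations

# prefix -> list of (carrier, USD per minute); hypothetical values
RATES: dict[str, list[tuple[str, float]]] = {
    "1": [("carrier_a", 0.010), ("carrier_b", 0.012)],
    "1212": [("carrier_a", 0.008), ("carrier_c", 0.006)],
    "44": [("carrier_b", 0.020)],
}


def least_cost_route(number: str, rates=RATES) -> tuple[str, float] | None:
    """Return (carrier, rate) for the longest matching prefix, or None."""
    digits = number.lstrip("+")
    # Try the longest prefix first so more specific routes win.
    for length in range(len(digits), 0, -1):
        routes = rates.get(digits[:length])
        if routes:
            return min(routes, key=lambda route: route[1])
    return None
```

For example, `least_cost_route("+12125551234")` matches the `1212` prefix and picks `carrier_c`.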
  11. OUR API
      • Stateless and lightweight, multi-tenanted
      • Two versions: Core API and 3rd Party API
      • Each VM carries a copy of the full API
      • Utilizes virtual router (VR) load balancing to scale across instances
      • Monitor CPU and memory, spin up more if necessary
      • Works fantastically well!
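The "monitor CPU and memory, spin up more if necessary" loop can be reduced to a threshold rule over the pool of stateless API VMs behind the virtual router. This is a sketch with hypothetical thresholds and pool limits, not the authors' tooling.

```python
"""Threshold-based scale-out/scale-in decision for a pool of stateless API VMs.

Thresholds and pool limits are hypothetical example values.
"""
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class VmStats:
    cpu_pct: float  # average CPU utilisation over the sample window
    mem_pct: float  # memory in use, as a percentage


def desired_delta(stats: list[VmStats], high: float = 75.0, low: float = 25.0,
                  min_vms: int = 2, max_vms: int = 20) -> int:
    """Return +1 to add a VM, -1 to remove one, 0 to hold."""
    n = len(stats)
    if n < min_vms:
        return 1  # pool is below its floor: always add capacity
    if n == 0:
        return 0
    # Use the busier of the two resources as the pool's load signal.
    load = max(mean(s.cpu_pct for s in stats), mean(s.mem_pct for s in stats))
    if load > high and n < max_vms:
        return 1
    if load < low and n > min_vms:
        return -1
    return 0
```

In CloudStack terms, acting on +1 would mean two API calls:

- `deployVirtualMachine`, to create a VM from the API template.
- `assignToLoadBalancerRule`, to add that VM to the VR load-balancer rule.

The slides do not describe how the original tools do this.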
  12. OUR Web
      • Typical Rails-Nginx-Unicorn web stack
      • Running Spree e-commerce to accommodate phone sales
      • Ties directly into the API for provisioning and account management
      • Allows multi-tenanted management of all MVNO functions
  13. WHAT’S MOST IMPORTANT
      • High availability: all of the 9's
      • Fast performance, but on a human scale
      • A different scale than most web services: processes that take up to several seconds are acceptable (if undesired)
      • Possibly only 100,000 to 1MM transactions an hour, or 8MM at one time
      • Viral growth isn't likely, although I'd be happy to be wrong!
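Here are the hourly figures in per-second terms. This is plain arithmetic on the slide's numbers and assumes load is spread evenly over the hour.

```latex
\frac{100{,}000\ \text{tx/h}}{3600\ \text{s/h}} \approx 28\ \text{tx/s},
\qquad
\frac{1{,}000{,}000\ \text{tx/h}}{3600\ \text{s/h}} \approx 278\ \text{tx/s}
```

Real traffic peaks above the hourly average, so these are lower bounds for capacity planning rather than targets.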
  14. CONTINUOUS INTEGRATION TELCO STYLE
      • All code stored in Git; 95% unit testing, CI with Travis
      • Deploy code to web-based services ASAP, usually as soon as it is committed and passes tests
      • Some of our core softswitch source can take a long time to build, so ignore that and only update repo code
      • VMs are updated by Chef for the selected roles
  15. CLOUDSTACK + CHEF
      • Chef: used to bootstrap / update master images
      • Roles defined in the Chef cookbooks
      • Some of our software can take a long time to build, so ignore that and only update repo code
      • VMs are updated by Chef for the selected roles
  16. CLOUDSTACK + CHEF
      • Do we currently use Chef to update existing images?
        • Depends on component: API and Web, yes!
        • Core... sometimes...
      • When updating softswitch components, update the master image, then recreate the stack deployment
      • Destroy the old VMs as the new ones are provisioned
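The update path described above is a rolling replacement: build a new master image, recreate the deployment, and destroy old VMs as new ones come up. In the sketch below, `provision`, `is_healthy` and `destroy` are hypothetical hooks. They would wrap CloudStack calls such as `deployVirtualMachine` and `destroyVirtualMachine`, or equivalent CloudMonkey/Knife-CloudStack commands.

```python
"""Rolling replacement: bring up a VM from the new template before destroying an old one.

provision, is_healthy and destroy are hypothetical hooks; in practice they would
wrap CloudStack API calls (e.g. deployVirtualMachine / destroyVirtualMachine).
"""
from __future__ import annotations

import time
from typing import Callable


def rolling_replace(old_vms: list[str], template_id: str,
                    provision: Callable[[str], str],
                    is_healthy: Callable[[str], bool],
                    destroy: Callable[[str], None],
                    timeout_s: float = 300.0, poll_s: float = 5.0) -> list[str]:
    """Replace old_vms one at a time; return the ids of the new VMs."""
    new_vms: list[str] = []
    for old in old_vms:
        new = provision(template_id)
        deadline = time.monotonic() + timeout_s
        # Keep the old VM serving until its replacement reports healthy.
        while not is_healthy(new):
            if time.monotonic() > deadline:
                destroy(new)  # discard the failed replacement, keep the old VM
                raise RuntimeError(f"replacement for {old} never became healthy")
            time.sleep(poll_s)
        destroy(old)
        new_vms.append(new)
    return new_vms
```

Replacing one VM at a time keeps the pool at full size throughout, at the cost of a slower rollout.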
  17. CLOUDMONKEY
      • Since our transition to CloudStack 4.1, we've started using CloudMonkey heavily
      • Currently used to auto-snapshot and templatize the images created by Chef
      • Internal tools then use either CloudMonkey or Knife-CloudStack to deploy template VMs to the appropriate clusters
      • Building CloudMonkey into more and more internal deploy tools; keep an eye out!
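CloudMonkey and Knife-CloudStack both drive CloudStack's HTTP API. Each request carries an HMAC-SHA1 signature over the sorted, URL-encoded and lowercased query string. The sketch below builds such a request for `createSnapshot`, the first step of auto-snapshotting. The endpoint, keys and volume id are placeholders. Character-encoding edge cases can differ between client libraries, so check against the API documentation for your CloudStack version.

```python
"""Build a signed CloudStack API request URL (HMAC-SHA1 request signing).

The endpoint, API key, secret key and volume id used with it are placeholders.
"""
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_url(endpoint: str, api_key: str, secret_key: str, **params: str) -> str:
    """Return endpoint?query&signature=... for the given API command parameters."""
    params = {**params, "apikey": api_key, "response": "json"}
    # URL-encode each value (spaces become %20, not +) and sort by field name.
    pairs = sorted(((k, quote(str(v), safe="")) for k, v in params.items()),
                   key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={v}" for k, v in pairs)
    # Sign the lowercased query string with HMAC-SHA1 and the secret key.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    # Base64-encode the digest, then URL-encode it for use in the query string.
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"
```

For example, `signed_url("http://cs.example.com:8080/client/api", KEY, SECRET, command="createSnapshot", volumeid=...)` yields a URL that any HTTP client can fetch. CloudMonkey performs this signing automatically.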
  18. MONITORING
      • We need to keep a close eye on CPU / memory / storage / network IO, depending on the section
      • Zabbix: agentless monitoring
        • We usually put an agent on anyway
      • Internal tools scan logs for irregularities and start the VM setup/teardown process
      • Scale to public cloud in an emergency
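The slides don't say how the internal log scanners decide something is irregular. One simple approach is a sliding-window count of error lines that trips once too many appear within a time window. This is a hypothetical sketch; the regex, threshold and window are assumed values.

```python
"""Sliding-window error-rate detector for log lines (hypothetical thresholds)."""
from __future__ import annotations

import re
from collections import deque

ERROR_RE = re.compile(r"\b(ERROR|CRITICAL)\b")  # assumed log format


class ErrorRateDetector:
    def __init__(self, max_errors: int = 5, window_s: float = 60.0):
        self.max_errors = max_errors
        self.window_s = window_s
        self._times: deque[float] = deque()  # timestamps of recent error lines

    def feed(self, timestamp: float, line: str) -> bool:
        """Record one log line; return True when the error rate is too high."""
        if ERROR_RE.search(line):
            self._times.append(timestamp)
        # Drop errors that have fallen out of the window.
        while self._times and timestamp - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_errors
```

A True result is the point at which a tool like the one described would start the VM setup/teardown process.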
  19. MULTIPLE ZONES
      • Currently building out multiple geographic zones
      • Useful for keeping the average call path length short
      • Slightly limited by MNO ingress/egress points
      • MNO is the ultimate backup: it will process all calls
      • Billing is batch processed after connectivity is restored
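Keeping call paths short across zones amounts to sending media through the nearest healthy zone. When no zone is reachable, the MNO is the fallback. The minimal sketch below assumes per-zone RTT measurements are already available. The zone names and the `"mno"` label are hypothetical.

```python
"""Pick the reachable zone with the lowest round-trip time, else fall back to the MNO."""
from __future__ import annotations

MNO_FALLBACK = "mno"  # hypothetical label for "let the MNO process the call"


def pick_zone(rtt_ms: dict[str, float | None]) -> str:
    """rtt_ms maps zone name -> measured RTT in ms, or None if the zone is unreachable."""
    reachable = {zone: rtt for zone, rtt in rtt_ms.items() if rtt is not None}
    if not reachable:
        return MNO_FALLBACK
    return min(reachable, key=reachable.get)
```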
  20. DATA / SERVICE BACKUP
      • Critical billing data currently backed up off-site
      • Possibilities of using S3 for non-critical/anonymous data
      • Slightly limited by MNO ingress/egress points
      • MNO is the ultimate backup: it will process all calls
      • Billing is batch processed after connectivity is restored
  21. UPCOMING GOALS
      • Hadoop deployment
        • Telecom has a classic "big data" problem
        • Currently passing off data to Elastic MapReduce
        • Bring that in house, expose results to users/clients via API
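As an illustration of the kind of batch job currently handed to Elastic MapReduce, here is a map/reduce-style aggregation in plain Python. It turns call detail records into billed minutes per account. The CDR field names and the round-up-to-the-next-minute rule are assumptions, not details from the talk.

```python
"""Map/reduce-style aggregation of call detail records (CDRs) per account.

The CDR fields and round-up-to-the-minute billing rule are assumptions.
"""
from __future__ import annotations

import math
from collections import defaultdict
from typing import Iterable, Iterator


def map_cdr(cdr: dict) -> Iterator[tuple[str, int]]:
    """Map step: emit (account, billed minutes) for one call, rounding partial minutes up."""
    yield cdr["account"], math.ceil(cdr["duration_s"] / 60)


def reduce_minutes(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Reduce step: sum billed minutes per account."""
    totals: dict[str, int] = defaultdict(int)
    for account, minutes in pairs:
        totals[account] += minutes
    return dict(totals)


def billed_minutes(cdrs: Iterable[dict]) -> dict[str, int]:
    """Run map over every CDR, then reduce the emitted pairs."""
    return reduce_minutes(pair for cdr in cdrs for pair in map_cdr(cdr))
```

On a Hadoop cluster, the same map and reduce functions would run in parallel over partitions of the CDR data.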
  22. OTT SERVICES: OVER THE TOP SERVICES
      • In-call translation
      • Post-call transcription
      • Gamification
      • Blacklisting
  23.
      • Browser-based communications
      • Encrypted by default
      • Peer-to-peer (or via proxy if desired) for both media and data
      • Exciting possibilities abound
  24. LIVE THE TELECOM FUTURE
      • Allow people to wander and pluck their own data
      • OAuth2 API for all accounts, tenant-to-user level
      • The Data Silo becomes the Data Prairie