Virtual Machine Placement in Cloud Environment

Virtual Machine Placement in Cloud Environment Dharmesh Kakadia Advisor :
Prof. Vasudeva Varma Search and Information Extraction Lab International Institute of Information Technology, Hyderabad July 4, 2014 1 / 46

Introduction to Cloud and Scheduling Outline 1. Introduction to Cloud
and Scheduling 2. Dynamic SLA aware Scheduler 3. Network aware Scheduler 4. Wrap up 2 / 46

Introduction to Cloud and Scheduling Cloud Computing Cloud Computing "Cloud
computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." (NIST Definition of Cloud Computing) 3 / 46

Introduction to Cloud and Scheduling Scheduling Scheduling : History The
word scheduling is believed to have originated from the Latin word schedula around the 14th century, when it meant a papyrus strip or a slip of paper with writing on it. In the 15th century it came to mean a timetable, and from there it was adopted to mean the scheduler that we currently use in computer science. Scheduling, in computing, is the process of deciding how to allocate resources to a set of processes. (Source: Wikipedia) 4 / 46

Introduction to Cloud and Scheduling Scheduling Motivation Resource arbitration
is at the heart of modern computers. We cannot afford ineffective resource management at cloud scale. New challenges and opportunities arise from virtualization, consumption patterns, and new workloads. "Scheduling, it turns out, comes down to deciding how to spend money." (Towards a cloud computing research agenda. K. Birman et al. SIGACT'09) 5 / 46

Introduction to Cloud and Scheduling Thesis Problem Scheduling In simple
notation, scheduling can be expressed as Map<VM, PM> = f(Set<VM>, Set<PM>, context), where context can be a performance model, heterogeneity of resources, or network information. 6 / 46
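
To make the signature concrete, here is a minimal Python sketch of one possible f. The name `first_fit` and the `capacity` and `demand` context entries are hypothetical and chosen only for illustration; the deck does not prescribe them.

```python
from typing import Callable, Dict, Mapping, Set

# Shape of the scheduling function f: (VMs, PMs, context) -> {VM: PM}.
Scheduler = Callable[[Set[str], Set[str], Mapping[str, object]], Dict[str, str]]


def first_fit(vms, pms, context):
    """Toy f: place each VM on the first PM that still has enough capacity.

    'capacity' and 'demand' are hypothetical context entries (one number per
    PM and per VM); a real context could carry a performance model or
    network information instead.
    """
    free = dict(context["capacity"])
    placement = {}
    for vm in sorted(vms):
        for pm in sorted(pms):
            if free[pm] >= context["demand"][vm]:
                placement[vm] = pm
                free[pm] -= context["demand"][vm]
                break
    return placement
```

VMs that fit nowhere are simply left out of the returned map in this sketch.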

Introduction to Cloud and Scheduling Thesis Problem Problem How to
come up with a function f that saves energy in the data center while maintaining SLAs, improves network scalability and performance, saves battery on mobile devices, and saves cost in a multi-cloud environment? 9 / 46

Dynamic SLA aware Scheduler Outline 1. Introduction to Cloud and
Scheduling 2. Dynamic SLA aware Scheduler 3. Network aware Scheduler 4. Wrap up 10 / 46

Dynamic SLA aware Scheduler Motivation Electricity Usage by Cloud Data
Center Source : Greenpeace Dirty Cloud Report 11 / 46

Dynamic SLA aware Scheduler Motivation Server Power Characteristics
[Figure: server power usage (percent of peak) and energy efficiency plotted against utilization (percent)] 12 / 46

Dynamic SLA aware Scheduler Problem Goal Maintaining SLA guarantees while
effectively saving the power consumed by the data center. Consolidate virtual machines effectively based on their resource usage. Maximize the utilization of physical machines, and put underused machines into standby mode by migrating their VMs onto other physical machines. 13 / 46

Dynamic SLA aware Scheduler Solution Utilization Model
ResourceVector $RV = \langle E_{cpu}, E_{mem}, E_{disk}, E_{bw} \rangle$, where
$$E_x = \frac{x \text{ used by VM}}{\text{max } x \text{ capacity of PM}} \quad (1)$$
Multiple resources (CPU, memory, disk and network) are combined into a single measure $U$, given as
$$U = \alpha \times E_{cpu} + \beta \times E_{mem} + \gamma \times E_{disk} + \delta \times E_{bw}$$
where $\alpha, \beta, \gamma, \delta \in [0, 1]$ and $\alpha + \beta + \gamma + \delta = 1$. 14 / 46
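
The utilization model can be sketched in a few lines of Python. The dictionary keys and the equal default weights are assumptions made for this example; the slides only require that the four weights lie in [0, 1] and sum to 1.

```python
from dataclasses import dataclass

RESOURCES = ("cpu", "mem", "disk", "bw")


@dataclass
class ResourceVector:
    cpu: float
    mem: float
    disk: float
    bw: float


def resource_vector(used, capacity):
    """E_x = (x used by the VM) / (max x capacity of the PM), per resource."""
    return ResourceVector(*(used[r] / capacity[r] for r in RESOURCES))


def utilization(rv, weights=(0.25, 0.25, 0.25, 0.25)):
    """U = alpha*E_cpu + beta*E_mem + gamma*E_disk + delta*E_bw."""
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    alpha, beta, gamma, delta = weights
    return alpha * rv.cpu + beta * rv.mem + gamma * rv.disk + delta * rv.bw
```

Because every component is a fraction of the host's capacity, the same code applies unchanged to machines of different sizes.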

Dynamic SLA aware Scheduler Solution Similarity Calculation Based on cosine similarity.
Method 1 - based on dissimilarity (lower is better) between the RV of the incoming VM and $RV_{PM}$:
$$similarity = \frac{RV_{vm}(PM) \cdot RV_{PM}}{\lVert RV_{vm}(PM) \rVert \, \lVert RV_{PM} \rVert}$$
Method 2 - based on similarity (higher is better) between the RV of the incoming VM and $PM_{free}$:
$$similarity = \frac{RV_{vm}(PM) \cdot PM_{free}}{\lVert RV_{vm}(PM) \rVert \, \lVert PM_{free} \rVert}$$
15 / 46
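
Both methods rely on the same cosine computation and differ only in the second vector. A small Python sketch follows; the helper `free_vector` assumes that PMfree is one minus the used fraction in each dimension, which the slide does not spell out.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two resource vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def free_vector(rv_pm):
    """Assumed definition of PMfree: the unused fraction of each resource."""
    return [1.0 - e for e in rv_pm]
```

With Method 1 a CPU-heavy VM scores low against a memory-heavy PM, which is the complementary pairing that method prefers. With Method 2 the same VM scores high against a PM whose free capacity is mostly CPU.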

Dynamic SLA aware Scheduler Solution Allocation Algorithm(VMs to be allocated)
for all VM ∈ VMs to be allocated do
    for all PM ∈ Running PMs do
        similarityPM = calculateSimilarity(RVvm(PM), RVPM)
        add similarityPM to queue
    end for
    sort queue ascending/descending using similarityPM
    for all similarityPM in queue do
        targetPM = PM corresponding to similarityPM
        if U after allocation on targetPM < (Uup − buffer) then
            allocate(VM, targetPM)
            return SUCCESS
        end if
    end for
    return FAILURE
end for
16 / 46
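
A rough Python rendering of the allocation step for a single VM is given below. It is a sketch under simplifying assumptions: resource vectors are plain 4-element lists, the VM's vector is taken to be the same relative to every PM (homogeneous hosts), the default thresholds and weights are illustrative, and the function returns the chosen PM name or `None` rather than SUCCESS or FAILURE.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def utilization(rv, weights):
    return sum(w * e for w, e in zip(weights, rv))


def allocate(vm_rv, running_pms, u_up=0.8, buffer=0.1,
             weights=(0.25, 0.25, 0.25, 0.25), method=1):
    """Place one VM on a running PM, or return None if no PM qualifies.

    running_pms maps a PM name to its current resource vector and is
    updated in place when the VM is placed.
    """
    ranked = []
    for pm, pm_rv in running_pms.items():
        if method == 1:   # dissimilarity to the PM's used resources
            score = cosine(vm_rv, pm_rv)
        else:             # similarity to the PM's free resources
            score = cosine(vm_rv, [1.0 - e for e in pm_rv])
        ranked.append((score, pm))
    # Method 1: lowest score first; Method 2: highest score first.
    ranked.sort(reverse=(method == 2))
    for _, pm in ranked:
        after = [p + v for p, v in zip(running_pms[pm], vm_rv)]
        if utilization(after, weights) < u_up - buffer:
            running_pms[pm] = after
            return pm
    return None
```

Returning `None` lets the caller decide what to do next, for example waking a standby machine as the scale-up algorithm does.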

Dynamic SLA aware Scheduler Solution Scale-up Algorithm
Scale up()
if U > Uup then
    VM = VM with max U on that PM
    Allocation Algorithm(VM)
end if
if Allocation Algorithm fails to allocate VM then
    target PM = add a standby machine to running machines
    allocate(VM, target PM)
end if
17 / 46

Dynamic SLA aware Scheduler Solution Scale-down Algorithm
Scale down Algorithm()
if U < Udown then {if U of a PM is less than Udown}
    Allocation Algorithm(VMs on PM)
end if
18 / 46
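
The two triggers can be sketched together in Python. The function names, the default thresholds, and the `try_allocate` callback (standing in for the allocation algorithm) are all hypothetical.

```python
def pick_action(pm_utilization, u_up=0.8, u_down=0.3):
    """Decide which algorithm, if any, a PM's utilization U triggers."""
    if pm_utilization > u_up:
        return "scale_up"
    if pm_utilization < u_down:
        return "scale_down"
    return "steady"


def scale_up(vm_utilizations, try_allocate, standby_pms):
    """Move the VM with the highest U off an overloaded PM.

    try_allocate(vm) stands in for the allocation algorithm and returns a PM
    name or None; on failure a standby machine is taken into service.
    """
    vm = max(vm_utilizations, key=vm_utilizations.get)
    target = try_allocate(vm)
    if target is None and standby_pms:
        target = standby_pms.pop()
    return vm, target


def scale_down(vms_on_pm, try_allocate):
    """Try to re-home every VM of an underused PM; report where each one went."""
    return {vm: try_allocate(vm) for vm in vms_on_pm}
```

In this sketch the PM can go to standby only if every value returned by `scale_down` is a PM name rather than `None`.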

Dynamic SLA aware Scheduler Results Results : Energy and SLAs
About 21% energy savings and about 60% fewer SLA violations. 19 / 46

Network aware Scheduler Outline 1. Introduction to Cloud and Scheduling
2. Dynamic SLA aware Scheduler 3. Network aware Scheduler 4. Wrap up 20 / 46

Network aware Scheduler Problem Network Performance in Cloud In Amazon
EC2, TCP/UDP throughput experienced by applications can fluctuate rapidly between 1 Gb/s and zero. Packet delay variations among Amazon EC2 instances are abnormally large. (G. Wang et al. The impact of virtualization on network performance of Amazon EC2 data center. INFOCOM 2010) 21 / 46

Network aware Scheduler Problem Scalability Scheduling algorithm has to scale
to millions of requests. Network traffic at higher layers poses a significant challenge for data center network scaling. New applications in the data center are pushing the need for traffic localization in the data center network. 22 / 46

Network aware Scheduler Problem Problem VM placement algorithm to consolidate
VMs using network traffic patterns 23 / 46

Network aware Scheduler Problem Subproblems How to identify? - cluster
VMs based on their traffic exchange patterns. How to place? - a placement algorithm that places VMs so as to localize internal data center traffic and improve application performance. 24 / 46

Network aware Scheduler Problem How to identify? A VMCluster is a
group of VMs that has a large communication cost ($c_{ij}$) over time period $T$, where
$$c_{ij} = AccessRate_{ij} \times Delay_{ij}$$
$AccessRate_{ij}$ is the rate of data exchange between $VM_i$ and $VM_j$, and $Delay_{ij}$ is the communication delay between them. 25 / 46

Network aware Scheduler Problem VMCluster Formation Algorithm
$$AccessMatrix_{n \times n} = \begin{bmatrix} 0 & c_{12} & \cdots & c_{1n} \\ c_{21} & 0 & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & 0 \end{bmatrix}$$
$c_{ij}$ is maintained over time period $T$ in a moving-window fashion, and the mean is taken as its value.
for each row Ai ∈ AccessMatrix do
    if maxElement(Ai) > (1 + opt_threshold) ∗ avg_comm_cost then
        form a new VMCluster from non-zero elements of Ai
    end if
end for
26 / 46
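
The formation rule can be sketched in Python as follows. Two details are assumptions made for this example: the average communication cost is taken as the mean of all off-diagonal entries, and the row's own VM is included in the cluster it forms. Duplicate clusters produced by symmetric rows are dropped.

```python
def form_clusters(access_matrix, opt_threshold=0.5):
    """Return a list of VM index sets, one per detected VMCluster."""
    n = len(access_matrix)
    costs = [access_matrix[i][j] for i in range(n) for j in range(n) if i != j]
    # Assumed definition of avg_comm_cost: mean of all off-diagonal costs.
    avg_comm_cost = sum(costs) / len(costs) if costs else 0.0
    clusters = []
    for i, row in enumerate(access_matrix):
        if max(row) > (1 + opt_threshold) * avg_comm_cost:
            cluster = {i} | {j for j, c in enumerate(row) if c > 0}
            if cluster not in clusters:
                clusters.append(cluster)
    return clusters
```

A larger `opt_threshold` makes the rule stricter, so fewer rows qualify and fewer clusters are formed.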

Network aware Scheduler Problem How to place ? Which VM
to migrate? Where can we migrate? Will the effort be worth it? 27 / 46

Network aware Scheduler Solution Communication Cost Tree Each node represents
cost of communication of devices connected to it. 28 / 46

Network aware Scheduler Solution Example : VMCluster 29 / 46

Network aware Scheduler Solution Example : CandidateSet3 30 / 46

Network aware Scheduler Solution Example : CandidateSet2 31 / 46

Network aware Scheduler Solution How to place ?
Which VM to migrate?
$$VMtoMigrate = \arg\max_{VM_i} \sum_{j=1}^{|VMCluster|} c_{ij}$$
Where can we migrate?
$$CandidateSet_i(VMCluster_j) = \{c \mid c \text{ and } VMCluster_j \text{ have a common ancestor at level } i\} - CandidateSet_{i+1}(VMCluster_j)$$
Will the effort be worth it?
$$PerfGain = \sum_{j=1}^{|VMCluster|} \frac{c_{ij} - c'_{ij}}{c_{ij}}$$
where $c'_{ij}$ is taken to be the communication cost after the migration.
32 / 46
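
The three quantities above can be sketched in Python. Several choices here are assumptions made for illustration: each PM is described by its path from the root of the cost tree (so a higher level means a closer ancestor), a candidate set at level i is read as the PMs sharing an ancestor at level i but not at level i+1, and the post-migration costs are supplied as a second cost table.

```python
def vm_to_migrate(cluster, cost):
    """arg max over VM_i in the cluster of sum_j c_ij."""
    return max(cluster, key=lambda i: sum(cost[i][j] for j in cluster if j != i))


def candidate_set(level, cluster_pms, paths):
    """PMs whose closest common ancestor with the cluster is at `level`.

    paths maps each PM to a tuple of ancestors, root first,
    e.g. ("core", "agg1", "tor1"); all paths have the same length.
    """
    def shared(lv):
        return {pm for pm, p in paths.items()
                if any(p[:lv + 1] == paths[c][:lv + 1] for c in cluster_pms)}

    depth = len(next(iter(paths.values())))
    closer = shared(level + 1) if level + 1 < depth else set()
    return shared(level) - closer


def perf_gain(vm, cluster, cost_before, cost_after):
    """Sum over the cluster of the relative reduction in communication cost."""
    return sum((cost_before[vm][j] - cost_after[vm][j]) / cost_before[vm][j]
               for j in cluster if j != vm and cost_before[vm][j] > 0)
```

In this sketch the deepest level contains the PMs under the same top-of-rack switch as the cluster, and level 0 contains the PMs reachable only through the root.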

Network aware Scheduler Solution Consolidation Algorithm Select the VM to
migrate. Identify the CandidateSets. Select a destination PM, checking that the destination will not be overloaded and that the gain is significant. 33 / 46

Network aware Scheduler Solution Consolidation Algorithm
for VMClusterj ∈ VMClusters do
    Select VMtoMigrate
    for i from leaf to root do
        Form CandidateSeti(VMClusterj − VMtoMigrate)
        for PM ∈ CandidateSeti do
            if UtilAfterMigration(PM, VMtoMigrate) < overload threshold AND PerfGain(PM, VMtoMigrate) > significance threshold then
                migrate VM to PM
                continue to next VMCluster
            end if
        end for
    end for
end for
34 / 46
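
The control flow of the consolidation loop can be sketched in Python with the individual steps passed in as callbacks. All parameter names and default thresholds are hypothetical; the sketch only records the planned migrations instead of performing them.

```python
def consolidate(clusters, levels_leaf_to_root, select_vm, candidate_set,
                util_after, perf_gain,
                overload_threshold=0.8, significance_threshold=0.1):
    """Return a list of (vm, destination_pm) migrations, at most one per cluster."""
    migrations = []
    for cluster in clusters:
        vm = select_vm(cluster)
        rest = [v for v in cluster if v != vm]
        target = None
        for level in levels_leaf_to_root:
            for pm in sorted(candidate_set(level, rest)):
                if (util_after(pm, vm) < overload_threshold
                        and perf_gain(pm, vm) > significance_threshold):
                    target = pm
                    break
            if target is not None:
                break  # continue to the next VMCluster
        if target is not None:
            migrations.append((vm, target))
    return migrations
```

Walking the levels from leaf to root means the closest acceptable PM wins, and a cluster with no acceptable destination is left where it is.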

Network aware Scheduler Evaluation Experimental Evaluation We compared our approach
to traditional placement approaches like Vespa [1] and to a previous network-aware algorithm, Piao's approach [2]. We extended NetworkCloudSim [3] to support SDN (Floodlight). The servers are assumed to be HP ProLiant ML110 G5 (1 x Xeon 3075 2660 MHz, 2 cores, 4 GB), connected through 1G links using HP ProCurve switches. Traces come from three real-world data centers: two from universities (uni1, uni2) and one from a private data center (prv1). 35 / 46

Network aware Scheduler Evaluation Trace Statistics Traces from three real-world data centers: two from universities (uni1, uni2) and one from a private data center (prv1).
Property                                   Uni1    Uni2    Prv1
Number of short non-I/O-intensive jobs      513    3637    3152
Number of short I/O-intensive jobs          223    1834    1798
Number of medium non-I/O-intensive jobs     135     628     173
Number of medium I/O-intensive jobs         186     864     231
Number of long non-I/O-intensive jobs       112     319      59
Number of long I/O-intensive jobs           160     418     358
Number of servers                           500    1093    1088
Number of devices                            22      36      96
Oversubscription                            2:1    47:1     8:3
36 / 46

Network aware Scheduler Results Results : Performance Improvement I/O intensive
jobs benefit the most, but other jobs also share the benefit. Short jobs are important for overall performance improvement. 37 / 46

Network aware Scheduler Results Results : Number of Migrations Not every
migration is equally beneficial. 38 / 46

Network aware Scheduler Results Results : Traffic Localization 60% increase
in ToR traffic (vs 30% by Piao’s approach), 70% decrease in core traffic (vs 37% by Piao’s approach) 39 / 46

Network aware Scheduler Results Results : Complexity – Time, Variance and Migrations
Measure                           Trace   Vespa   Piao’s approach   Our approach
Avg. scheduling time (ms)         Uni1      504         677             217
                                  Uni2      784        1197             376
                                  Prv1      718        1076             324
Worst-case scheduling time (ms)   Uni1      846        1087             502
                                  Uni2      973        1316             558
                                  Prv1      894        1278             539
Variance in scheduling time       Uni1      179         146              70
                                  Uni2      234         246              98
                                  Prv1      214         216              89
Number of migrations              Uni1      154         213              56
                                  Uni2      547        1145             441
                                  Prv1      423         597              96
40 / 46

Network aware Scheduler Results Conclusion Network-aware placement (and traffic
localization) helps network scaling. The VM scheduler should be aware of migrations. Think rationally while scheduling: you may not want all the migrations. 41 / 46

Wrap up Outline 1. Introduction to Cloud and Scheduling 2.
Dynamic SLA aware Scheduler 3. Network aware Scheduler 4. Wrap up 42 / 46

Wrap up Recap Explored scheduling in environments where energy efficiency
and SLAs are important, where there is extreme heterogeneity in terms of resource capabilities and network, and where network communication is high. 43 / 46

Wrap up Future Directions Performance modeling for cloud apps Performance
predictions for different configurations (cloud/app) Combining special subsystems like storage with scheduling Study of scheduling tradeoffs 44 / 46

Thank you

Wrap up Related Publications
1. Dynamic Energy and SLA aware Scheduling of Virtual Machines in Cloud Data Centers. Dharmesh Kakadia, Radheyshyam Nanduri and Vasudeva Varma. Unpublished manuscript.
2. MECCA: Mobile, Efficient Cloud Computing Workload Adoption Framework using Scheduler Customization and Workload Migration Decisions. Dharmesh Kakadia, Prasad Saripalli and Vasudeva Varma. In MobileCloud ’13.
3. Energy Efficient Data Center Networks - A SDN based approach. Dharmesh Kakadia and Vasudeva Varma. In I-CARE’12.
4. Optimizing Partition Placement in Virtualized Environments. Dharmesh Kakadia and Nandish Kopri. Patent P13710918.
5. Network-aware Virtual Machine Consolidation for Large Data Centers. Dharmesh Kakadia, Nandish Kopri and Vasudeva Varma. In NDM collocated with SC’13.
6. MultiStack. http://MultiStack.org
46 / 46

Backup Backup Slides 1 / 39

Backup Dynamic SLA aware Discussion Scale-up/down is triggered based on
observation over a period of time, to avoid unstable behavior. Predict utilization on the destination machine, to avoid SLA violations and unstable behavior. Use buffers to help guard against wrong decisions. Percentage (not absolute) utilization means the algorithms work unchanged for heterogeneous data centers. Pick the least recently used machine when scaling up, so that all machines are used uniformly and hotspots are avoided. The difference between Uup and Udown should be sufficiently large to avoid a jitter effect. 2 / 39

Backup Dynamic SLA aware Simulation and Algorithm Parameters
Parameter                                                  Value
Scale-up threshold, Uup                                    [0.25, 1.0]
Scale-down threshold, Udown                                [0.0, 0.4]
buffer                                                     [0.05, 0.5]
Similarity threshold                                       [0, 1]
Similarity method                                          Method 1 or 2
Number of physical machines                                100
Specifications of physical machines                        Heterogeneous
Time period for which resource usage of a VM is logged
for exact RVvm calculation, ∆                              5 minutes
3 / 39

Backup Dynamic SLA aware Results : Effect of Uup Uup
should be neither too high nor too low (optimal around 0.70-0.80). A high Uup means many more SLA violations. If Uup is low, the scale-up algorithm will run more machines than necessary. 4 / 39

Backup Dynamic SLA aware Results : Effect of buffer Buffer
has benefits. Keep the buffer only as large as required. Beware of values that are too high, as they lead to less consolidation. 5 / 39

Backup Dynamic SLA aware Results : Effect of scale down
50% energy savings 6 / 39

Backup Dynamic SLA aware Results : SLA : Similarity or
Dissimilarity Similarity is better than dissimilarity 7 / 39

Backup Dynamic SLA aware The variance in delay as number
of flows grows 8 / 39

Backup Dynamic SLA aware Consolidation Algorithm
Update traffic metrics using SDN counters
for each Switch s in S such that Utilization(s) < threshold θ over time t do
    if canMigrate(s, S − s) then
        pFlows = prioritizeFlows(s)
        incrementalMigration(pFlows)
        PowerOff(s)
    end if
end for
9 / 39
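
A simplified Python sketch of this switch consolidation loop is shown below. It makes several assumptions that are not in the slides: flows are plain rate numbers, every switch has the same capacity, flows are prioritized largest first, and a migration is accepted only if every flow of the switch fits on the least loaded remaining switches.

```python
def consolidate_switches(load, capacity, theta=0.3):
    """Power off lightly used switches by moving their flows elsewhere.

    load maps a switch name to a list of flow rates. Returns the remaining
    active switches with their flows and the list of powered-off switches.
    """
    active = {s: list(flows) for s, flows in load.items()}
    powered_off = []
    for s in sorted(load, key=lambda sw: sum(load[sw])):
        if len(active) <= 1 or sum(active[s]) / capacity >= theta:
            continue
        # canMigrate: try the whole migration on a copy first.
        trial = {o: list(f) for o, f in active.items() if o != s}
        fits = True
        for flow in sorted(active[s], reverse=True):  # largest flows first
            dest = min(trial, key=lambda o: sum(trial[o]))
            if sum(trial[dest]) + flow > capacity:
                fits = False
                break
            trial[dest].append(flow)
        if fits:
            active = trial
            powered_off.append(s)
    return active, powered_off
```

Trying the migration on a copy keeps the state unchanged whenever a switch cannot be fully drained.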

Backup Dynamic SLA aware Simulation Setup
Parameter                     Value
Number of hosts               2000
Number of edge switches       100
Topology                      FatTree
Link capacity                 100 MBPS
Switch booting time           90 sec
Number of ports per switch    24
10 / 39

Backup Dynamic SLA aware Results : # switches required The number
of active switches grows almost linearly as the number of flows grows. 11 / 39

Backup Dynamic SLA aware The variance in delay as number
of flows grows 12 / 39

Backup Mobile Scheduler Current Mobile Cloud Landscape By 2016, 40%
of mobile apps will use cloud back-end services (http://www.gartner.com/newsroom/id/2463615). Cloud-enabled apps: Dropbox, Evernote, Instagram, ...; Siri, Google Voice, ...; Kindle, ... Traditional apps: GIMP, Firefox, games. 13 / 39

Backup Mobile Scheduler Mobile Cloud Opportunity Mobile devices are becoming
powerful, but rich applications are more and more hungry for resources. The cloud has infinite resources, is programmable, and is always on. Only a handful of apps are leveraging the cloud. 14 / 39

Backup Mobile Scheduler Motivation Observation : Many apps are not
cloud-aware, but can be migrated. Can we create a mobile cloud framework that leverages cloud resources without making the app cloud-aware and without annoying the user, and that is adaptive, personalized, and works in autopilot mode? 15 / 39

Backup Mobile Scheduler Environment & Assumptions 16 / 39

Backup Problem Environment & Assumptions When to offload an application to the
cloud? 17 / 39

Backup Problem Workflow : App launch
[Diagram: App, Monitoring Tools (Perf, ...), Monitoring Information] 18 / 39

Backup Problem Workflow : Offload Decision
[Diagram: App, Monitoring Tools (Perf, ...), Monitoring Information, Vowpal Wabbit Model, Offload Decision] 19 / 39

Backup Problem Workflow : Initiating Migration
[Diagram, Mobile side: App, Monitoring Tools (Perf, ...), Monitoring Information, Vowpal Wabbit Model, Offload Decision, Initiate Migration (Yes). Cloud side: OpenStack API, VM, VNC Server] 20 / 39

Backup Problem Workflow : Remoting
[Diagram, Mobile side: App, Monitoring Tools (Perf, ...), Monitoring Information, Vowpal Wabbit Model, Offload Decision, Initiate Migration (Yes), VNC Viewer. Cloud side: OpenStack API, VM, VNC Server] 21 / 39

Backup Solution Offloading Decision
if Gainp ≥ significance threshold then
    execute p remotely on the cloud
else
    continue executing p locally
end if
The significance threshold controls aggressiveness. 22 / 39

Backup Solution Performance Gain
Feature gain: $$f_i = \frac{m_i - c_i}{m_i}$$
$m_i$ : cost of running the application on the mobile device (0 to 1)
$c_i$ : cost of running the application on the cloud device (0 to 1)
Performance gain: $$Gain = \frac{\sum_i (w_i \times f_i)}{\sum_i w_i}$$
$w_i$ : weight of the $i$-th feature gain, normalized to unity
23 / 39
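
The gain formulas and the offloading decision from the earlier slide can be sketched together in Python. The default threshold value is an arbitrary choice for illustration.

```python
def feature_gain(mobile_cost, cloud_cost):
    """f_i = (m_i - c_i) / m_i, with costs in (0, 1]."""
    return (mobile_cost - cloud_cost) / mobile_cost


def performance_gain(mobile_costs, cloud_costs, weights):
    """Weighted average of the per-feature gains."""
    gains = [feature_gain(m, c) for m, c in zip(mobile_costs, cloud_costs)]
    return sum(w * f for w, f in zip(weights, gains)) / sum(weights)


def should_offload(gain, significance_threshold=0.2):
    """Offload only when the expected gain is significant."""
    return gain >= significance_threshold
```

For example, a feature that costs 0.8 on the device and 0.4 in the cloud has a gain of 0.5. Combined with a feature that gains nothing at equal weight, the overall gain is 0.25, which clears a threshold of 0.2.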

Backup Solution Learning Algorithm Gain is modeled as a regression problem with a squared
loss function, learned in an online setting. Used Vowpal Wabbit (https://github.com/JohnLangford/vowpal_wabbit/), a fast online learning toolkit. Features : high-level features, app features, network features, other apps, device static features, cloud provider features. 24 / 39
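
To illustrate what online regression with squared loss means, here is a minimal stochastic gradient descent learner in plain Python. It is not Vowpal Wabbit and does not use its API; the class name and learning rate are made up for this sketch.

```python
class OnlineLinearRegressor:
    """Linear model updated one example at a time under squared loss."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """One gradient step on 0.5 * (prediction - y)^2; returns the squared error."""
        error = self.predict(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error
        return error * error
```

Each observed gain becomes one `update` call, so the model adapts as the user's behavior changes without storing past examples.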

Backup Solution Dynamic Features High-level features : features that concern the user, including battery status, date and time, user location (moving/stable), etc. Application features : capture application usage habits, including frequency of usage of the application, stretch of usage, use of local and remote data, etc. Network status : the network condition between the cloud and the mobile device, including bandwidth, latency and stability. Resource usage by other applications running on the device : a combined vector of all individual applications. 25 / 39

Backup Solution Non-Dynamic Features Device Configuration : capture all the
hardware and software configuration of the device (CPU frequency, CPU power steps, operating frequency, etc.). Cloud configuration : captures characteristics of the cloud provider (monetary cost, provider performance statistics). 26 / 39

Backup Experiments Evaluation A virtual machine running Android serves as the mobile device. The Linux traffic control utility (tc) is used to simulate various network conditions. OpenStack is used as the IaaS cloud provider.
Property                   Value
Cloud operating system     Ubuntu 12.04 (kernel 3.2)
Cloud VM configuration     4 GB, 2.66 GHz
Device operating system    Android 4.2
Device configuration       1 GB, 1.5 GHz
27 / 39

Backup Experiments Workloads Representative of normal user interaction. Applications with varying resource utilization and duration. Run on varying network speeds: cable (0.375/6), DSL (0.75/3) and EVDO (1.2/3.8).
Workload            Description                                Characteristics
Kernel              kernel download + build                    long + resource intensive
GIMP                image editing + applying image filters     interactive + little intensive
Video conversion    download & convert a (500MB) video         short + resource intensive
Browser             browsing 5 sites                           interactive
28 / 39

Backup Results Results : Decision and Time taken 29 /
39

Backup Results Results : Overhead Measured as % increase in
the resource utilization with and without running our system. Overhead between 4–7 % 30 / 39

Backup Results Conclusion A Mobile cloud scheduler that is Context-aware
Adaptive to various workloads automatically Personalized Easy to use and uses learning algorithm for system optimization 31 / 39

Backup Network-aware Results : Sensitivity to parameters After 0.6, traffic
pattern controls #VMCluster All the improvements will be discarded as insignificant if significance threshold is very high 32 / 39

Backup MultiStack Problem The cloud marketplace is fragmented. Very little
(and only superficial) inter-operability. Each cloud is very different (Architecture/SLA/Abstraction/API/...). Likely to stay like this, due to conflict of interests. Can lead to lock-in, Data-loss, Cost increase. Many new applications have bursty nature. 33 / 39

Backup MultiStack MultiStack : Multi Cloud Big Data Research Platform
Think of it as an OS for multiple clouds. It aims to identify problems and evaluate solutions for multi-cloud platforms. More challenging than data center scheduling. Big data is the first use case. 34 / 39

Backup MultiStack Overview MultiCloud : Ability to use resources from
multiple clouds seamlessly. 35 / 39

Backup MultiStack MultiStack : Services Resource Management Migration Monitoring Identity
and Authentication Data Management Billing 36 / 39

Backup MultiStack MultiStack : Architecture 37 / 39

Backup MultiStack Progress so far Base Platform Simple capacity based
scheduler Provisioning on AWS and OpenStack Deployment Hadoop clusters Manual scaling of clusters 38 / 39

Backup MultiStack Immediate features in pipeline Auto Scaling Ability to
run across multiple cloud providers. Priority-based job scheduling for minimizing cost and completion time. Performance optimization with storage integration. Client tools. More frameworks (Spark, Hive, Pig, Oozie, Drill, MLlib, ...). Other schedulers (autoscaling, spot instances, job-profile based). 39 / 39
