Automated Clustering of Virtual Machines based on Correlation of Resource Usage
Published online: Dec 21, 2012
The recent growth in demand for modern applications, combined with the shift to the cloud computing paradigm, has led to the establishment of large-scale cloud data centers. The increasing size of these infrastructures represents a major challenge for monitoring and managing system resources. Available solutions typically treat every Virtual Machine (VM) as a black box with independent characteristics, and address scalability issues by reducing the number of monitored resource samples, in most cases considering only average CPU usage sampled at a coarse time granularity. We claim that scalability issues can be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper we propose an automated methodology to cluster VMs according to their usage of multiple resources, both system- and network-related, assuming no knowledge of the services executed on them. This innovative methodology exploits correlations in resource usage to cluster similar VMs together. We evaluate the methodology through a case study with data from an enterprise data center, and we show that high performance may be achieved in automatic VM clustering. Furthermore, we estimate the reduction in the amount of data collected, thus showing that our proposal may simplify the monitoring requirements and help administrators make decisions on the resource management of cloud computing data centers.
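The idea summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the feature construction (pairwise Pearson correlations between each VM's metric time series), the three metric names, the synthetic data, and the farthest-first centroid initialization are all assumptions made for illustration. Only the use of correlation and k-means comes from the abstract and keywords.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sample lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (norm_x * norm_y)


def correlation_features(metrics):
    """Describe one VM by the correlations between all pairs of its metrics.

    `metrics` maps a metric name to its time series (hypothetical layout).
    """
    names = sorted(metrics)
    return [pearson(metrics[a], metrics[b])
            for i, a in enumerate(names) for b in names[i + 1:]]


def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; farthest-first initialization is a simplifying assumption."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def synthetic_vm(kind, rng, samples=200):
    """Synthetic VM trace (illustrative only, not data from the case study)."""
    load = [rng.random() for _ in range(samples)]
    cpu = [v + rng.gauss(0, 0.05) for v in load]
    if kind == "coupled":
        net = [v + rng.gauss(0, 0.05) for v in load]      # network follows CPU
    else:
        net = [1 - v + rng.gauss(0, 0.05) for v in load]  # network opposes CPU
    mem = [rng.random() for _ in range(samples)]          # memory is unrelated
    return {"cpu": cpu, "mem": mem, "net": net}


rng = random.Random(42)
kinds = ["coupled"] * 4 + ["inverse"] * 4
features = [correlation_features(synthetic_vm(kind, rng)) for kind in kinds]
labels = kmeans(features, k=2)
print(list(zip(kinds, labels)))
```

In this sketch each VM is reduced to a short vector of correlation values, so VMs whose resources move together in the same way end up close to each other regardless of their absolute load levels. No knowledge of the hosted service is used, which mirrors the black-box assumption stated in the abstract.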
Keywords: Cloud computing, VM Clustering, k-means, Correlation analysis
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.