Nalanda

November 25, 2008

Improving MapReduce Performance in Heterogeneous Environments

Filed under: Networks — Ashwin @ 12:11 pm

Zaharia, M., Konwinski, A., Joseph, A. D., Katz, R. and Stoica, I. 2008. Improving MapReduce Performance in Heterogeneous Environments.

The authors examine the performance of the MapReduce implementation in Hadoop and find several flaws in Hadoop’s scheduler that can cause severe performance degradation. The principal problems analysed are the identification of laggard tasks for speculative execution and the identification of nodes to which these tasks should be assigned. These problems arise because Hadoop assumes homogeneity both in tasks and in the network itself.

To remedy these problems, the authors propose a new scheduling algorithm, LATE, which is sensitive both to the variance in tasks and also to variance in node performance. LATE chooses tasks for speculative execution based on estimated time to completion, rather than the simple score metric that Hadoop uses. Only nodes with performance above a specified threshold are chosen for the execution of speculative tasks. In addition, a cap is maintained on the total number of speculative tasks that may be run at a time. Testing on various configurations of EC2, and also on a testbed with virtual machines, demonstrates that LATE provides significant performance benefits.
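To make the three rules above concrete, here is a minimal sketch in Python. It is not the authors' code or Hadoop's API: the `Task` class, the function names, and the parameters (`speed_threshold`, `spec_cap`) are all hypothetical, and the time-to-completion estimate assumes a task keeps progressing at its average rate so far.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    progress: float  # fraction of work done, between 0.0 and 1.0
    elapsed: float   # seconds the task has been running


def time_left(task):
    """Estimate remaining seconds, assuming the average rate so far continues."""
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def pick_speculative(tasks, node_speed, speed_threshold, running_spec, spec_cap):
    """Return the task to re-run speculatively on this node, or None.

    All parameters are illustrative assumptions, not Hadoop settings.
    """
    # Rule 3: respect the cap on concurrently running speculative tasks.
    if running_spec >= spec_cap:
        return None
    # Rule 2: only nodes above the performance threshold get speculative work.
    if node_speed < speed_threshold:
        return None
    # Skip tasks that are finished or have no measurable progress yet.
    candidates = [t for t in tasks if 0.0 < t.progress < 1.0 and t.elapsed > 0.0]
    if not candidates:
        return None
    # Rule 1: choose the task expected to finish farthest in the future.
    return max(candidates, key=time_left)


tasks = [
    Task("a", progress=0.9, elapsed=90.0),
    Task("b", progress=0.2, elapsed=100.0),
    Task("c", progress=0.5, elapsed=20.0),
]
chosen = pick_speculative(tasks, node_speed=1.0, speed_threshold=0.5,
                          running_spec=0, spec_cap=2)
print(chosen.name if chosen else None)
```

Note that task "b" is picked even though task "a" has been running almost as long: what matters in this sketch is the projected finish time, not the raw progress score.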

The one question I have concerns the choice of virtual machines as testbeds. While I understand that these could be useful for simulating a heterogeneous environment, they also seem like a worst-case scenario. MapReduce is itself a virtualization scheme that makes certain assumptions about the locality of data; layering this on top of another virtualization scheme seems like overkill. It would be interesting to know how much of an advantage LATE delivers in a carefully planned data center, even one with a degree of heterogeneity.
