Cloud Computing Labs: Infrastructure Exploration: Using AWS EC2 and S3, and Working with Hadoop on EC2 and S3

The primary purpose of this assignment is to become familiar with Amazon Web Services (AWS), the most popular Infrastructure-as-a-Service platform, and to learn to manage Hadoop clusters on AWS. This assignment involves a lot of work on the command line.

Part 1: Accessing EC2 and S3 from the command line

Review the lecture about using AWS. Create your own AWS account. Note that a credit card is required to create the account. If you are registering with AWS for the first time, you will get free tier use of AWS for one year, which includes 750 hours of Linux micro instance use per month and 5 GB of S3 storage. Check the AWS Free Tier benefits. If you have received an AWS coupon code from the class, you can redeem it on your account.
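As a rough sanity check on the free tier allowance, the sketch below estimates how many days per month a group of instances can run continuously within the 750 hours quoted above. It assumes that instance-hours are pooled across all running micro instances; the function name free_days is made up for illustration, and the AWS Free Tier page remains the authority on how usage is actually counted.

```python
FREE_TIER_HOURS_PER_MONTH = 750  # figure quoted in the assignment text


def free_days(num_instances, hours_per_month=FREE_TIER_HOURS_PER_MONTH):
    """Days of continuous running covered per month for `num_instances`.

    Assumption: instance-hours are summed over all running instances.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    return hours_per_month / (num_instances * 24)


if __name__ == "__main__":
    # One instance versus a three-node cluster (one master, two data nodes).
    for n in (1, 3):
        print(f"{n} instance(s): about {free_days(n):.1f} days per month")
```

Under this assumption a single instance fits within the monthly allowance, while a three-node cluster uses it up in roughly a third of the month, which is one more reason to terminate instances as soon as you are done.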

Now, answer the following questions:

Question 1.1 Have you successfully started an instance from the command line? After the instance becomes stable, copy the output of the command "ec2-describe-instances instance_id" to the report.

Question 1.2 Go to ~/ec2/bin in your local directory and start a 1-data-node Hadoop cluster with the hadoop-ec2 script. Log in to the master node and use "hadoop fsck /" to check the status of HDFS. Make sure the reported number of data nodes is 1; otherwise, the cluster was not set up successfully. Where is the Hadoop installation directory? Find the configuration file hadoop-site.xml and copy its content to the report. For free tier users running a newer version of Hadoop, the configuration is split across three *-site.xml files.
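If you save the fsck output to a file, a small script can confirm the data-node count for you. The sketch below is only an illustration: it assumes the output contains a line of the form "Number of data-nodes: N", which may differ between Hadoop versions, and both the helper name count_data_nodes and the sample text are hypothetical.

```python
import re


def count_data_nodes(fsck_output):
    """Return the data-node count found in fsck output, or None if absent.

    Assumes a line such as "Number of data-nodes: 1"; adjust the pattern
    if your Hadoop version prints the summary differently.
    """
    match = re.search(r"Number of data-nodes:\s*(\d+)", fsck_output)
    return int(match.group(1)) if match else None


# Hypothetical excerpt for illustration; real output has many more lines.
sample = """Status: HEALTHY
 Number of data-nodes:\t\t1
 Number of racks:\t\t1
"""

if __name__ == "__main__":
    nodes = count_data_nodes(sample)
    print("data nodes reported:", nodes)
```

To check a real cluster, redirect the output of "hadoop fsck /" to a file on the master node and pass the file's content to count_data_nodes instead of the sample string.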

Question 1.3 Use your own words to describe the steps in the script "~/ec2/bin/launch-hadoop-master".

Question 1.4 How much time did you spend on this task?

Question 1.5 How useful is this task to your understanding of EC2?


Part 2: Testing Hadoop on Amazon EC2 and S3

In this task, you will build a customized Amazon Machine Image (AMI) with the online tool. Then you will test two different Hadoop setups on AWS, one using HDFS and one using S3 as the storage, to understand the advantages and disadvantages of each. See this page for some general steps: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/creating-an-ami-instance-store.html

2.1. Specifically, the AMI can be created following these steps.

  • Download a stable version of Hadoop, such as 1.2.1, from http://apache.mirrors.tds.net/hadoop/common/ and untar it into the directory /usr/local/hadoop_1.2.1/ (use "sudo ...").
  • Configure Hadoop, and set up the startup environment and networking according to the multi-node Hadoop setup tutorial.
  • Finally, bundle and upload the image to S3. You can either create the image with the AWS console, or use the following command lines.
  • Additional information: This video about setting up a customized AMI is also helpful.

    After you finish the AMI, please answer the following questions:

    Question 2.1 Copy and paste the link to your AMI here.

    Question 2.2 How much time did you spend on this task?

    Question 2.3 How useful is this task to your understanding of customized AMI?

    2.2 Now we move on to the next task: evaluating Hadoop on top of EC2 and S3. You will need to start one master node and two data nodes with your own Hadoop AMI, and configure them correctly so that they work as a cluster. Read this step-by-step tutorial for configuring a Hadoop cluster. Check the cluster status to make sure all the nodes and processes are working normally. If you cannot finish task 2.1, you can still use the default Hadoop AMI to start a cluster with the HadoopEC2 script.

    After you finish these experiments, answer the following questions:

    Question 3.1 List the two sets of time costs in the report, including each set's mean and variance, and write down your conclusion.
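One way to compute the mean and variance for each set of time costs is sketched below, using Python's standard statistics module. The numbers are placeholders, not measurements, so replace them with your own timings. The sketch uses the sample variance (statistics.variance); if the population variance is intended, use statistics.pvariance instead and say so in the report.

```python
import statistics


def summarize(times_sec):
    """Return (mean, sample variance) for a list of run times in seconds."""
    return statistics.mean(times_sec), statistics.variance(times_sec)


# Placeholder values for illustration only; substitute your own timings.
hdfs_times = [120.0, 118.0, 125.0]
s3_times = [150.0, 160.0, 155.0]

if __name__ == "__main__":
    for label, times in (("HDFS", hdfs_times), ("S3", s3_times)):
        mean, var = summarize(times)
        print(f"{label}: mean = {mean:.2f} s, variance = {var:.2f} s^2")
```

The sample variance needs at least two runs per setup, and more repetitions make the comparison between the two setups more trustworthy.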

    Question 3.2 How much time did you spend on this task? Do you think it takes too much time?

    Question 3.3 How useful is this task to your understanding of customized AMI and running Hadoop on EC2/S3?

    Final Survey Questions

    After you finish all the tasks, answer the following questions.

    Question 4.1 Your level of interest in this lab exercise (high, average, low);

    Question 4.2 How challenging is this lab exercise? (high, average, low);

    Question 4.3 How valuable is this lab as a part of the course (high, average, low);

    Question 4.4 Are the supporting materials and lectures helpful for you to finish the project? (very helpful, somewhat helpful, not helpful);

    Question 4.5 How much time in total did you spend completing the lab exercise?

    Question 4.6 Do you feel confident about applying the skills learned in the lab to solve other problems with AWS EC2 and S3?

    Deliverables

    Turn in the answers in one unzipped PDF file to the Pilot project submission.

    Make sure that you have terminated all instances after finishing your work! This can be easily done with the AWS web console.


    This page, first created: Nov 1 2014; last updated: Nov 14 2014