CEG 2350: OS Concepts and Usage

Introduction to Cloud Computing

Abstract: This article is an introduction to cloud computing with a focus on immediately useful hands-on practical exercises. It is taught as part of CEG2350, a course at the freshman level.

Table of Contents

  1. What is Cloud Computing?
  2. Distributed Computing Models
  3. Well-Known Storage Clouds
  4. Well-Known Computing Clouds
  5. Examples of Heavy Computations
  6. Lab Experiment
  7. Acknowledgements
  8. References

Educational Objectives

The objectives of the associated lab experiment are to:

  1. Store ten of your large files in the cloud.
  2. Create a Linux VM on a cloud computing platform and run a few commands.

What is Cloud Computing?

Cloud computing is all about using many computer systems that are "out there", whose physical address we do not even know. The "many" can range from just a few to thousands. Each of these systems may have terabytes of RAM and petabytes of hard disk space.

TBD: High-performance computing (HPC); high-throughput computing; computing aimed at "beating" humans (playing chess, Jeopardy, voice recognition, speech recognition, dictation); elapsed time versus total computing time.

By 2014, cloud computing will be as common as mobile phones are today.

Distributed Computing Models

Remote Computing

We can run computations on remote machines using ssh and similar tools. The remote machine may not have our files or our programs; it becomes our task to transport these.

Process Migration

Just like people migrate, we can think of processes migrating from one machine to another. Just like humans cannot migrate to Mars (yet?), processes can only (as of 2012) migrate from one Linux machine to another highly similar machine. Process migration involves "check-points": creating a frozen image of the entire address space of a process, transporting the image to a remote machine, and "thawing" it there.
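
To get a feel for "freeze, transport, thaw", here is a minimal sketch in bash. It is only an application-level imitation: a real process-migration system captures the whole address space of a process, whereas this script saves just two numbers. The function name sum_with_checkpoint, its arguments, and the checkpoint file format are made up for this illustration.

```shell
#!/usr/bin/env bash
# Application-level checkpoint/restart sketch (all names are hypothetical).
# The only "state" of this computation is a loop counter and a running sum.

# Add the integers 1..last, saving the state to a checkpoint file after
# every step. If the checkpoint file already has content, resume from it.
# An optional third argument simulates being evicted after that many steps.
sum_with_checkpoint() {
    local ckpt=$1 last=$2 stop_after=${3:-0}
    local i=1 sum=0 steps=0
    if [ -s "$ckpt" ]; then
        read -r i sum < "$ckpt"      # thaw: restore the saved state
    fi
    while (( i <= last )); do
        sum=$(( sum + i ))
        i=$(( i + 1 ))
        echo "$i $sum" > "$ckpt"     # freeze: save the current state
        steps=$(( steps + 1 ))
        if (( stop_after > 0 && steps >= stop_after && i <= last )); then
            return 1                 # evicted before the work is finished
        fi
    done
    echo "$sum"
}
```

For example, sum_with_checkpoint /tmp/ck 10 4 stops after four steps and leaves its state in /tmp/ck. Copying that file to another machine and running sum_with_checkpoint /tmp/ck 10 there finishes the job and prints 55.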

Using Idle Computers

In most departmental offices, there are probably dozens of PCs running screen savers. It is possible to put such idling machines to productive use. E.g., a package known as Condor can be installed on Windows or Linux PCs; it takes a description of jobs to be run and runs them on idle PCs as they become available. As soon as the "owner" of the idle PC begins an activity, the guest computation gracefully vacates and migrates to another machine. Condor is a standard package in Ubuntu: apt-get install condor

Cluster Computing

A group of machines can be clustered together so that they have an "awareness" of each other (cf. the Borg of Star Trek). The Top Ten of the world's most powerful supercomputers are Linux clusters (visit http://www.top500.org/).

Map/Reduce Hadoop

Suppose we wish to discover all occurrences of ten given words in some thousand large files. This problem has an inherently parallel solution. (i) Subcontract the work of finding word w[i] in file f[j] to machine M[i,j]. (ii) Combine all the results into one. Step (i) is often called a "map" computation and step (ii) is called a "reduce" computation. This kind of solution structure is surprisingly common in large scale ("big data") problems. The Apache Hadoop project (http://hadoop.apache.org/) provides an implementation of this. It is now available for both Linux and Windows.
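
The map and reduce steps can be tried on a single machine with ordinary shell tools. The following is a minimal sketch, not Hadoop: the function names map_count, reduce_sum and count_words are hypothetical, and the "machines" M[i,j] are simply background processes on the local PC.

```shell
#!/usr/bin/env bash
# Map/reduce sketch using grep and awk (all function names are hypothetical).

# Map step: count whole-word occurrences of one word in one file.
# Prints a single line of the form "word count".
map_count() {
    local word=$1 file=$2
    local n
    n=$(grep -o -w -F -- "$word" "$file" | wc -l)
    echo "$word $((n))"
}

# Reduce step: read "word count" lines and print one total per word.
reduce_sum() {
    awk '{ total[$1] += $2 } END { for (w in total) print w, total[w] }' | sort
}

# Driver: start one map job per (word, file) pair in the background,
# wait for all of them, and feed their partial results to the reduce step.
count_words() {
    local dir=$1; shift
    local word file
    {
        for word in "$@"; do
            for file in "$dir"/*; do
                [ -f "$file" ] || continue   # skip subdirectories
                map_count "$word" "$file" &
            done
        done
        wait
    } | reduce_sum
}
```

For example, count_words ~/Documents cloud linux prints one line per word with its total count over the regular files directly inside ~/Documents. On a real cluster, each map job would run on a different machine, and the partial results would travel over the network rather than through a pipe.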

Cloud Computing

The name "cloud computing" comes from the use of a cloud-shaped symbol in the infrastructure drawings of these systems. For a pretty good understanding, compare "cloud computing" with the use of electric power. You plug in a heavy current device without worrying if the power generation company needs to be informed first. In a similar way, we can send a heavy duty computation into the cloud and expect to get results, usually faster than on our own machines.

In the academic literature, cloud computing is classified as: Infrastructure as a service (IaaS), Platform as a service (PaaS), Software as a service (SaaS), Storage as a service (STaaS), Security as a service (SECaaS), Data as a service (DaaS), Test environment as a service (TEaaS), Desktop as a service (DaaS), API as a service (APIaaS), and Backend as a service (BaaS).

In the practical world, cloud computing can be classified as: (i) storage of data/files in the cloud, and (ii) computational clouds. Do remember the pretty obvious: Unlike the clouds that rain only on Beavercreek, Ohio, the storage/computing clouds are accessible over the Internet from everywhere.

Well-Known Storage Clouds

There are many free (up to a limit) cloud storage providers. All the following provide a decent amount (5 GB?) of storage space free. Some have integrated support with office suites, etc. All of these have client applications that run on Linux, Windows, Android and iOS.

TBD Brief descriptions.

  1. Amazon Cloud Drive
  2. Apple iCloud
  3. Box
  4. DropBox
  5. Google Drive as a storage cloud combines with Google Docs, a software as a service office suite, that permits real time collaborative editing.
  6. SkyDrive is integrated with Microsoft Office.
  7. SugarSync
  8. Symform
  9. Ubuntu One
  10. Are gmail, hotmail, ... storage clouds? (There are client programs that treat them so.)

Private Clouds

Building your own cloud is not too hard. For personal scale (a few TB, and average Internet speed) clouds, there are several hardware appliances with preloaded cloud software that can be set up easily. E.g., buy the PogoPlug (a tiny computer system priced in the range of $20-100), add hard disks, hook it up to the Internet, configure the private cloud, load your files onto the hard disks, and read/write files in the cloud storage from anywhere.

TBD Building local (your own) clouds. ceph. Xen

Issues

Confidentiality. Privacy. Integrity.

Well-Known Computing Clouds

There are many free (up to a limit on the resources) cloud computing services. A few prominent ones are listed below. You must learn how the resources are "metered". E.g., the resources are described using labels such as the following: Stored Data, Number of Indexes, Write Operations, Read Operations, Small Operations, Channel API Calls, Channels Created, Channel Hours Requested, Channel Data Sent. Note also that the free accounts have a trial period of only some 30 to 90 days.

Google App Engine (GAE) is a Platform as a Service (PaaS) cloud computing platform. Applications are sandboxed and run across multiple servers. App Engine offers automatic scaling (i.e., automatically allocates more resources). The site http://www.udacity.com/ (free online courses in programming and other subjects) is built using App Engine.

Windows Azure provides both Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) services. Their Microsoft Online Services is a Software as a Service (SaaS). These fall into three different execution models for running applications: Web Sites, Virtual Machines, and Cloud Services.

AWS EC2

Amazon Web Services Elastic Compute Cloud can launch instances of Linux or Windows virtual machines. "New AWS customers will be able to run a free Amazon EC2 Micro Instance and a free Amazon RDS Micro Instance for a year, while also leveraging a free usage tier for Amazon S3, Amazon Elastic Block Store, Amazon Elastic Load Balancing, and AWS data transfer. AWS's free usage tier can be used for anything you want to run in the cloud: launch new applications, test existing applications in the cloud, or simply gain hands-on experience with AWS." AWS has many services. We will be using EC2 (EC-square, Elastic Compute Cloud), which offers several free basic Amazon Machine Images (AMIs).

Create and Launch our VM

The following is actually very easy/quick (under a minute or two?) to do, but the description of the steps makes it appear long.

  1. [AWS EC2 Signup] Sign up for a free account on AWS (http://aws.amazon.com/) and subscribe to the EC2 service. This is similar to signing up for a shopping account. You will be asked to provide a credit card number. A verification code will be shown on the web page, which you enter on your (cell) phone. Our usage in this Lab will be within the free (i.e., no cost) tier. More on this during the lecture.
  2. Invoke a web browser and navigate on the AWS site: MyAccount/Console -- ManagementConsole -- EC2 -- LaunchInstance. This brings us to a wizard page. We can now choose one of many ready-to-use Linux/Windows VM images. On our PCs in the OSIS Lab we use Ubuntu 12.04 64-bit; so, let us select that AMI. You can select an Availability Zone; we are in OH, so leave it as No Preference. Continue.
  3. This brings us to Advanced Instance Options. Leave all choices at their defaults. Continue.
  4. This brings us to Storage Device Configuration. It will typically show a virtual HD partition /dev/sda1 of 8 GB and an "ephemeral" disk /dev/sdb. We can edit these details; but for now, Continue.
  5. This brings us to a page on Adding tags to your instance. Use your full name for the Name, and "CEG2350-WSU-Cloud-Lab" for the Value. Continue.
  6. This brings us to a page on Create a Public-Private Key pair. Create and download. You will receive a file with a .pem extension. This is the RSA private key file that you should safeguard on your local machine: chmod 400 ~/CloudLab/pmateti-ceg2350-aws-ec2-key.pem. Continue.
  7. This brings us to a page on Create a new Security group (Configure Firewall). Leave the rules at their default. Do not delete the SSH rule. Continue.
  8. This brings us to a summary page showing how the VM that we are about to create is provisioned. Launch. Read the page that shows up. Close.
  9. You will now see the Amazon EC2 Console again. Refresh My Resources. You should see 1 running instance, etc. Click on the "running instance". You will be presented with the details; e.g.,
    AMI: .../ebs/ubuntu-precise-12.04-amd64-server-20121001
    Block Devices:   sda1;
    Public DNS: ec2-54-234-42-210.compute-1.amazonaws.com;
    Private DNS: domU-12-31-39-0A-21-82.compute-1.internal;
    Private IPs: 10.211.34.112

You can save the AMI you created above and launch it in a later session with EC2. You will need the private key file saved above.
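
Step 6 above asks us to run chmod 400 on the key file. The reason is that ssh refuses to use a private key file that other users on the machine can read. The following is a small sketch of a check we could run before connecting. The name check_key_perms is a hypothetical helper, and stat -c is the GNU form of the command, as found on Ubuntu.

```shell
#!/usr/bin/env bash
# Check that a private key file is accessible by its owner only.
# check_key_perms is a hypothetical helper, not part of ssh or AWS.
check_key_perms() {
    local key=$1
    if [ ! -f "$key" ]; then
        echo "no such key file: $key" >&2
        return 2
    fi
    local mode
    mode=$(stat -c '%a' "$key")      # octal permission bits, e.g. 400
    case $mode in
        400|600)
            echo "ok: $key has mode $mode"
            ;;
        *)
            echo "too open: $key has mode $mode; run: chmod 400 $key" >&2
            return 1
            ;;
    esac
}
```

E.g., check_key_perms ~/CloudLab/pmateti-ceg2350-aws-ec2-key.pem reports whether the key downloaded in step 6 is safe to hand to ssh -i.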

Examples of Heavy Computations

This section describes a few examples of heavy computations. They are heavy in that they use a lot of CPU time (hours to days) and file storage space (gigabytes).

The following are long lasting, but are inherently local computations.

  1. Discover how many times your last name appears in all the files on your system.
  2. What is the oldest file on your system?
  3. What is the total number of files on your system?
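
Each of the three can be written as a short pipeline. The following is a sketch assuming the GNU versions of grep and find that ship with Ubuntu. The function names are hypothetical, and file names that contain newlines would be miscounted.

```shell
#!/usr/bin/env bash
# Sketches of the three local computations (function names are hypothetical).
# Each takes a starting directory; pass / to scan the whole system (slow).

# 1. Count whole-word occurrences of a word in all text files under a directory.
count_word() {
    local dir=$1 word=$2
    grep -r -o -w -F -I -- "$word" "$dir" 2>/dev/null | wc -l
}

# 2. Print the path of the oldest regular file, judged by modification time.
oldest_file() {
    local dir=$1
    find "$dir" -type f -printf '%T@ %p\n' 2>/dev/null \
        | sort -n | head -n 1 | cut -d' ' -f2-
}

# 3. Count the regular files under a directory.
count_files() {
    local dir=$1
    find "$dir" -type f 2>/dev/null | wc -l
}
```

E.g., count_word / Smith, oldest_file / and count_files / answer the three questions for the whole system. Run them with the time command in front to see how long a "local" computation can take.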

In this lab, we are interested in computations heavier than the above. E.g., computing the exact value of the mathematical constant π, which shows up in formulae relating to circles, is literally unending. Even after producing, say, a trillion decimal places, we are not done.

So is the computation of the square root of 2. At this page, sqrt2-1mil.html, are the first million-plus digits of the square root of 2, computed during idle time on a VAX Alpha (now defunct) class machine over the course of a weekend. They have also computed 10 million digits of the square root of 2 and thousands of digits of the Naperian exponential e.
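
How can digits of a square root be computed at all? One common approach (not necessarily the one used for the page cited above) is to take the integer square root of 2 × 10^(2k), which equals the first k decimals of the square root of 2 with the decimal point removed. The following bash sketch does this with Newton's method. The names isqrt and sqrt2_digits are hypothetical, and because bash integers are 64 bits wide, the sketch stops at 9 decimals; a million digits needs arbitrary-precision arithmetic.

```shell
#!/usr/bin/env bash
# Digits of sqrt(2) using integer arithmetic only (names are hypothetical).

# Integer square root by Newton's method: the largest r with r*r <= n.
isqrt() {
    local n=$1
    local x=$n
    local y=$(( (x + 1) / 2 ))
    while (( y < x )); do
        x=$y
        y=$(( (x + n / x) / 2 ))
    done
    echo "$x"
}

# Print floor(sqrt(2) * 10^k): the leading 1 followed by k decimals.
# Limited to k <= 9 so that 2 * 10^(2k) fits in a 64-bit integer.
sqrt2_digits() {
    local k=$1
    if (( k < 0 || k > 9 )); then
        echo "k must be between 0 and 9" >&2
        return 1
    fi
    isqrt $(( 2 * 10 ** (2 * k) ))
}
```

E.g., sqrt2_digits 9 prints 1414213562, i.e., 1.414213562. Each extra digit makes the numbers involved longer, which is why millions of digits take a weekend rather than a millisecond.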

Computing in the cloud is unimpressive unless it is based on heavy examples that can be carried out as distributed computations. This is a freshman course, and we are not expecting that the technical details of how these computations were designed will be understood. Scripts written by others orchestrate the computations.

Lab Experiment

All work is expected to be carried out in the Operating Systems and Internet Security (OSIS) Lab, 429 Russ. But you are welcome to work wherever you like. Note that the lab may need both Linux and Windows, along with other software that is not always installed in other facilities.

  1. [Cloud Storage] (i) Sign up for a free account at two or more cloud storage providers of your choice. (You may have to give them your credit card number. The names given above are trusted (so far?) and have not charged free account users as long as the free account conditions are observed.) Do this in either Linux or Windows -- your choice. (ii) Choose any ten of your files that are larger than 1 MB each. (Do not have any? Go download some legitimately free mp3 files.) Store these files with both of the providers above. Show a "directory listing" in each.
  2. [Launch your VM in the EC2 Cloud] Launch the VM you create(d) on Amazon EC2. See details above. Make a note of the "Public DNS" of your VM; in our example it was ec2-54-234-42-210.compute-1.amazonaws.com.
  3. [Examine Logs] Our VM is now running on Amazon EC2. No users logged in yet. There are several Instance Actions we can take. Let us Get System Log. You will see output similar to what you find in /var/log/dmesg or /var/log/messages.
  4. [Connect to our VM using ssh] Let us take Instance Action of Connect. We can now connect to this virtual PC using the fully qualified domain name of our VM noted above. On the local Linux PC, open a new konsole window. Run the following:
    ssh -X -i pmateti-ceg2350-aws-ec2-key.pem\
     ubuntu@ec2-54-234-42-210.compute-1.amazonaws.com
    You are now on your virtual PC logged in as user "ubuntu" running bash. You can run any normal Debian/Ubuntu commands. E.g., (i) run the following: ps aux; df -Th; more /etc/passwd. (ii) Download a file into the VM: wget http://www.cs.wright.edu/~pmateti/Courses/2350/Labs/Cloud/worms12.cpp (iii) How many programs are already installed? (iv) Run a command of your own choice. (v) Or, even do sudo su to become root on this machine. (We could run one of our "heavy computations". But let us not strain our free account. Remember you already gave your credit card details!)
  5. [X11 Clients] The -X flag in the ssh command above established a tunnel for X11 traffic between the X11 server running on our local PC and the X11 clients you may run on EC2 VM. Invoke any two X11 programs of your choice on the VM. Their windows will pop up on our local PC screen.
  6. [Further Use of the VM] Compile the worms12.cpp downloaded above. This is a simple program displaying wiggly worms. It was written for old PCs and needs some more fixes. Compile it as follows: g++ -c worms12.cpp; ls -l worms*. Link it as follows: g++ worms12.o -l ncurses -o worms12.
  7. Learn on your own to transfer a file of yours from the cloud storage to the EC2 virtual PC.
  8. Let us take Instance Action of Terminate. You will see, on the local ssh console, the familiar "The system is going down for power off NOW!" etc. Go back to EC2 Dashboard. Refresh. You will now have 0 Running Instances. You can now sign out.
  9. [50 Bonus Points] [Download the latest Linux Kernel and Compile] This requires 30+ GB of disk space. Make sure to delete the download and the files built as soon as possible. Also, inform us that you are attempting this bonus point item.
  10. [100 Bonus Points] [Download the Android Jelly Bean and Build a ROM] The above remarks apply also to this item.

Make sure you have not left any instances of virtual PCs running in the EC2.

Turnin

  1. In this Lab, we are no longer that explicit about what paragraphs to include in myLabJournal.txt or answers.txt. Use good judgment.
  2. Trim the myLabJournal.txt file into answers.txt keeping material that is directly relevant to the items above.
  3. Note the number <n> of this Lab from the course home page and use L<n> as the first argument to turnin. Turn in the files called answers.txt, myLabJournal.txt, and the usual ReadMe.txt as explained in Expectations.

Link to CloudCompLab Grading Sheet

Acknowledgements

References

  1. Amazon, AWS EC2, http://aws.amazon.com/free/, 2012. Required visit.
  2. Google, App Engine, https://appengine.google.com, 2012. Required visit.
  3. Microsoft, Windows Azure, https://www.windowsazure.com/, 2012. Required visit.

Copyright © 2012 Prabhaker Mateti