Abstract: This article is an introduction to cloud computing, with a focus on immediately useful, hands-on practical exercises. It is taught as part of CEG2350, a freshman-level course.
Cloud computing is all about using many computer systems that are "out there", whose physical addresses we do not even know. The "many" can range from just a few to thousands. Each of these systems may have terabytes of RAM and petabytes of hard disk space.
The objectives of the associated lab experiment are to: (i) Store ten of your large files in the cloud. (ii) Create a Linux VM on a cloud computing platform and run a few commands.
TBD: High Performance Computing (HPC); high-throughput computing; computing aimed at "beating" humans (playing chess, Jeopardy, voice recognition, speech recognition, dictation); elapsed time versus total computing time.
By 2014, cloud computing will be as common as mobile phones are today.
We can run computations on remote machines using ssh, etc. The remote machine may not have our files or our programs; it becomes our task to transport them.
Just like people migrate, we can think of processes migrating from one machine to another. Just like humans cannot migrate to Mars (yet?), processes can only (as of 2012) migrate from one Linux machine to another highly similar machine. Process migration involves "check-points": creating a frozen image of the process's entire address space, transporting that image to a remote machine, and "thawing" it there.
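The checkpoint/thaw idea can be illustrated with a toy sketch. Real migration tools freeze a process's entire address space; the sketch below merely pickles a few variables, and all names in it are invented for illustration:

```python
# A toy illustration of checkpoint/thaw: freeze a computation's state
# to bytes, ship those bytes to another machine, and resume there.
import pickle

def compute(state):
    # Sum the integers up to state["limit"], resumable at state["i"].
    while state["i"] <= state["limit"]:
        state["total"] += state["i"]
        state["i"] += 1
    return state["total"]

state = {"i": 1, "total": 0, "limit": 1000}
# Run partway, then "check-point":
for _ in range(500):
    state["total"] += state["i"]
    state["i"] += 1
blob = pickle.dumps(state)       # the frozen image
# ... transport `blob` to a remote machine, then "thaw" and finish:
resumed = pickle.loads(blob)
print(compute(resumed))          # prints 500500, same as an uninterrupted run
```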
In most departmental offices, there are probably dozens of PCs running screen savers. It is possible to put such idling machines to productive computing. E.g., a package known as Condor can be installed on Windows or Linux PCs; it takes a description of jobs to be run and runs them on idle PCs as they become available. As soon as the "owner" of the idle PC begins an activity, the guest computation gracefully vacates and migrates to another. Condor is a standard package in Ubuntu: apt-get install condor
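As a sketch, a Condor job is described in a small "submit description" file. The file below is hypothetical (the program path and file names are invented), but follows the standard submit-description format:

```
# sim.submit -- a minimal Condor submit description (hypothetical job)
universe   = vanilla
executable = /usr/bin/my_sim     # the program to run on an idle PC
arguments  = --trials 1000
output     = sim.out             # stdout of the job
error      = sim.err             # stderr of the job
log        = sim.log             # Condor's event log for this job
queue                            # submit one instance of the job
```

You would hand this to Condor with condor_submit sim.submit and watch its progress with condor_q.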
A group of machines can be clustered together so that they have an "awareness" of each other (cf. the Star Trek Borg). The Top Ten of the world's most powerful supercomputers are Linux clusters (visit http://www.top500.org/).
Suppose we wish to discover all occurrences of ten given words in some thousand large files. This problem has an inherently parallel solution. (i) Subcontract the work of finding w[i] in file f[j] to machine M[i,j]. (ii) Combine all the results into one. Step (i) is often called a "map" computation and step (ii) is called a "reduce" computation. This kind of solution structure is surprisingly common in large-scale ("big data") problems. Apache Hadoop (http://hadoop.apache.org/) provides an implementation of this. It is now available for both Linux and Windows.
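The map/reduce structure can be sketched in a few lines of Python. This is a toy, single-machine illustration with made-up file contents; Hadoop distributes the same pattern across many machines:

```python
# Toy map/reduce: count occurrences of given words in given "files".
# Here the files are in-memory strings; in a real deployment each
# map_one call could run on a different machine M[i,j].
from collections import Counter

files = {
    "f1.txt": "the cloud stores the files in the cloud",
    "f2.txt": "cloud computing uses many machines",
}
words = ["cloud", "machines"]

def map_one(word, text):
    # The "map" step: count one word in one file.
    return (word, Counter(text.split())[word])

def reduce_all(pairs):
    # The "reduce" step: combine per-file counts into totals.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

pairs = [map_one(w, text) for w in words for text in files.values()]
print(reduce_all(pairs))   # {'cloud': 3, 'machines': 1}
```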
The name "cloud computing" comes from the use of a cloud-shaped symbol in the infrastructure drawings of these systems. For a pretty good understanding, compare "cloud computing" with the use of electric power. You plug in a heavy-current device without worrying whether the power generation company needs to be informed first. In a similar way, we can send a heavy-duty computation into the cloud and expect to get results, usually faster than on our own machines.
In the academic literature, cloud computing is classified as: Infrastructure as a service (IaaS), Platform as a service (PaaS), Software as a service (SaaS), Storage as a service (STaaS), Security as a service (SECaaS), Data as a service (DaaS), Test environment as a service (TEaaS), Desktop as a service (DaaS), API as a service (APIaaS), and Backend as a service (BaaS).
In the practical world, cloud computing can be classified as: (i) storage of data/files in the cloud, and (ii) computational clouds. Do remember the pretty obvious: unlike the clouds that rain only on Beavercreek, Ohio, the storage/computing clouds are accessible over the Internet from everywhere.
There are many free (up to a limit) cloud storage providers. All of the following provide a decent amount (5 GB?) of storage space free. Some have integrated support with office suites, etc. All of these have client applications that run on Linux, Windows, Android, and iOS.
TBD Brief descriptions.
Building your own cloud is not too hard. For personal-scale (a few TB, and average Internet speed) clouds, there are several hardware appliances with preloaded cloud software that can be set up easily. E.g., buy the PogoPlug (a tiny computer system priced in the range of $20-100), add hard disks, hook it up to the Internet, configure the private cloud, load your files onto the hard disks, and read/write files in the cloud storage from anywhere.
TBD: Building local (your own) clouds. Ceph. Xen.
Confidentiality. Privacy. Integrity.
There are many free (up to a limit on the resources) cloud computing services. A few prominent ones are listed below. How the resources are "metered" must be learned. E.g., the resources are described using labels such as the following: Stored Data, Number of Indexes, Write Operations, Read Operations, Small Operations, Channel API Calls, Channels Created, Channel Hours Requested, Channel Data Sent. Note also that the free accounts have a trial period of only some 30 to 90 days.
Google App Engine (GAE) is a Platform as a Service (PaaS) cloud computing platform. Applications are sandboxed and run across multiple servers. App Engine offers automatic scaling (i.e., automatically allocates more resources). The site http://www.udacity.com/ (free online courses in programming and other subjects) is built using App Engine.
Windows Azure provides both Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). Its Microsoft Online Services is a Software as a Service (SaaS) offering. These fall into three different execution models for running applications: Web Sites, Virtual Machines, and Cloud Services.
Amazon Web Services' Elastic Compute Cloud can launch instances of Linux or Windows virtual machines. "New AWS customers will be able to run a free Amazon EC2 Micro Instance and a free Amazon RDS Micro Instance for a year, while also leveraging a free usage tier for Amazon S3, Amazon Elastic Block Store, Amazon Elastic Load Balancing, and AWS data transfer. AWS's free usage tier can be used for anything you want to run in the cloud: launch new applications, test existing applications in the cloud, or simply gain hands-on experience with AWS." AWS has many services. We will be using EC2 (EC-square, Elastic Compute Cloud), which offers several free basic Amazon Machine Images (AMIs).
The following is actually very easy/quick (under a minute or two?) to do, but the description of the steps makes it appear long.
AMI: .../ebs/ubuntu-precise-12.04-amd64-server-20121001 Block Devices: sda1; Public DNS: ec2-54-234-42-210.compute-1.amazonaws.com; Private DNS: domU-12-31-39-0A-21-82.compute-1.internal; Private IPs: 10.211.34.112
You can save the AMI you created above and launch it in a later session with EC2. You will need the private key file saved above.
This section describes a few examples of heavy computations. They are heavy in that they use a lot of CPU time (hours to days) and file storage space (gigabytes).
The following are long-running, but inherently local, computations.
In this lab, we are interested in computations heavier than the above. E.g., computing the exact value of the mathematical constant π, which shows up in formulae relating to circles, is literally unending. Even after producing, say, a trillion decimal places, we are not done.
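As a small taste, the Python sketch below computes the leading digits of π with Machin's 1706 formula, π/4 = 4·arctan(1/5) − arctan(1/239), using only integer arithmetic (the function names are ours, chosen for this illustration):

```python
# First n decimal digits of pi via Machin's formula,
# pi/4 = 4*arctan(1/5) - arctan(1/239), in scaled-integer arithmetic.
def arctan_inv(x, unity):
    # arctan(1/x) * unity, summed by the alternating Taylor series.
    total, term, k, sign = 0, unity // x, 1, 1
    x2 = x * x
    while term:
        total += sign * (term // k)
        term //= x2
        k += 2
        sign = -sign
    return total

def pi_digits(n):
    unity = 10 ** (n + 10)          # 10 guard digits absorb truncation error
    pi = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi)[:n]              # the digits "3141592653..."

print(pi_digits(21))   # 314159265358979323846
```

Each extra digit only makes the loop run a little longer; the unending part is that there is always another digit.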
So is the computation of the square root of 2. At this page, sqrt2-1mil.html, are the first million-plus digits of the square root of 2, computed during idle time on a VAX Alpha (now defunct) class machine over the course of a weekend. They have also computed 10 million digits of the square root of 2 and thousands of digits of the Naperian exponential e.
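The digits of √2 are easy to reproduce today: Python's arbitrary-precision integers make it nearly a one-liner (a sketch using math.isqrt, available since Python 3.8):

```python
# First n decimal digits of sqrt(2) after the leading 1: compute
# isqrt(2 * 10^(2n)), i.e., the integer part of sqrt(2) * 10^n.
from math import isqrt

def sqrt2_digits(n):
    return str(isqrt(2 * 10 ** (2 * n)))[:n + 1]   # "1414...", n+1 digits

print(sqrt2_digits(20))   # 141421356237309504880
```

A weekend on a 1990s machine becomes seconds on a laptop, and minutes for millions of digits.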
Computing in the cloud is unimpressive unless it involves heavy examples that benefit from distributed computation. This is a freshman course, and we do not expect you to understand the technical details of how these computations were designed. Scripts written by others orchestrate the computations.
All work is expected to be carried out in the Operating Systems and Internet Security (OSIS) Lab, 429 Russ. But you are welcome to work from anywhere. Note that you may need both Linux and Windows, and other software that may not always be installed in other facilities.
ssh -X -i pmateti-ceg2350-aws-ec2-key.pem email@example.com

You are now on your virtual PC, logged in as user "ubuntu" running bash. You can run any normal Debian/Ubuntu commands. E.g., (i) run the following: ps -aux; df -Th; more /etc/passwd. (ii) Download a file into the VM: wget http://www.cs.wright.edu/~pmateti/Courses/2350/Labs/Cloud/worms12.cpp (iii) How many programs are already installed? (iv) Run a command of your own choice. (v) Or even do sudo su to become root on this machine. (We could run one of our "heavy computations", but let us not strain our free account. Remember, you already gave your credit card details!)
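The commands below are one possible first tour of the fresh VM (illustrative only; the package count assumes a Debian/Ubuntu image, where dpkg tracks installed packages):

```shell
# A few harmless first commands on the new VM.
ps aux | head -n 5        # a snapshot of running processes
df -Th                    # mounted filesystems, with their types
# Question (iii): count installed packages; lines starting with "ii"
# in dpkg's listing are installed packages (Debian/Ubuntu only).
if command -v dpkg >/dev/null; then dpkg -l | grep -c '^ii'; fi
```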
Make sure you have not left any instances of virtual PCs running in the EC2.
Link to CloudCompLab Grading Sheet