2017-02-20

P4: Java8 Streams, Map+Reduce

1 Lab Goals

The purpose of this lab is (i) to get you started with technologies that are essential in cloud computing, (ii) to give you more experience in developing Android APKs, (iii) to familiarize you with elementary use of GitHub and related utilities, and (iv) to apply all of the above in concrete instances of APKs.

This lab/project is a programming assignment. This P4 tries to be compact by omitting repeated descriptions of deliverables. See the Turnin section.

2 Background

  1. Lecture Notes on Java8 lambda expressions, streams, map, reduce, concurrency.
  2. Lecture Notes on Cloud.
  3. Lecture Notes on APK development.

3 Tasks

3.1 Task: Java-8 Streams in Linux or Windows

  1. You will be developing a program using Java8 streams to analyze logs. Here is an example input data file: auth.log.txt, a copy of /var/log/auth.log as produced by a Linux system. It shows attempted logins from the internet. Such lines are of the form "Invalid user ubnt from 113.140.37.138". We wish to extract such info. The resulting output data file should be invalidUsers.txt. Obviously, this can be done without streams, but you must use Java8 streams.
  2. This task is to be done in Linux. Less preferred: in Windows. Suggested main() method:
    // Needs: import java.io.IOException;
    public static void main(String[] args) throws IOException {
      // Defaults, used when no file names are given on the command line.
      String ifnm = "/var/log/auth.log", ofnm = "/tmp/invalidUsers.txt";
      switch (args.length) {
      case 0: break;
      case 2: ofnm = args[1]; // fall through
      case 1: ifnm = args[0]; break;
      default:
        System.err.println("Usage: At most two file names expected");
        System.exit(1); // non-zero status signals the usage error
      }
      ...
    }
    
  3. Deliverables: (i) The src code files and copies of the input and output files, (ii) A status report (How well is your program working? bugs? crashes? hangs? …), (iii) a summary paragraph of how you developed this program, and an experience report. Do not submit .class files. See TurnIn for what goes where.
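
As a starting point for the extraction step described above, here is a minimal sketch and not a reference solution. It assumes that every line of interest contains the text "Invalid user NAME from ADDRESS", as in the example line, and that writing "NAME ADDRESS" per line is an acceptable output format. The class name InvalidUsers, the method name extract and the output format are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InvalidUsers {
    // Assumed line shape, taken from the example "Invalid user ubnt from 113.140.37.138".
    private static final Pattern INVALID =
        Pattern.compile("Invalid user (\\S+) from (\\S+)");

    /** Keeps only the matching lines and maps each one to "user address". */
    static List<String> extract(Stream<String> lines) {
        return lines
            .map(INVALID::matcher)      // one fresh Matcher per line
            .filter(Matcher::find)      // drop lines without the pattern
            .map(m -> m.group(1) + " " + m.group(2))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Same defaults as the suggested main(); argument checking is omitted here.
        String ifnm = args.length > 0 ? args[0] : "/var/log/auth.log";
        String ofnm = args.length > 1 ? args[1] : "/tmp/invalidUsers.txt";
        try (Stream<String> lines = Files.lines(Paths.get(ifnm))) {
            Files.write(Paths.get(ofnm), extract(lines));
        }
    }
}
```

Keeping extract() separate from the file handling makes it easy to try on a few hand-written lines before running it on a real log.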

3.2 Task: Java-8 Streams in an Android APK

  1. Redo the above task, but now dress it up and revise it as a proper Android APK. Include scrolling of input and output files as selectable activities.
  2. Deliverables: (i) Screenshots (4+) produced by running the APK you built, (ii) The src code files, (iii) A status report, (iv) experience report.

3.3 Task: Map-Reduce in Linux/Windows

  1. You must use Java8 streams and map + reduce on the AWS cloud to solve this task.
  2. You are given the titles of a number of books at http://www.gutenberg.org/, and a non-negative integer n. Find the n most common words among these books. The titles are given in a text file whose path name is given as args[0], and the n words should be output into a text file whose path name is args[1].
  3. You are given several http or https URLs of web pages, and a non-negative integer n. Assume these web pages are mostly text. Ignore all non-text materials. Find the n most common words among these pages. The URLs are given in a text file whose path name is given as args[0], and the n words should be output into a text file whose path name is args[1]. The n is given as args[2].
  4. Use https://docs.oracle.com/javase/8/docs/api/java/util/Scanner.html
  5. Figure out all the missing details. E.g., how to download a text file of a book from Gutenberg web page through your Java program. Use a main() method similar to the above.
  6. Deliverables: (i) The src code files, (ii) example pairs of input-outputs, (iii) A status report, (iv) experience report.
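
One possible shape for the word-counting core of the URL variant is sketched below. It is not a complete answer: it assumes a "word" is a maximal run of letters compared case-insensitively, that ties are broken alphabetically, and that the URL file has one URL per line. It does not strip HTML markup, does not look up Gutenberg titles, and says nothing about deployment on AWS. The names TopWords, words and topN are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TopWords {
    /** Reads lower-cased words from a text stream; any non-letter run is a delimiter. */
    static List<String> words(InputStream in) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(in, "UTF-8").useDelimiter("[^\\p{L}]+")) {
            while (sc.hasNext()) {
                String w = sc.next().toLowerCase();
                if (!w.isEmpty()) out.add(w);
            }
        }
        return out;
    }

    /** Map: word -> 1. Reduce: sum the 1s per word. Then keep the n most frequent. */
    static List<String> topN(Stream<String> words, int n) {
        Map<String, Long> counts =
            words.collect(Collectors.toMap(w -> w, w -> 1L, Long::sum));
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: file of URLs, args[1]: output file, args[2]: n (no validation here).
        int n = Integer.parseInt(args[2]);
        List<String> all = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (line.trim().isEmpty()) continue;          // skip blank lines
            try (InputStream in = new URL(line.trim()).openStream()) {
                all.addAll(words(in));
            }
        }
        Files.write(Paths.get(args[1]), topN(all.stream(), n));
    }
}
```

The three-argument Collectors.toMap is used here because it makes the map step (word to 1) and the reduce step (Long::sum) visible in a single call; Collectors.groupingBy with Collectors.counting() would give the same counts.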

3.4 Task: Map-Reduce in Android

  1. Redo the above task, but now dress it up and revise it as a proper Android APK.
  2. Include two settings: one for setting n, and another to display the output on the screen, with a scroll bar.
  3. Deliverables: (i) Screenshots (4+) produced by running the APK you built, (ii) example pairs of input-outputs, (iii) The src code files, (iv) A status report, (v) experience report.

3.5 Task: Java8 Concurrency

  1. Redo the task named "Java-8 Streams in an Android APK", but now using as much concurrency as possible. E.g., (i) use parallel streams, and (ii) use threads. It is recommended that you do this first in Linux and then in Android.
  2. Deliverables: (i) Screenshots (4+) produced by running the APK you built, (ii) example pairs of input-outputs, (iii) The src code, (iv) A status report, (v) experience report.
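
The two suggestions above, parallel streams and threads, might look roughly like this in plain Java before any Android work. This is a sketch under assumptions: the log lines are already in memory, the "Invalid user NAME from ADDRESS" pattern from the first task applies, and the input has been split into chunks by the caller. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcurrentInvalidUsers {
    private static final Pattern INVALID =
        Pattern.compile("Invalid user (\\S+) from (\\S+)");

    /** Shared pipeline; works for sequential and parallel streams alike. */
    static List<String> extract(Stream<String> lines) {
        return lines
            .map(INVALID::matcher)      // each line gets its own Matcher, so no shared state
            .filter(Matcher::find)
            .map(m -> m.group(1) + " " + m.group(2))
            .collect(Collectors.toList());
    }

    /** Variant (i): a parallel stream; toList() on a List source keeps line order. */
    static List<String> extractParallel(List<String> lines) {
        return extract(lines.parallelStream());
    }

    /** Variant (ii): one task per chunk on a thread pool, joined in chunk order. */
    static List<String> extractWithThreads(List<List<String>> chunks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every chunk first, so that all tasks can run concurrently.
            List<CompletableFuture<List<String>>> parts = chunks.stream()
                .map(c -> CompletableFuture.supplyAsync(() -> extract(c.stream()), pool))
                .collect(Collectors.toList());
            // join() waits for each task; results are concatenated in submission order.
            return parts.stream()
                .flatMap(f -> f.join().stream())
                .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

Both variants should produce the same list as the sequential pipeline, which gives a simple way to check the concurrent code against the earlier task.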

3.6 Task: [Optional Bonus] Apache Spark/ Java in Linux/Windows

  1. Apache Spark is now (2015+) the preferred system to use for problems that were typically solved using Hadoop. Spark is written in Scala, but can be used from within Java or Python programs also.
  2. Rewrite the "Task: Map-Reduce in Linux/Windows" using Spark + Java.
  3. Deliverables: As above. (i) Screenshots (4+) produced by running the APK you built, (ii) The src code files, (iii) A status report, (iv) experience report.

4 TurnIn

  1. Scripts are used to check various things – so file names should obey "rules". E.g., include {in the report} is short for include {in the report-P4.pdf file}. The report must have page numbers.
  2. In the report, at the very beginning: Include (i) your full name, (ii) WSU UID, (iii) your GitHub link, (iv) the lab/ assignment/ project heading.
  3. In the report, make a section heading for each task. At the top of each section, list the deliverables you have included and where (page number, URL, …).
  4. What else should be included in the report? Assume a reader other than yourself. Assume that this reader has read the task descriptions. Include in the report all relevant things so that such a person can readily understand and even judge your effort. E.g., the following should always be included: Relevant screenshots, pseudo-code design explanations, assertions, example input-output pairs, etc. A status report (How well is your program working? bugs? crashes? hangs? …) should always be present. Experience Report: {A summary of your experience with the assignment and a critique are helpful. Maybe half a page?} Src code listings do not belong in the report. However, you may copy-paste short segments to explain.
  5. The (i) pre- and post-conditions, (ii) highlights of diffs, if any, and (iii) status report go in the report, and also as JavaDoc comments in the src code files. In this P4, write the pre- and post-conditions for the important method that main() calls below the switch statement (see code above).
  6. APKs, src code files, complete diffs (not just highlights), … should be uploaded (between due +2 and due +3 days) to your GitHub (or similar) account. Do not zip or tar-ball these. Do create a decent directory structure.
  7. Submit report-P4.pdf on Pilot dropbox for P4 on or before the due date.
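
To illustrate what pre- and post-conditions written as JavaDoc comments could look like, here is a small hedged example. The method invalidLines is hypothetical and much simpler than the method your main() will call; the wording of the conditions is only one possible style.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class Contract {
    /**
     * Selects the log lines that report an invalid user.
     *
     * <p>Pre-condition: {@code lines} is not null and contains no null element.
     *
     * <p>Post-condition: the result holds exactly those elements of {@code lines}
     * that contain the text "Invalid user", in their original order;
     * {@code lines} itself is not modified.
     *
     * @param lines the lines of a log file
     * @return the selected lines, possibly empty
     */
    static List<String> invalidLines(List<String> lines) {
        Objects.requireNonNull(lines, "lines");   // enforces part of the pre-condition
        List<String> result = lines.stream()
            .filter(l -> l.contains("Invalid user"))
            .collect(Collectors.toList());
        // Partial check of the post-condition; active only when run with java -ea.
        assert result.size() <= lines.size();
        return result;
    }
}
```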

5 End


Copyright © 2017 Dr Prabhaker Mateti • 2017-02-20