Learning Patterns: Your Source for Quality Technology Courseware

Introduction to Spark 3 with Scala: Lab Setup Instructions (Linux: Your Environment)

Below are the standard requirements for this course. If you have any questions or issues, please contact us.

Important Note: Student lab files are required on each computer used for the course. The links for these files are not included in these setup instructions; you should receive them separately.

Other notes:

  • It’s a good idea to keep downloaded software install files on the machines during the class in case of problems that require a re-install.
  • Cloning a setup is generally not a problem. If it is, we’ll mention it in the software section (for example, much of the IBM/RAD-WAS software can be problematic in this regard).

Hardware and classroom setup.

Each student and the instructor shall have a work environment that fulfills the listed requirements.

  • RAM: 8GB recommended
  • Disk Space: Free disk space for software installs (5GB is sufficient)
  • Operating System: Linux
    • We assume that you know how to set up and administer your Linux system.
      • We can briefly review a setup once it's done, but we do not have the resources to set up or troubleshoot your Linux installations.
      • Note that the setup is relatively standard, with standard software packages.
    • We used Alma Linux 8.4 (binary compatible with RHEL). You can download it for the 64-bit x86 architecture from https://mirrors.almalinux.org/isos/x86_64/8.4.html
      • Any relatively recent Linux system you are comfortable with should work; it doesn't have to be Alma 8.4.
      • It must have the required software and equivalent environment setup.
      • Again, we do not have the resources to provide setup or troubleshooting support for other Linux variants. We'll do our best to help if you have questions/problems, but may not have the expertise.
    • When installing Alma 8.4, we made the following choices.
      • Software Selection: Base Environment
        • Workstation
        • Additional software: Container Management, Development Tools, and Graphical Admin Tools.
      • Disabled KDump
      • Enabled Networking
      • Root Password:
        • Set to password123
        • You can use any password you like, as long as whoever needs it (e.g. the instructor) knows what it is.
      • User Creation:
        • Created user student with password of password123
        • Our instructions assume that this user exists. You can use a different user/password, but you must then adjust the setup and lab instructions to match your user environment as you follow them.
    • Note: For specific environments (e.g. running as a virtual machine under another environment) you may need to do specific setup.
      • We assume that you know what is needed for your environment. We can't support the many possible variations.
  • Recommended: Internet access
    • It's best to provide internet access to the student machines.
    • If this is not feasible for your environment, please contact us to ensure that everything works.
  • Required: Adobe Acrobat Reader
  • Required: Either the Firefox browser (https://www.mozilla.org/en-US/firefox/new/) or the Chrome browser (https://www.google.com/chrome/).
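
The RAM and disk space guidelines above can be checked from a terminal. The following is a minimal sketch, not part of the official setup: the function names are hypothetical, the 7500 MiB RAM threshold is an assumed allowance (a machine with 8GB installed usually reports slightly less than 8192 MiB), and it assumes a Linux system that provides /proc/meminfo.

```shell
# Sketch: report total RAM and free disk space against the course guidelines.
# Function names and the RAM allowance are assumptions made for illustration.

# Total physical memory in MiB, read from /proc/meminfo (Linux only).
total_ram_mib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Free space in MiB on the filesystem that holds the given directory.
free_disk_mib() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Print "ok" when the value (first argument) meets the minimum (second argument), "low" otherwise.
meets_minimum() {
  if [ "$1" -ge "$2" ]; then echo "ok"; else echo "low"; fi
}

ram=$(total_ram_mib)
disk=$(free_disk_mib "$HOME")
echo "RAM:  ${ram} MiB ($(meets_minimum "$ram" 7500))"
echo "Disk: ${disk} MiB free under $HOME ($(meets_minimum "$disk" 5120))"
```

A "low" result for RAM is only a warning, since 8GB is a recommendation rather than a hard requirement.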

Lab Files: Each student and instructor must have lab files installed (links to these files are generally sent separately via e-mail).

  • Extract the lab files to a location conveniently accessible to the student (generally the student’s home directory - in our setup this is /home/student)
  • If using a folder other than the student’s home directory, make sure that students/instructor know where they are and can freely access them.
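
To confirm that the extracted lab files are freely accessible, a check along the following lines can be run while logged in as the student user. This is a sketch under assumptions: check_lab_dir is a hypothetical helper name, and the folder name spark-labs-scala is taken from the environment settings in the Spark-Shell section of this document. Adjust the path if you extracted the files elsewhere.

```shell
# Sketch: confirm that a lab directory exists and that the current user
# can read, write and enter it. check_lab_dir is a hypothetical helper.
check_lab_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
    echo "accessible: $dir"
  else
    echo "restricted: $dir"
    return 1
  fi
}

# Assumed location: the lab files were extracted to the student's home directory.
check_lab_dir "$HOME/spark-labs-scala" || true
```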

Other instructor requirements for the classroom

  • Capability to display presentation slides or code examples.
    • For virtual environments: Generally some type of screen sharing capability.
    • For physical in-person classes:
      • Projector or large screen TV capable of 1280x800 or higher resolution. Instructor must be able to use this to project slides.
      • Whiteboard (preferred) or flip charts with markers.

Install Java Development Kit – JDK 11 (11.0.x)

  • Note that any JDK 11 version should work fine. Slightly later Java versions may work, but have not been tested. Please contact us if you have an issue with using Java 11.
     
  • Removing existing Java and installing Java 11:
    • Many recent versions of Linux come pre-installed with Java 11. If you already have Java 11 installed, you can skip this step and go on to "Find Java Install location".
      Check whether you have Java 11 by opening a terminal window and typing the following. If the output is some variation of the sample below, indicating that Java 11 is installed, skip this step.

      $ java -version
      openjdk version "11.0.13" 2021-10-19 LTS

      Otherwise, uninstall the existing Java and install the latest version of Java 11. On our Linux system, we did this as follows.

      $ sudo yum -y remove 'java*'
      $ sudo yum -y install java-11-openjdk-devel
      $ sudo alternatives --config java #(select the Java 11 option, usually option '2', then hit enter to save)
      $ sudo alternatives --config javac #(select the Java 11 option, usually option '2', then hit enter to save)

       
  • Continue here whether or not you had to install Java 11.
  • Find Java Install location.
    • It can be found as follows (with sample output from our system):

      $ readlink -f "$(which java)"
      /usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64/bin/java

       
    • On our installation, it was under /usr/lib/jvm/java-11-openjdk-nnnn (nnnn depends on version).
  • Edit and save the student user's .bash_profile to set the JAVA_HOME environment variable so that it points to your Java install. In our install, it looked like this.

    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64
     
  • Open a terminal window and test the install as follows. Only the first line of the expected output is shown for each command.

    $ java -version
    openjdk version "11.0.13" 2021-10-19 LTS
    $ javac -version
    javac 11.0.13

     
  • If this all works, the Java setup is complete.
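
The version check and install-location steps above can be sketched as a small script. This is an illustration only: java_major_version and java_home_from_binary are hypothetical helper names, and the JAVA_HOME suggestion assumes a layout like the one in our sample output, where the resolved binary path ends in /bin/java.

```shell
# Sketch: detect the Java major version and suggest a JAVA_HOME value.
# Both helper names are hypothetical.

# Extract the major version from the first line of `java -version` output.
# Handles the current scheme ("11.0.13" -> 11) and the legacy one ("1.8.0_312" -> 8).
java_major_version() {
  local first_line="$1" version
  version=$(printf '%s\n' "$first_line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$version" in
    1.*) version=${version#1.}; echo "${version%%.*}" ;;
    "")  return 1 ;;
    *)   echo "${version%%[.+-]*}" ;;
  esac
}

# Derive a JAVA_HOME candidate from a resolved java binary path
# by dropping the trailing /bin/java.
java_home_from_binary() {
  echo "${1%/bin/java}"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it before reading the first line.
  major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
  echo "Detected Java major version: ${major:-unknown}"
  echo "Suggested JAVA_HOME: $(java_home_from_binary "$(readlink -f "$(command -v java)")")"
else
  echo "java not found on PATH"
fi
```

If the detected major version is not 11, follow the removal and install steps above before setting JAVA_HOME.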

Spark-Shell Environment Setup

  • Edit and save the student user's .bash_profile to set the following environment variables.
  • Important Note: The settings below assume that the lab files were extracted to $HOME (which should point to the student home directory). If they were extracted elsewhere, make sure to set SPARK_LABS to the location that matches your environment.

export SPARK_LABS=$HOME/spark-labs-scala
export SPARK_HOME=$SPARK_LABS/spark
export KAFKA_HOME=$SPARK_LABS/kafka
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$KAFKA_HOME/bin
 

  • This will set up the environment appropriately to run the Spark labs.
     
  • Test the setup by opening a terminal window (logged in as the student user) and running spark-shell. We illustrate this below, with sample output.
     

$ spark-shell

... Warnings and logging omitted ...


Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://mycomputer2574:4040
Spark context available as 'sc' (master = local[*], app id = local-1655209447467).
Spark session available as 'spark'.

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
                  

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.5)

Type in expressions to have them evaluated.

Type :help for more information.

 

scala>

  • If you see the above, the setup is complete. If you see errors and do not reach the scala> prompt, the environment is not set up correctly. Recheck the steps above, and contact us if you cannot resolve the problem.
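
If spark-shell does not start, a quick first step is to confirm that the environment variables set in .bash_profile are present and point to real directories. The following is a minimal sketch of such a check. check_dir_var is a hypothetical helper name, and the script only inspects the current shell's environment. It does not start Spark.

```shell
# Sketch: verify the environment variables used by the Spark labs.
# check_dir_var is a hypothetical helper.

# Report whether the named variable is set and points to an existing directory.
check_dir_var() {
  local name="$1" value
  # Indirect lookup of the variable whose name is in $name (names come from the fixed list below).
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name: not set"
    return 1
  elif [ ! -d "$value" ]; then
    echo "$name: $value (directory not found)"
    return 1
  fi
  echo "$name: $value"
}

for var in JAVA_HOME SPARK_LABS SPARK_HOME KAFKA_HOME; do
  check_dir_var "$var" || true
done

# spark-shell should be reachable through the PATH entry for $SPARK_HOME/bin.
if command -v spark-shell >/dev/null 2>&1; then
  echo "spark-shell found at $(command -v spark-shell)"
else
  echo "spark-shell not found on PATH"
fi
```

A "not set" result in a terminal opened after editing .bash_profile may simply mean the file has not been read yet. Logging out and back in, or running source ~/.bash_profile, loads the new settings.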