1. Install Python 2.7
● Install Python 2.7.10
– https://www.python.org/downloads/
– Mac: https://www.python.org/ftp/python/2.7.10/python-2.7.10-macosx10.6.pkg
– Or using MacPorts
● sudo port install python27
● https://astrofrog.github.io/macports-python/
– Windows: https://www.python.org/ftp/python/2.7.10/python-2.7.10.amd64.msi
– Ubuntu: sudo apt-get install python-all
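
After installing, it can help to confirm which interpreter is actually being picked up. The following is a minimal sketch; the helper name is_python27 is illustrative only and not part of any tool mentioned here.

```python
import sys


def is_python27(version_info=sys.version_info):
    """Return True when the given version tuple describes Python 2.7.x."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # Print the version of the interpreter running this script
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Is Python 2.7: %s" % is_python27())
```

Run it with the same interpreter you plan to register in PyDev, so the check reflects what Eclipse will use.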

2. Install PyDev Plugin for Eclipse
● PyDev is a Python IDE plugin for Eclipse
● http://www.pydev.org/manual_101_install.html
● Use this Eclipse Update Site:
– http://pydev.org/updates

3. Spark
● Download the Spark package from http://spark.apache.org/downloads.html
– Choose a package type: pre-built package for Hadoop 2.7 or later
● Unpack the file spark-2.2.1-bin-hadoop2.7.tgz

**** Very important (Windows) ****
On Windows, Spark may fail with this error:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
To fix it:

  1. Download the winutils.exe executable from the Hortonworks repository.
  2. Move winutils.exe to the spark-2.2.1-bin-hadoop2.7/bin folder.
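
The "null" in the error path is where Hadoop would normally substitute its home directory, which typically suggests that HADOOP_HOME is unset. The sketch below checks whether winutils.exe sits where the steps above place it. The helper names are hypothetical, and using HADOOP_HOME (with SPARK_HOME as a fallback) to locate the Spark folder is an assumption, not something the steps above state.

```python
import os


def winutils_path(hadoop_home):
    """Build the location where winutils.exe is expected: <home>/bin/winutils.exe."""
    return os.path.join(hadoop_home, "bin", "winutils.exe")


def has_winutils(hadoop_home):
    """Return True when winutils.exe exists under the given directory."""
    return os.path.isfile(winutils_path(hadoop_home))


if __name__ == "__main__":
    # Assumption: HADOOP_HOME (or SPARK_HOME as a fallback) names the Spark directory
    home = os.environ.get("HADOOP_HOME") or os.environ.get("SPARK_HOME")
    if home is None:
        print("Neither HADOOP_HOME nor SPARK_HOME is set")
    else:
        print("%s exists: %s" % (winutils_path(home), has_winutils(home)))
```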

4. Configure Eclipse PyDev to use Spark Libs
● Add SPARK_HOME variable to PyDev
● Preferences → PyDev → Python Interpreter → Environment
– Add SPARK_HOME; its value is the path to your Spark directory
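
To check that the variable is visible to scripts launched from PyDev, a small lookup like the one below may be useful. It is a sketch only: resolve_spark_home is a hypothetical helper, and the error message wording is an assumption.

```python
import os


def resolve_spark_home(environ=os.environ):
    """Return the SPARK_HOME value, or raise if it is missing or empty."""
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        raise EnvironmentError(
            "SPARK_HOME is not set; add it in the PyDev interpreter Environment settings"
        )
    return spark_home


if __name__ == "__main__":
    try:
        print("SPARK_HOME = %s" % resolve_spark_home())
    except EnvironmentError as err:
        print(err)
```

Passing the environment in as a parameter keeps the helper easy to try out with a plain dictionary instead of the real process environment.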


5. Configure Eclipse PyDev to use Spark Libs
● Add the Spark library files (zip files) from the python/lib folder of your Spark directory (spark-2.2.1-bin-hadoop2.7/python/lib) to your project
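
As an alternative or a sanity check, the same zip files can be put on the import path from inside a script. This is a minimal sketch assuming SPARK_HOME points at the unpacked Spark directory; the function names are illustrative and not part of Spark or PyDev.

```python
import glob
import os
import sys


def spark_python_libs(spark_home):
    """Return the sorted zip archives found under <spark_home>/python/lib."""
    return sorted(glob.glob(os.path.join(spark_home, "python", "lib", "*.zip")))


def add_spark_to_path(spark_home):
    """Insert each Spark zip archive at the front of sys.path and return the list."""
    libs = spark_python_libs(spark_home)
    for lib in libs:
        if lib not in sys.path:
            sys.path.insert(0, lib)
    return libs


if __name__ == "__main__":
    # Assumption: SPARK_HOME was configured as in step 4
    home = os.environ.get("SPARK_HOME")
    if home:
        for lib in add_spark_to_path(home):
            print("Added to sys.path: %s" % lib)
    else:
        print("SPARK_HOME is not set")
```

Once the archives are on sys.path, a statement such as "from pyspark import SparkContext" should resolve, provided the Spark download is intact and compatible with the interpreter in use.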