How to use REST-based calls to submit Spark jobs
How to use Livy:
Livy (alpha) enables programmatic, fault-tolerant, multi-tenant submission of Spark jobs from web/mobile apps, using REST calls to communicate with the Spark cluster.
Note: at the time of writing this blog, I have only tried Livy on a standalone PySpark setup, so I don't know what challenges are involved in setting up Livy on a PySpark cluster.
Since we are submitting a local file, make sure to add the folder containing the PySpark scripts to the 'livy.file.local-dir-whitelist' parameter in the livy.conf file. Otherwise, Livy rejects the submission with the following error:
requirement failed: Local path pi.py cannot be added to user sessions.
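To see why the whitelist matters, here is a rough Python illustration of the kind of check involved: a local path is accepted only if it lies under one of the comma-separated directories listed in 'livy.file.local-dir-whitelist'. The function name is_whitelisted is hypothetical, and this is not Livy's actual implementation; it only mimics the idea.

```python
import os

def is_whitelisted(path: str, whitelist: str) -> bool:
    """Return True if `path` lies inside one of the comma-separated dirs in `whitelist`.

    Hypothetical illustration of the idea behind livy.file.local-dir-whitelist,
    not Livy's own code.
    """
    # Accept the "file:/..." form used in the curl example.
    if path.startswith("file:"):
        path = path[len("file:"):]
    target = os.path.normpath(os.path.abspath(path))
    for entry in whitelist.split(","):
        entry = entry.strip()
        if not entry:
            continue
        root = os.path.normpath(os.path.abspath(entry))
        # The path is allowed only if the whitelisted dir is one of its ancestors.
        if os.path.commonpath([target, root]) == root:
            return True
    return False
```

Comparing whole path components with os.path.commonpath, rather than a plain string prefix, keeps a sibling folder such as spark-1.6.0-other from matching a whitelisted spark-1.6.0.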
The command used to submit a batch script is as follows:
curl -X POST --data '{"file": "file:/home/osboxes/spark-1.6.0/examples/src/main/python/pi.py"}' -H "Content-Type: application/json" localhost:8998/batches | python -m json.tool
This command runs Spark's bundled pi.py example. Since we are running Spark in standalone mode on a single machine, Livy runs on localhost, on its default port 8998.
Once the command is submitted, Livy replies with a JSON description of the new batch, including its id, and its state is initially reported as starting.
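The same request can be sent from Python using only the standard library. This is a minimal sketch assuming Livy listens on http://localhost:8998 as above; build_batch_request and submit_batch are hypothetical helper names, not part of any Livy client library.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed: Livy's default port, as in the curl example

def build_batch_request(file_path: str, livy_url: str = LIVY_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /batches request used by the curl example."""
    body = json.dumps({"file": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{livy_url}/batches",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def submit_batch(file_path: str, livy_url: str = LIVY_URL) -> dict:
    """Send the request and return Livy's JSON reply describing the new batch."""
    with urllib.request.urlopen(build_batch_request(file_path, livy_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, batch = submit_batch("file:/home/osboxes/spark-1.6.0/examples/src/main/python/pi.py") returns a dict whose "id" entry is the batch id used in the later calls. Keeping request construction separate from sending makes it easy to inspect the request before anything reaches the server.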
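Rather than re-checking by hand, you can poll the batch until it finishes. This sketch assumes GET /batches/{id} returns JSON with a "state" field and treats success, dead, killed and error as final states; that set is an assumption, so check the state list for your Livy version. get_batch and wait_for_batch are hypothetical helpers, and the fetch argument can be swapped out for testing.

```python
import json
import time
import urllib.request
from typing import Callable

LIVY_URL = "http://localhost:8998"
# Assumed set of final batch states; verify against your Livy version's docs.
TERMINAL_STATES = {"success", "dead", "killed", "error"}

def get_batch(batch_id: int, livy_url: str = LIVY_URL) -> dict:
    """GET /batches/{id} and return the decoded JSON (includes a "state" field)."""
    with urllib.request.urlopen(f"{livy_url}/batches/{batch_id}") as resp:
        return json.loads(resp.read().decode("utf-8"))

def wait_for_batch(batch_id: int,
                   fetch: Callable[[int], dict] = get_batch,
                   interval: float = 2.0,
                   timeout: float = 300.0) -> str:
    """Poll until the batch reaches a terminal state and return that state."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch(batch_id)["state"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} still '{state}' after {timeout}s")
        time.sleep(interval)
```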
To retrieve the result, use the following command:
curl localhost:8998/batches/2/log
Note: replace the batch id (2 in our case) with the id Livy returned when you submitted the job.
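The log endpoint returns JSON whose "log" field holds the captured output lines. Below is a minimal sketch, assuming that response shape, that pulls out the value pi.py prints ("Pi is roughly ..."); get_batch_log and find_pi_estimate are hypothetical names.

```python
import json
import re
import urllib.request
from typing import List, Optional

LIVY_URL = "http://localhost:8998"

def get_batch_log(batch_id: int, livy_url: str = LIVY_URL) -> List[str]:
    """GET /batches/{id}/log and return the list of lines in its "log" field."""
    with urllib.request.urlopen(f"{livy_url}/batches/{batch_id}/log") as resp:
        return json.loads(resp.read().decode("utf-8"))["log"]

def find_pi_estimate(lines: List[str]) -> Optional[float]:
    """Return the number printed by pi.py ("Pi is roughly ..."), or None if absent."""
    pattern = re.compile(r"Pi is roughly\s+([0-9.]+)")
    for line in lines:
        match = pattern.search(line)
        if match:
            return float(match.group(1))
    return None
```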