Objectives

The objective of my project is to build a connector between the HPCC world and the Spark world.

Motivation
HPCC is the heart of LN's business, and almost all of the important data is stored in HPCC systems. Hence, if an analyst or statistician wants to build a model on some or all of that data using Spark, she has to either download the data to her local system or move it to a cluster. This can be time-consuming, and she needs to take great care to stay compliant with strict data rules.
A possible solution to this problem is a bridge between these two worlds (Spark and HPCC).

Expected Output
We expect our output to be a connector that, once installed, enables ECL programmers to use Spark algorithms on data stored in HPCC, and PySpark programmers to use HPCC data.

Github Repo 
