PySpark Error - EOFError

I spent almost the whole afternoon trying to debug an EOFError while implementing KMeansStreaming, which uses a streaming THOR file as input.

I eventually found out why I was getting the error. There were actually two separate problems:
  • The first clue was “java.lang.IllegalArgumentException: requirement failed”, which in my case meant that the training and the testing data did not have the same dimension.
  • The second clue was the EOFError itself, which turned out to be a symptom of an out-of-memory condition. The workaround was to increase the memory allocated to the executors with the spark-submit option --executor-memory.
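The dimension mismatch behind the first clue is cheap to catch before the data ever reaches Spark. Below is a minimal sketch in plain Python, assuming each record is a comma-separated line of numbers; the helper names parse_line, vector_dimensions and check_same_dimension are my own for illustration and are not part of PySpark.

```python
from typing import Iterable, List, Sequence, Set


def parse_line(line: str, sep: str = ",") -> List[float]:
    """Turn one delimited text record into a list of floats."""
    return [float(field) for field in line.strip().split(sep)]


def vector_dimensions(rows: Iterable[Sequence[float]]) -> Set[int]:
    """Collect every distinct vector length that occurs in rows."""
    return {len(row) for row in rows}


def check_same_dimension(train_rows: Iterable[Sequence[float]],
                         test_rows: Iterable[Sequence[float]]) -> int:
    """Return the shared dimension, or raise if the vectors disagree.

    Hypothetical helper: it only inspects the lengths of the parsed
    records and does not call into Spark at all.
    """
    dims = vector_dimensions(train_rows) | vector_dimensions(test_rows)
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    return dims.pop()


if __name__ == "__main__":
    train = [parse_line("1.0,2.0,3.0"), parse_line("4.0,5.0,6.0")]
    test = [parse_line("7.0,8.0")]  # one column short on purpose
    try:
        check_same_dimension(train, test)
    except ValueError as err:
        print(err)
```

Running a check like this on a small sample of both inputs would have pointed me at the mismatch much sooner than the “requirement failed” message did.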
