PySpark Error - EOFError
I spent almost the whole afternoon trying to debug an EOFError while implementing KMeansStreaming, which uses a streaming THOR file as input (code).
I eventually tracked down the cause. There were actually two problems, each with its own clue:
- The first clue was “java.lang.IllegalArgumentException: requirement failed”, which in this case meant that the training and the testing data did not have the same dimension.
- The second clue was the EOFError itself, which turned out to be a symptom of an out-of-memory error. The workaround was to increase the memory allocated to the executors using the --executor-memory option.
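
To catch the first problem earlier, a quick dimension check on a sample of the data can help before the vectors are handed to the model. The sketch below is only an illustration in plain Python: the helper name check_dimensions and the sample rows are made up for this example and are not part of the original code or of PySpark.

```python
def check_dimensions(train_rows, test_rows):
    """Return the shared vector dimension, or raise ValueError on a mismatch.

    Hypothetical helper for illustration: each row is a plain list of floats.
    """
    # Collect every distinct row length seen in either data set.
    dims = {len(row) for row in train_rows} | {len(row) for row in test_rows}
    if len(dims) != 1:
        # Zero lengths (no data) or more than one length are both treated as errors.
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    return dims.pop()


# Made-up sample rows; in a real job these might come from a small sample
# of the training and testing inputs.
train = [[0.0, 1.0, 2.0], [1.0, 1.5, 0.5]]
test = [[0.2, 0.9, 1.8]]

dim = check_dimensions(train, test)
print(f"all vectors have dimension {dim}")
```

For the second problem, the memory setting is passed when the job is submitted, for example spark-submit --executor-memory 4g my_script.py, where both the 4g value and the script name are placeholders to adjust for the actual job and cluster.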