Common Spark exception: java.util.concurrent.TimeoutException: Futures timed out

Big data learning monk · 2022-02-13 07:15:56


Running a Spark on YARN job fails with a timeout:

Caused by: java.util.concurrent.TimeoutException: Futures timed out after 1000s

The explanation below is quoted from an external reference (link at the end of this post):

This happens because Spark tries to do a Broadcast Hash Join and one of the DataFrames is very large, so broadcasting it takes a long time.
You can (see the sketch below):
Set a higher spark.sql.broadcastTimeout to increase the timeout, for example spark.conf.set("spark.sql.broadcastTimeout", 36000)
persist() both DataFrames; Spark will then use a Shuffle Join
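
As a minimal sketch of both fixes in Scala (the input paths, DataFrame names, and the join key "id" are hypothetical placeholders, not from the original job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("broadcast-timeout-demo").getOrCreate()

// Fix 1: raise the broadcast timeout (in seconds; the default is 300).
spark.conf.set("spark.sql.broadcastTimeout", 36000)

// Fix 2: persist both sides before joining; with materialized inputs,
// Spark tends to fall back to a shuffle-based join instead of broadcasting.
val left = spark.read.parquet("/data/left").persist()   // hypothetical path
val right = spark.read.parquet("/data/right").persist() // hypothetical path

val joined = left.join(right, "id") // hypothetical join key
joined.count() // force evaluation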

To summarize, you can:

increase the spark.sql.broadcastTimeout value;
persist() both DataFrames;
in addition, you can consider disabling the broadcast join and raising the spark.driver.memory value.

In addition to increasing spark.sql.broadcastTimeout or persisting both DataFrames, you may try (sketched below):
1. Disable broadcasting by setting spark.sql.autoBroadcastJoinThreshold to -1.
2. Increase driver memory by setting spark.driver.memory to a higher value.
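
A hedged sketch of these two options (the memory size and jar name are example values, not recommendations): the broadcast threshold can be changed at runtime, while spark.driver.memory must be set before the driver JVM starts, for example on spark-submit:

// In code: -1 disables automatic broadcast joins entirely.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

Or at submission time (driver memory cannot be changed from inside a running job):

spark-submit \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --driver-memory 8g \
  your-job.jar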
https://blog.csdn.net/weixin_44455388/article/details/101286428
Copyright: Big data learning monk. Please include the original link when reprinting: https://en.javamana.com/2022/02/202202130715543587.html