Spark SQL: Adaptive Query Execution (AQE)

Coalescing Post-Shuffle Partitions

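Configuration | Default | Description | Since
spark.sql.adaptive.enabled | true | Master switch for AQE; it must be true for any optimization in this section to take effect. Enabled by default since Spark 3.2.0. | 3.0.0
spark.sql.adaptive.coalescePartitions.enabled | true | When this and spark.sql.adaptive.enabled are both true, Spark coalesces contiguous shuffle partitions according to the target size (spark.sql.adaptive.advisoryPartitionSizeInBytes) to avoid too many small tasks. | 3.0.0
spark.sql.adaptive.coalescePartitions.initialPartitionNum | (none) | The initial number of shuffle partitions before coalescing. If not set, it equals spark.sql.shuffle.partitions. Starting from a large enough value avoids too little parallelism downstream, since AQE then coalesces the partitions down. Only takes effect when spark.sql.adaptive.enabled and spark.sql.adaptive.coalescePartitions.enabled are both true. | 3.0.0
spark.sql.adaptive.advisoryPartitionSizeInBytes | 64MB | Advisory (target) size of a shuffle partition after coalescing; also used when splitting skewed partitions. | 3.0.0
spark.sql.adaptive.coalescePartitions.parallelismFirst | true | Coalesce with parallelism as the priority. 1. When true, Spark only respects the minimum partition size set by spark.sql.adaptive.coalescePartitions.minPartitionSize, to maximize parallelism, and ignores the advisory target size set by spark.sql.adaptive.advisoryPartitionSizeInBytes. 2. It is recommended to set this to false so that the advisory target size is respected. | 3.2.0
spark.sql.adaptive.coalescePartitions.minPartitionSize | 1MB | Minimum size a shuffle partition must have after coalescing. Its value can be at most 20% of spark.sql.adaptive.advisoryPartitionSizeInBytes. | 3.2.0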
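
The coalescing rules in the table can be illustrated with a small Python model. This is a simplified sketch under stated assumptions, not Spark's implementation: it assumes the target size is capped at the advisory size, floored at minPartitionSize, and, when parallelismFirst is true, also limited to the total shuffle size divided by a desired minimum partition count (the hypothetical parameter default_parallelism). Contiguous partitions are then merged greedily. The function names target_size and coalesce are hypothetical.

```python
import math

MB = 1024 * 1024


def target_size(total_bytes, advisory_bytes, min_partition_bytes,
                parallelism_first, default_parallelism):
    """Target size for merged partitions (simplified, assumed model)."""
    # The table says minPartitionSize may be at most 20% of the advisory size;
    # this model simply clamps it.
    min_partition_bytes = min(min_partition_bytes, advisory_bytes // 5)
    # Parallelism first: aim for at least `default_parallelism` partitions.
    min_num_partitions = default_parallelism if parallelism_first else 1
    max_target = math.ceil(total_bytes / min_num_partitions)
    # Never above the advisory size, never below the minimum partition size.
    return max(min(max_target, advisory_bytes), min_partition_bytes)


def coalesce(sizes, target):
    """Greedily merge contiguous partitions; returns (start, end) index ranges."""
    groups = []
    start, current = 0, 0
    for i, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed target.
        if i > start and current + size > target:
            groups.append((start, i))
            start, current = i, 0
        current += size
    if sizes:
        groups.append((start, len(sizes)))
    return groups


if __name__ == "__main__":
    sizes = [10 * MB] * 20  # 20 initial shuffle partitions of 10 MB each
    for first in (False, True):
        t = target_size(sum(sizes), 64 * MB, 1 * MB, first, default_parallelism=8)
        print(first, t // MB, len(coalesce(sizes, t)))
```

With twenty 10 MB partitions and a 64 MB advisory size, parallelism_first=False merges them into 4 partitions (6 + 6 + 6 + 2 originals). With parallelism_first=True and a desired parallelism of 8, the target drops to 25 MB and 10 partitions remain, which shows the trade-off behind recommending false when larger partitions are preferred.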

Converting SortMergeJoin to BroadcastJoin / ShuffledHashJoin

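Configuration | Default | Description | Since
spark.sql.adaptive.enabled | true | Master switch for AQE; must be true for these runtime join conversions. | 3.0.0
spark.sql.adaptive.autoBroadcastJoinThreshold | (none) | Maximum size in bytes of a join side for AQE to convert a SortMergeJoin into a BroadcastJoin at runtime. If not set, it equals spark.sql.autoBroadcastJoinThreshold. Setting it to -1 disables the conversion. Used only by the adaptive framework. | 3.2.0
spark.sql.adaptive.localShuffleReader.enabled | true | When true (and AQE is enabled), Spark tries to read shuffle data with a local shuffle reader when shuffle partitioning is not needed, for example after a SortMergeJoin has been converted into a broadcast hash join. | 3.0.0
spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold | 0 | Maximum partition size in bytes allowed for building a local hash map. If this value is not smaller than spark.sql.adaptive.advisoryPartitionSizeInBytes and no partition is larger than it, Spark prefers ShuffledHashJoin over SortMergeJoin regardless of spark.sql.join.preferSortMergeJoin. With the default of 0 the first condition cannot hold, so this conversion is effectively off. | 3.2.0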
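
The decision order described in this table can be sketched in Python. This is a simplified model under stated assumptions: it looks only at one (build) side's runtime partition sizes, tries broadcast first, then the ShuffledHashJoin condition from maxShuffledHashJoinLocalMapThreshold, and otherwise keeps SortMergeJoin. The JoinConf class and choose_join function are hypothetical; Spark's actual planner also considers hints, join types and both join sides. The 10 MB broadcast threshold used below is the default of spark.sql.autoBroadcastJoinThreshold.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class JoinConf:
    """Hypothetical holder for the thresholds in the table (all in bytes)."""
    auto_broadcast_threshold: int  # spark.sql.adaptive.autoBroadcastJoinThreshold
    advisory_partition_size: int   # spark.sql.adaptive.advisoryPartitionSizeInBytes
    max_shj_local_map: int         # spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold


def choose_join(build_partition_sizes, conf):
    """Pick a join strategy from runtime partition sizes (simplified model)."""
    total = sum(build_partition_sizes)
    # Broadcast if the whole side fits under the threshold; -1 disables this.
    if conf.auto_broadcast_threshold >= 0 and total <= conf.auto_broadcast_threshold:
        return "BroadcastHashJoin"
    # Prefer a shuffled hash join only if the threshold is at least the advisory
    # size and every partition is small enough to build a local hash map.
    if conf.max_shj_local_map >= conf.advisory_partition_size and all(
        size <= conf.max_shj_local_map for size in build_partition_sizes
    ):
        return "ShuffledHashJoin"
    return "SortMergeJoin"


if __name__ == "__main__":
    defaults = JoinConf(10 * MB, 64 * MB, 0)
    print(choose_join([1 * MB] * 5, defaults))   # BroadcastHashJoin
    print(choose_join([20 * MB] * 5, defaults))  # SortMergeJoin
    shj = JoinConf(10 * MB, 64 * MB, 128 * MB)
    print(choose_join([20 * MB] * 5, shj))       # ShuffledHashJoin
```

The middle case shows why the default of 0 keeps SortMergeJoin: 0 is smaller than the 64 MB advisory size, so the ShuffledHashJoin branch is never taken.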

Optimizing Skew Join

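Configuration | Default | Description | Since
spark.sql.adaptive.enabled | true | Master switch for AQE; must be true for skew join handling. | 3.0.0
spark.sql.adaptive.skewJoin.enabled | true | When this and spark.sql.adaptive.enabled are both true, Spark dynamically handles skew in SortMergeJoin by splitting (and replicating if needed) skewed partitions. | 3.0.0
spark.sql.adaptive.skewJoin.skewedPartitionFactor | 5 | A partition is considered skewed if its size satisfies both conditions: 1. larger than skewedPartitionFactor * the median partition size; 2. larger than skewedPartitionThresholdInBytes. | 3.0.0
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes | 256MB | Size threshold for condition 2 above. Ideally it should be set larger than spark.sql.adaptive.advisoryPartitionSizeInBytes. | 3.0.0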
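
Because both skew conditions must hold, they reduce to a single cutoff: the larger of skewedPartitionThresholdInBytes and skewedPartitionFactor * median. The following Python sketch models the rule as stated in the table (it is not Spark's code, and the function names skew_cutoff and skewed_partitions are hypothetical); it computes that cutoff and the indices of partitions above it.

```python
import statistics

MB = 1024 * 1024


def skew_cutoff(sizes, factor=5.0, threshold_bytes=256 * MB):
    """Both conditions must hold, so the effective cutoff is the larger one."""
    return max(threshold_bytes, factor * statistics.median(sizes))


def skewed_partitions(sizes, factor=5.0, threshold_bytes=256 * MB):
    """Indices of partitions strictly larger than the effective cutoff."""
    cutoff = skew_cutoff(sizes, factor, threshold_bytes)
    return [i for i, size in enumerate(sizes) if size > cutoff]


if __name__ == "__main__":
    sizes = [100 * MB] * 9 + [600 * MB]
    print(skew_cutoff(sizes) / MB)   # 500.0
    print(skewed_partitions(sizes))  # [9]
```

For nine 100 MB partitions and one 600 MB partition, the median is 100 MB, the cutoff is max(256 MB, 500 MB) = 500 MB, and only the 600 MB partition is flagged. A 200 MB outlier among 10 MB partitions would exceed 5 * median (50 MB) but not 256 MB, so it would not be treated as skewed.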
