Spark SQL: Adaptive Query Execution
Coalescing Post-Shuffle Partitions
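Property | Default | Description | Since Version |
---|---|---|---|
spark.sql.adaptive.enabled | true | | 3.0.0 |
spark.sql.adaptive.coalescePartitions.enabled | | | 3.0.0 |
spark.sql.adaptive.coalescePartitions.initialPartitionNum | | Coalescing starts when the number of post-shuffle partitions exceeds this value. Defaults to spark.sql.shuffle.partitions. The setting keeps downstream parallelism from dropping too low. | 3.0.0 |
spark.sql.adaptive.advisoryPartitionSizeInBytes | 64MB | Advisory size of a target partition after coalescing. | 3.0.0 |
spark.sql.adaptive.coalescePartitions.parallelismFirst | true | Whether to favor parallelism when coalescing partitions. (1) When true, Spark honors the minimum target partition size set by spark.sql.adaptive.coalescePartitions.minPartitionSize to maximize parallelism, and ignores the advisory target size set by spark.sql.adaptive.advisoryPartitionSizeInBytes. (2) Setting this to false is recommended, so that the advisory target size set by spark.sql.adaptive.advisoryPartitionSizeInBytes is honored. | 3.2.0 |
spark.sql.adaptive.coalescePartitions.minPartitionSize | 1MB | Minimum size that a coalesced target partition must reach. It can be set to at most 20% of spark.sql.adaptive.advisoryPartitionSizeInBytes. | 3.2.0 |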
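
To make the role of the advisory size more concrete, here is a minimal sketch of greedy coalescing of adjacent partitions. It is a simplification for illustration only, not Spark's actual implementation: the function name `coalesce_partitions` and the sample sizes are hypothetical, and the sketch ignores `parallelismFirst` and `minPartitionSize`.

```python
from typing import List

MB = 1024 * 1024


def coalesce_partitions(sizes: List[int], advisory_size: int = 64 * MB) -> List[List[int]]:
    """Greedily group adjacent partition indices so each group stays near advisory_size.

    Simplified illustration (assumption): a group is closed as soon as adding the
    next partition would push it past the advisory size.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        # Close the current group if this partition would overshoot the target.
        if current and current_size + size > advisory_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical post-shuffle partition sizes.
sizes = [10 * MB, 20 * MB, 30 * MB, 40 * MB, 5 * MB, 70 * MB]
print(coalesce_partitions(sizes))  # [[0, 1, 2], [3, 4], [5]]
```

In this sketch a partition that is already larger than the advisory size (the 70MB one) simply ends up in a group of its own; coalescing only merges partitions and never splits them.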
Converting SortMergeJoin to BroadcastJoin/ShuffleHashJoin
Property | Default | Description | Since Version |
---|---|---|---|
spark.sql.adaptive.enabled | true | | 3.0.0 |
spark.sql.adaptive.autoBroadcastJoinThreshold | | Maximum table size for converting a SortMergeJoin into a BroadcastJoin. Defaults to spark.sql.autoBroadcastJoinThreshold. | 3.2.0 |
spark.sql.adaptive.localShuffleReader.enabled | true | | 3.2.0 |
spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold | 0 | When every partition is no larger than this threshold, Spark prefers ShuffleHashJoin over SortMergeJoin, regardless of how spark.sql.join.preferSortMergeJoin is set. | 3.2.0 |
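
The two thresholds in the table above can be read as a simple runtime decision. The sketch below is a simplified reading of the descriptions, not Spark's planner logic: the function `choose_join_strategy`, its parameters, and the 10MB broadcast default are assumptions made for illustration.

```python
from typing import List

MB = 1024 * 1024


def choose_join_strategy(
    small_side_bytes: int,
    partition_sizes: List[int],
    broadcast_threshold: int = 10 * MB,  # assumed value for the broadcast threshold
    local_map_threshold: int = 0,        # maxShuffledHashJoinLocalMapThreshold default
) -> str:
    """Pick a join strategy from runtime statistics (simplified illustration)."""
    # A small enough side can be broadcast to every executor.
    if small_side_bytes <= broadcast_threshold:
        return "BroadcastJoin"
    # If every partition fits under the local map threshold, prefer a hash join.
    if partition_sizes and all(size <= local_map_threshold for size in partition_sizes):
        return "ShuffleHashJoin"
    # Otherwise keep the original plan.
    return "SortMergeJoin"


print(choose_join_strategy(5 * MB, [5 * MB]))
print(choose_join_strategy(200 * MB, [30 * MB, 40 * MB], local_map_threshold=64 * MB))
print(choose_join_strategy(200 * MB, [30 * MB, 40 * MB]))
```

With the default threshold of 0, the hash-join branch in this sketch is taken only when every partition is empty, so in practice the threshold has to be raised before ShuffleHashJoin is preferred.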
Optimizing Skew Joins
Property | Default | Description | Since Version |
---|---|---|---|
spark.sql.adaptive.enabled | true | | 3.0.0 |
spark.sql.adaptive.skewJoin.enabled | true | | 3.0.0 |
spark.sql.adaptive.skewJoin.skewedPartitionFactor | 5 | A partition is considered skewed when its size meets both conditions: (1) it is larger than skewedPartitionFactor * the median partition size; (2) it is larger than skewedPartitionThresholdInBytes. | 3.0.0 |
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes | 256MB | | 3.0.0 |
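
The two-part skew condition is easy to check by hand. The following sketch applies it to a list of partition sizes using the defaults from the table; the function name `find_skewed_partitions` and the sample sizes are hypothetical.

```python
from statistics import median
from typing import List

MB = 1024 * 1024


def find_skewed_partitions(
    sizes: List[int],
    factor: float = 5,            # skewedPartitionFactor
    threshold: int = 256 * MB,    # skewedPartitionThresholdInBytes
) -> List[int]:
    """Return the indices of partitions that satisfy both skew conditions."""
    if not sizes:
        return []
    median_size = median(sizes)
    return [
        index
        for index, size in enumerate(sizes)
        # Both conditions must hold: relative to the median AND above the absolute floor.
        if size > factor * median_size and size > threshold
    ]


# Median is 65MB, so the relative cutoff is 325MB: partitions 3 and 5 are skewed.
print(find_skewed_partitions([50 * MB, 60 * MB, 55 * MB, 400 * MB, 70 * MB, 1000 * MB]))  # [3, 5]

# Median is 100MB, so the relative cutoff is 500MB: 300MB exceeds 256MB but is not skewed.
print(find_skewed_partitions([100 * MB, 100 * MB, 100 * MB, 300 * MB]))  # []
```

The second call shows why both conditions matter: a partition above the absolute threshold is still left alone when it is not large relative to the median.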
References
- https://databricks.com/blog/2020/05/29/adaptive-query-execution-speeding-up-spark-sql-at-runtime.html
- https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution
- https://medium.com/@ravishankar.nair/adaptive-query-execution-aqe-in-spark-3-with-example-what-every-spark-programmer-must-know-adfde0dc600e
- https://sparkbyexamples.com/spark/spark-adaptive-query-execution/
- https://mp.weixin.qq.com/s/30JqnQf3NlJJUg-6T5Nzxg
- https://mp.weixin.qq.com/s/eZFWRbJv4Uzo_CbZcqfipw