Spark SQL:GROUP语句
聚合
TODO
高级聚合
GROUPING SETS
1 | GROUP BY GROUPING SETS(分组集1[, 分组集2, ...]) |
对关键字 GROUPING SETS
后面指定的多个分组集合中每个集合都进行一次分组操作,等价于按照每个分组集合单独进行聚合,然后再将这些聚合结果进行 UNION ALL
。
注意:为了兼容Hive,Spark也支持GROUP BY ... GROUPING SETS (...)
这样的写法。GROUP BY
表达式通常被忽略,但是如果GROUP BY
表达式中包含有GROUPING SETS
表达式以外的表达式,那么这些额外的表达式将被包含进分组表达式中且其值为NULL,例如SELECT a, b, c FROM ... GROUP BY a, b, c GROUPING SETS (a, b)
GROUPING SETS | 等价表示 |
---|---|
GROUP BY GROUPING SETS((a), (b)) GROUP BY GROUPING SETS(a, b) |
GROUP BY a UNION ALL GROUP BY b |
GROUP BY a,b GROUPING SETS(a, b) |
SELECT a, NULL ... GROUP BY a UNION ALL SELECT NULL, b ... GROUP BY b |
ROLLUP
1 | GROUP BY 分组表达式1[, 分组表达式2, ...] WITH ROLLUP |
RULLUP | 等价表示 |
---|---|
GROUP BY a, b WITH ROLLUP GROUP BY ROLLUP(a, b) |
GROUP BY GROUPING SETS((a, b), (a), ()) |
CUBE
1 | GROUP BY 分组表达式1[, 分组表达式2, ...] WITH CUBE |
CUBE | 等价表示 |
---|---|
GROUP BY a, b WITH CUBE GROUP BY CUBE(a,b) |
GROUP BY GROUPING SETS((a,b), (a), (b), ()) |
GROUP BY CUBE(a, b, (a,c)) | GROUP BY GROUPING SETS((a,b,c), (a,b), (a,c), (b,a,c), (a), (b), (a,c), ()) |
参考: