Druid Quick Start

本篇是一篇Druid的Quick Start,本篇不同于Druid官网的Quick Start,思路借鉴了Druid官网的注册规范一文,本篇更注重从任务、数据两者因果关系上进行介绍。

准备工作

启动druid

1
$ ./bin/start-micro-quickstart 

[Fri Nov 20 21:41:44 2020] Running command[zk], logging to[/Users/dream/Env/druid/apache-druid-0.20.0/var/sv/zk.log]: bin/run-zk conf
[Fri Nov 20 21:41:44 2020] Running command[coordinator-overlord], logging to[/Users/dream/Env/druid/apache-druid-0.20.0/var/sv/coordinator-overlord.log]: bin/run-druid coordinator-overlord conf/druid/single-server/micro-quickstart
[Fri Nov 20 21:41:44 2020] Running command[broker], logging to[/Users/dream/Env/druid/apache-druid-0.20.0/var/sv/broker.log]: bin/run-druid broker conf/druid/single-server/micro-quickstart
[Fri Nov 20 21:41:44 2020] Running command[router], logging to[/Users/dream/Env/druid/apache-druid-0.20.0/var/sv/router.log]: bin/run-druid router conf/druid/single-server/micro-quickstart
[Fri Nov 20 21:41:44 2020] Running command[historical], logging to[/Users/dream/Env/druid/apache-druid-0.20.0/var/sv/historical.log]: bin/run-druid historical conf/druid/single-server/micro-quickstart
[Fri Nov 20 21:41:44 2020] Running command[middleManager], logging to[/Users/dream/Env/druid/apache-druid-0.20.0/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart

各个服务会将日写入var/sv目录:

var/sv/
├── broker.log
├── coordinator-overlord.log
├── historical.log
├── middleManager.log
├── router.log
└── zk.log

准备数据

将以下内容写入文件:quickstart/me-tutorial-data.json

1
2
3
4
5
6
7
8
9
10
11
{"ts":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":10, "bytes":1000, "cost": 1.4}
{"ts":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":20, "bytes":2000, "cost": 3.1}
{"ts":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":30, "bytes":3000, "cost": 0.4}
{"ts":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":40, "bytes":4000, "cost": 7.9}
{"ts":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":50, "bytes":5000, "cost": 10.2}
{"ts":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":60, "bytes":6000, "cost": 4.3}
{"ts":"2018-01-01T02:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":100, "bytes":10000, "cost": 22.4}
{"ts":"2018-01-01T02:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":200, "bytes":20000, "cost": 34.5}
{"ts":"2018-01-01T02:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":300, "bytes":30000, "cost": 46.3}
{"ts":"2018-01-02T03:03:01Z","srcIP":"4.4.4.4", "dstIP":"5.5.5.5", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":30, "bytes":30000, "cost": 1.3}
{"ts":"2018-01-02T03:03:02Z","srcIP":"4.4.4.4", "dstIP":"5.5.5.5", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":30, "bytes":30000, "cost": 1.3}

准备索引描述

将以下内容写入文件:quickstart/me-tutorial-index.json

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
{
"type" : "index_parallel",
"spec" : {
"dataSchema" : {
"dataSource" : "me-tutorial",
"timestampSpec" : {
"format" : "iso",
"column" : "ts"
},
"dimensionsSpec" : {
"dimensions": [
"srcIP",
{ "name" : "srcPort", "type" : "long" },
{ "name" : "dstIP", "type" : "string" },
{ "name" : "dstPort", "type" : "long" },
{ "name" : "protocol", "type" : "string" }
]
},
"metricsSpec" : [
{ "type" : "count", "name" : "count" },
{ "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
{ "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" },
{ "type" : "doubleSum", "name" : "cost", "fieldName" : "cost" }
],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "HOUR",
"queryGranularity" : "MINUTE",
"intervals" : ["2018-01-01/2018-01-02"],
"rollup" : true
}
},
"ioConfig" : {
"type" : "index_parallel",
"inputSource" : {
"type" : "local",
"baseDir" : "quickstart/",
"filter" : "me-tutorial-data.json"
},
"inputFormat" : {
"type" : "json"
}
},
"tuningConfig" : {
"type" : "index_parallel",
"maxRowsPerSegment" : 5
}
}
}

提交任务

1
$ bin/post-index-task --file quickstart/me-tutorial-index.json --url http://localhost:8081

Beginning indexing data for me-tutorial
Task started: index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z
Task log: http://localhost:8081/druid/indexer/v1/task/index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z/log
Task status: http://localhost:8081/druid/indexer/v1/task/index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z/status
Task index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z still running…
Task index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z still running…
Task index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z still running…
Task index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z still running…
Task index_parallel_me-tutorial_kkcdjied_2020-11-20T13:44:12.560Z still running…
Task finished with status: SUCCESS
Completed indexing data for me-tutorial. Now loading indexed data onto the cluster…
me-tutorial loading complete! You may now query your data

探索

我们看Segments页面,列出了3个分片,如下:

我们在粒度说明中,开启了rollup,且queryGranularity为分钟,如下:

1
2
3
4
5
6
7
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "HOUR",
"queryGranularity" : "MINUTE",
"intervals" : ["2018-01-01/2018-01-02"],
"rollup" : true
}

这样我们原始的摄入数据(未标注黄色背景的行)就会被索引任务按分钟粒度,执行“上卷”操作,“上卷”结果如下图中黄色背景标注的行。并且,我们定义了"intervals" : ["2018-01-01/2018-01-02"],所以03小时的两条记录将直接被任务所忽略掉:

粒度说明中,segmentGranularity为小时,即数据按小时进行分片,最终,2018-01-01 01小时“上卷”后的记录有6条,02小时有2条;因为我们定义了maxRowsPerSegment为5,所以01小时的数据被划分到了两个数据分片。数据最终的逻辑划分如下图所示:

我们看一下磁盘中数据文件的情况:

参考

  1. https://druid.apache.org/docs/latest/tutorials/tutorial-ingestion-spec.html