Flink:及时流处理(译)

Timely steam processing is an extension of stateful stream processing in which time plays some role in the computation. Among other things, this is the case when you do time series analysis, when doing aggregations based on certain time periods (typically called windows), or when you do event processing where the time when an event occurred is important.

及时流处理是状态流处理的延伸,在其内部,时间在计算当中起到了一定作用。另外,列举几个用例:在你进行时间序列分析时、当你根据某一时间周期(一般称为窗口)进行聚合时、或者在你进行的事件处理中,事件发生的时间比较重要的时候。

In the following sections we will highlight some of the topics that you should consider when working with timely Flink Applications.

接下来的章节中,我们将着重于一些当你在使用及时Flink应用程序时需要考虑的话题。

时间的概念:事件时间和处理时间

When referring to time in a streaming program (for example to define windows), one can refer to different notions of time:

在streaming程序中提到时间(例如,要定义窗口),人们指的可能是时间的不同概念:

  • Processing time: Processing time refers to the system time of the machine that is executing the respective operation.

    处理时间:处理时间指的是执行各个运算的机器的系统时间。(译注,处理时间更准确的理解为处理时的时间

    When a streaming program runs on processing time, all time-based operations (like time windows) will use the system clock of the machines that run the respective operator. An hourly processing time window will include all records that arrived at a specific operator between the times when the system clock indicated the full hour. For example, if an application begins running at 9:15am, the first hourly processing time window will include events processed between 9:15am and 10:00am, the next window will include events processed between 10:00am and 11:00am, and so on.

    当streaming程序按处理时间运行的时候,所有基于时间的运算(例如,时间窗口)都将使用运行各个算子的机器的系统时间。小时级别的处理时间窗口将包含在系统时钟整小时之间到达算子的所有记录。例如,如果一个应用在上午9:15开始运行,那么第一个小时级别的处理时间窗口会包含9:15am和10:00am之间被处理的事件,下一个窗口会包含10:00am和11:00am之间被处理的事件,依此类推。

    Processing time is the simplest notion of time and requires no coordination between streams and machines. It provides the best performance and the lowest latency. However, in distributed and asynchronous environments processing time does not provide determinism, because it is susceptible to the speed at which records arrive in the system (for example from the message queue), to the speed at which the records flow between operators inside the system, and to outages (scheduled, or otherwise).

    处理时间是最简单的时间概念,不需要在流和机器之间进行协调。它提供了最佳的性能和最低的延迟。然而,在分布式和异步的环境中,处理时间不能提供确定性,因为它容易受到记录到达系统的速度(例如,记录来自消息队列)、记录在系统内算子之间流转的速度、以及断电(计划或计划外的断电)的影响。

  • Event time: Event time is the time that each individual event occurred on its producing device. This time is typically embedded within the records before they enter Flink, and that event timestamp can be extracted from each record. In event time, the progress of time depends on the data, not on any wall clocks. Event time programs must specify how to generate Event Time Watermarks, which is the mechanism that signals progress in event time. This watermarking mechanism is described in a later section, below.

    事件时间:事件时间是每个事件在产生它的设备上出现的时间(译注,事件时间更准确的理解为事件的时间)。这个时间一般在进入到Flink前就嵌入在记录中了,并且这个事件时间戳可以从每个记录中提取出来。在事件时间中,时间的进展取决于数据,而不取决于任何挂钟时间。事件时间的程序中必须指定如何生成事件时间的水印,它是指示事件时间进展的一种机制。此处的水印机制在后面的章节中有描述,如下。

    In a perfect world, event time processing would yield completely consistent and deterministic results, regardless of when events arrive, or their ordering. However, unless the events are known to arrive in-order (by timestamp), event time processing incurs some latency while waiting for out-of-order events. As it is only possible to wait for a finite period of time, this places a limit on how deterministic event time applications can be.

    在一个完美的世界中,事件时间的处理中总是会产生完全一致的、并且确定的结果,不管事件到达的时间以及它们的次序。然而,除非知道事件是按顺序(按照时间戳)到达的,通常都会由于等待故障事件(译注,out-of-order直译为乱序,也即出现了故障)从而使事件时间的处理出现延迟。因为只可能等待有限的一段时间,这就限制了事件时间的应用程序可以达到怎样的确定性。

    Assuming all of the data has arrived, event time operations will behave as expected, and produce correct and consistent results even when working with out-of-order or late events, or when reprocessing historic data. For example, an hourly event time window will contain all records that carry an event timestamp that falls into that hour, regardless of the order in which they arrive, or when they are processed. (See the section on late events for more information.)

    假设所有数据都已经到了,即使是有故障的或迟到的事件、或当重新处理历史数据的时候,事件时间的运算都会如预期表现,产生正确、且一致的结果。例如,按小时的事件时间窗口将包含所携带的事件时间戳是落在该小时内的所有记录,不管它们到达的顺序如何、以及不管它们是在何时被处理的。(更多信息请查看关于迟到事件的章节)

    Note that sometimes when event time programs are processing live data in real-time, they will use some processing time operations in order to guarantee that they are progressing in a timely fashion.

    注意,有的时候,当事件时间的程序处理实时数据时,为了确保是按及时方式进行的,程序会使用处理时间的运算。

事件时间和水印

Note: Flink implements many techniques from the Dataflow Model. For a good introduction to event time and watermarks, have a look at the articles below.

注意,Flink实现了很多来自数据流模型的技术。一个好的有关事件时间和水印的介绍,可以查看下面的文章:

A stream processor that supports event time needs a way to measure the progress of event time. For example, a window operator that builds hourly windows needs to be notified when event time has passed beyond the end of an hour, so that the operator can close the window in progress.

支持事件时间的流处理器需要一种方式来衡量事件时间的进展。例如,一个创建小时级别窗口的window算子需要在每小时结束的时候被告知,以便该算子可以关闭还在进行中的窗口。

Event time can progress independently of processing time (measured by wall clocks). For example, in one program the current event time of an operator may trail slightly behind the processing time (accounting for a delay in receiving the events), while both proceed at the same speed. On the other hand, another streaming program might progress through weeks of event time with only a few seconds of processing, by fast-forwarding through some historic data already buffered in a Kafka topic (or another message queue).

事件时间可以独立于处理时间(由挂钟衡量的)之外进行。例如,在一个程序中,某算子的当前事件时间可能会略微落后于该算子的处理时间(说明接收事件存在延迟)例如,在一个程序中,算子中的当前事件时间可能会略微落后于处理时间(说明接收时间存在延迟),while both proceed at the same speed。另一方面,通过已经缓存在Kafka主题(或其它消息队列)中的历史数据另一个streaming程序可以用几秒的处理时间来快速处理几周的事件时间。

The mechanism in Flink to measure progress in event time is watermarks. Watermarks flow as part of the data stream and carry a timestamp t. A Watermark(t) declares that event time has reached time t in that stream, meaning that there should be no more elements from the stream with a timestamp t’ <= t (i.e. events with timestamps older or equal to the watermark).

Flink中用来衡量事件时间进展的机制是水印。水印作为数据流的一部分进行流转,并且携带了时间戳t。Watermark(t)声明了:在该数据流中事件时间已经到了时间t,这意味着数据流中不应该再有时间戳 t’ <= t的元素了(即,时间戳早于或等于该水印的事件)。

The figure below shows a stream of events with (logical) timestamps, and watermarks flowing inline. In this example the events are in order (with respect to their timestamps), meaning that the watermarks are simply periodic markers in the stream.

下图展示了一个流,流中的事件带着(逻辑)时间戳、水印与事件内连在一起流动。在这个例子中,事件是按顺序的(相对于它们的时间戳),这意味着水印只是这个数据流中的简单周期性标记。

对于乱了序的流而言,水印是至关重要的,如下图所示,其中事件不是按它们的时间戳排序的。通常,水印是一种声明:此刻,在该流中,截止到某个时间戳的所有事件都应该到达了。一旦水印抵达算子,该算子就可以将其内部的事件时间的时钟推进到水印的值。

Note that event time is inherited by a freshly created stream element (or elements) from either the event that produced them or from watermark that triggered creation of those elements.

注意,被新创建出来的流元素从创建它们的元素处或从触发创建这些元素的水印处继承事件时间。

并行流中的水印

Watermarks are generated at, or directly after, source functions. Each parallel subtask of a source function usually generates its watermarks independently. These watermarks define the event time at that particular parallel source.

水印产生于source函数或紧跟在source函数之后。source函数的每个并行子任务一般都单独生成自己的水印。

As the watermarks flow through the streaming program, they advance the event time at the operators where they arrive. Whenever an operator advances its event time, it generates a new watermark downstream for its successor operators.

当水印流经streaming程序时,它们会推进其所到达的算子的事件时间。每当一个算子推进了其事件时间,它会为其下游算子生成一个新的水印。

Some operators consume multiple input streams; a union, for example, or operators following a keyBy(…) or partition(…) function. Such an operator’s current event time is the minimum of its input streams’ event times. As its input streams update their event times, so does the operator.

有些算子接收多个输入流,例如union、或keyBy(…)和partition(…)函数之后的算子。这些算子当前的事件时间是其输入流中事件时间中最小的那一个。当其输入流更新了事件时间,算子也相应地更新。

The figure below shows an example of events and watermarks flowing through parallel streams, and operators tracking event time.

下图展示的例子中,事件和水印在并行的流之间流动,算子记录了事件时间。

迟到

It is possible that certain elements will violate the watermark condition, meaning that even after the Watermark(t) has occurred, more elements with timestamp t’ <= t will occur. In fact, in many real world setups, certain elements can be arbitrarily delayed, making it impossible to specify a time by which all elements of a certain event timestamp will have occurred. Furthermore, even if the lateness can be bounded, delaying the watermarks by too much is often not desirable, because it causes too much delay in the evaluation of event time windows.

可能会有某些元素不满足水印的条件,这意味着在Watermark(t)*出现后,还会有时间戳 *t’ <= t的元素到达。实际上,很多现实世界当中,某些元素很随意地会被延迟,这就导致不可能确定这样一个时间,使得某个事件时间戳的所有元素在该时间以前都到达了。而且,即使迟到被允许在一定程度,过长时间地拖延水印往往也不是所期望的,因为这会导致过长时间地延迟事件时间窗口的计算。

For this reason, streaming programs may explicitly expect some late elements. Late elements are elements that arrive after the system’s event time clock (as signaled by the watermarks) has already passed the time of the late element’s timestamp. See Allowed Lateness for more information on how to work with late elements in event time windows.

基于这个原因,streaming程序可能会明确地期望某种late元素。late元素是这样的元素,它们在系统的事件时间时钟(由水印发出信号)超出该late元素的时间戳的时候到达。更多关于在事件时间窗口中如何与late元素协同工作,参看被允许的迟到

窗口

Aggregating events (e.g., counts, sums) works differently on streams than in batch processing. For example, it is impossible to count all elements in a stream, because streams are in general infinite (unbounded). Instead, aggregates on streams (counts, sums, etc), are scoped by windows, such as “count over the last 5 minutes”, or “sum of the last 100 elements”.

在流上对事件聚合(例如,count、sum)与在批处理中是不同的。例如,不可能计数流上的全部元素,因为流一般来说是无限的(unbounded)。故而,流上的聚合(count、sum等)是在窗口范围内的,例如“对过去5分钟计数”或“最近100个元素求和”。

Windows can be time driven (example: every 30 seconds) or data driven (example: every 100 elements). One typically distinguishes different types of windows, such as tumbling windows (no overlap), sliding windows (with overlap), and session windows (punctuated by a gap of inactivity).

窗口可以是时间驱动的(例如:每30秒)或者数据驱动的(例如:每100个元素)。人们一般会将不同类型的窗口进行区分,例如,滚动窗口(没有重叠)、滑动窗口(有重叠)、以及会话窗口(由于一段时间地不活动而中断)

Please check out this blog post for additional examples of windows or take a look a window documentation of the DataStream API.

参考

  1. https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/concepts/timely-stream-processing.html