Apart from the environment, the rest of the steps in a DataSet API program are identical to those of a DataStream API program. The following are the steps you have to perform to work with batch data using the DataSet API in Flink:
- Before doing anything with the DataSet API, you need to get an environment specific to batch data handling. ExecutionEnvironment is the object required to start using the API and its capabilities, as shown in this code snippet (there are several ways to obtain it; this is just the basic one):
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
- To perform operations on data, the data first has to be created or loaded from a source. In this step, the data is prepared. Data can be loaded/created ...