Run Applications
Learn how to run the applications you have created in Data Flow, provide argument and parameter values, review the results, and diagnose and tune the runs, including providing JVM options.
Data Flow automatically stops long-running batch jobs (those running more than 24 hours) because it uses a delegation token. If the application hasn't finished processing the data by then, the run can fail and the job remains unfinished. To prevent this, use one of the following options to limit the total time the application can run:
- When creating runs using the Console, specify the duration in Max run duration minutes under Advanced Options.
- When creating runs using the CLI, pass the command-line option --max-duration-in-minutes <number>.
- When creating runs using the SDK, provide the optional argument max_duration_in_minutes (see the sketch after this list).
- When creating runs using the API, set the optional parameter maxDurationInMinutes.
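As an illustration, here is a minimal sketch of setting the limit through the OCI Python SDK. It assumes the oci package is installed and configured; the OCIDs and display name are placeholders, and the exact set of required CreateRunDetails fields depends on how the application is configured.

```python
import oci

# Load credentials from the default OCI config file (~/.oci/config).
config = oci.config.from_file()
client = oci.data_flow.DataFlowClient(config)

# Placeholder OCIDs; substitute your own compartment and application.
details = oci.data_flow.models.CreateRunDetails(
    compartment_id="ocid1.compartment.oc1..example",
    application_id="ocid1.dataflowapplication.oc1..example",
    display_name="nightly-batch",
    # Stop the run automatically after 12 hours (720 minutes),
    # comfortably inside the 24-hour delegation-token window.
    max_duration_in_minutes=720,
)

run = client.create_run(details).data
print(run.id, run.lifecycle_state)
```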
Understand Runs
Every time a Data Flow Application is executed, a Data Flow Run is created. The Data Flow Run captures and securely stores the application's output, logs, and statistics. The output is saved so it can be viewed by anyone with the correct permissions using the UI or REST API. Runs also give you secure access to the Spark UI for debugging and diagnostics.
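To make the programmatic access concrete, the following is a hedged sketch of inspecting a finished run and downloading its captured logs with the OCI Python SDK. The run OCID is a placeholder, the log file names come back from list_run_logs, and the log content is assumed to be a UTF-8 stream.

```python
import oci

config = oci.config.from_file()
client = oci.data_flow.DataFlowClient(config)

run_id = "ocid1.dataflowrun.oc1..example"  # placeholder run OCID

# Fetch the run record, which carries its state and statistics.
run = client.get_run(run_id).data
print(run.display_name, run.lifecycle_state)

# Enumerate the log files captured for the run and download each one.
for log in client.list_run_logs(run_id).data:
    response = client.get_run_log(run_id, log.name)
    # The log body is returned as a byte stream; decode it for display.
    text = response.data.content.decode("utf-8", errors="replace")
    print(f"--- {log.name} ---")
    print(text[:500])  # first 500 characters of each log
```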