Quick Start

Getting Azkaban

Download the latest release and unzip (or untar) it. The following instructions are relative to the directory in which you unzip the file.

Standalone Deployment

Run Azkaban from the command line by issuing the following commands:
  > mkdir /some-dir/azkaban-jobs
  > bin/azkaban-server.sh --job-dir /some-dir/azkaban-jobs
You can then navigate to http://localhost:8081 in your browser to interact with the web user interface.

Deployment in Tomcat

There is a prebuilt war file in the dist/ directory of the release. This file can be deployed to Tomcat or any other servlet container in the usual way. In this case, the job directory must be set with the $AZKABAN_HOME environment variable.

Whether it is run through Tomcat or from the command line, the following screenshot displays the index page.

Jobs

Creating a simple job

A job is a process you want to run in Azkaban. It can be kicked off from the user interface, scheduled to run in the future, or used as a dependency for other jobs. Each time Azkaban executes your job, it records whether the job succeeded or failed, how long it ran, and any logging it produced, for future reference.

An Azkaban job is a file ending in the suffix .job that appears in your job directory. For example, a file named foo.job declares a job named foo. The job's properties are set in the job file using the form key=value. Here are the contents of an example job file:

  # This is a comment
  type=command
  command=echo "Hello World"

This job has two properties: type and command. The type property is required of all jobs and determines how the job is run, in this case as a simple Unix command. The command property is specific to Unix command jobs and gives the command line to execute. A job can also define additional properties for its own use.
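To make the file format concrete, here is a minimal Python sketch that reads a .job file into a dictionary. It assumes a simplified subset of the format: one key=value pair per line, blank lines ignored, and lines starting with # treated as comments. Real job files may accept fuller Java-properties syntax (other separators, line continuations), which this sketch does not handle; the function name parse_job_file is hypothetical.

```python
def parse_job_file(text: str) -> dict:
    """Parse a simplified .job file: key=value lines, '#' comments, blank lines skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {raw!r}")
        # Split on the first '=' only, so a value such as 'echo a=b' survives intact.
        props[key.strip()] = value.strip()
    return props


example = '''
# This is a comment
type=command
command=echo "Hello World"
'''
print(parse_job_file(example))
# {'type': 'command', 'command': 'echo "Hello World"'}
```

Splitting on the first = keeps the rest of the line as the value, which matters for command lines that themselves contain an equals sign.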

You can also create a job from the user interface by clicking Create Job on the main page. The next sections describe how to create and bundle a complete job flow, and provide additional details on the various types of jobs and the available standard properties.

Creating a job flow

A job flow is a set of jobs that depend on one another. A job's dependencies always run before the job itself. To add dependencies to a job, set its dependencies property, as in the following example, where bar depends on foo:
  $ cat > foo.job <<'EOF'
  type=command
  command=echo foo
  EOF
  $ cat > bar.job <<'EOF'
  type=command
  dependencies=foo
  command=echo bar
  EOF
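The rule that dependencies always run first amounts to a topological ordering of the dependency graph. The Python sketch below computes that order for a target job with a depth-first search. It assumes each job is a dictionary of its properties and that the dependencies value is a comma-separated list of job names (for example dependencies=foo,baz); the function names are illustrative and not part of Azkaban.

```python
def dependencies_of(props: dict[str, str]) -> list[str]:
    """Return the job names in the dependencies property (assumed comma-separated)."""
    return [name.strip() for name in props.get("dependencies", "").split(",") if name.strip()]


def execution_order(jobs: dict[str, dict[str, str]], target: str) -> list[str]:
    """Depth-first topological sort: each job appears after every job it depends on."""
    order: list[str] = []
    done: set[str] = set()
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        if name not in jobs:
            raise KeyError(f"unknown job {name!r}")
        visiting.add(name)
        for dep in dependencies_of(jobs[name]):
            visit(dep)  # schedule every dependency first
        visiting.discard(name)
        done.add(name)
        order.append(name)  # appended only after all of its dependencies

    visit(target)
    return order


jobs = {
    "foo": {"type": "command", "command": "echo foo"},
    "bar": {"type": "command", "dependencies": "foo", "command": "echo bar"},
}
print(execution_order(jobs, "bar"))  # ['foo', 'bar']
```

The visiting set catches cycles such as a depending on b while b depends on a, which could otherwise never be ordered.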

Deploying a job flow

Directly editing the job files works when you are developing a job flow on your desktop, but when the flow is ready for production you will likely want to package everything and ship it as a unit. For this purpose, Azkaban supports deploying .zip files containing a set of jobs, additional configuration, JAR files, and any other artifacts needed. The following example shows how to create such a zip file.

Add these files to a zip file, such as foobar.zip. Note that the install path is automatically filled in with the name of the zip file. The path determines where in Azkaban the zip is installed, and installing a zip to an existing path overwrites the zip already installed there.

zip -u foobar.zip *.job
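The zip command above packages only the .job files in the current directory. If a flow also carries .properties files, JARs, or other artifacts in subdirectories, a short Python sketch using the standard zipfile module can package the whole directory while preserving relative paths. The function name package_flow and the choice to include every regular file are assumptions for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def package_flow(flow_dir: str, zip_path: str) -> list[str]:
    """Zip every regular file under flow_dir, storing each by its path relative to flow_dir."""
    root = Path(flow_dir)
    target = Path(zip_path).resolve()
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside flow_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            arcname = path.relative_to(root).as_posix()  # e.g. "lib/util.jar"
            zf.write(path, arcname)
            names.append(arcname)
    return names


# Usage: build a throwaway foobar/ directory and package it.
with tempfile.TemporaryDirectory() as tmp:
    flow = Path(tmp, "foobar")
    flow.mkdir()
    (flow / "foo.job").write_text("type=command\ncommand=echo foo\n")
    (flow / "bar.job").write_text("type=command\ndependencies=foo\ncommand=echo bar\n")
    print(package_flow(str(flow), str(Path(tmp, "foobar.zip"))))  # ['bar.job', 'foo.job']
```

Storing paths relative to the flow directory keeps any nested directory layout intact inside the archive.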

This should now display in the user interface as the following hierarchy:

'foo' is a dependency of 'bar', and both appear under the 'foobar' section, which refers to the installed path. Jobs that are not dependencies of other jobs appear as roots of the job 'tree'.
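The roots of the displayed tree follow directly from that rule: a job is a root when no other job lists it as a dependency. Here is a minimal Python sketch, assuming the dependencies property holds a comma-separated list of job names; the function name find_roots is hypothetical.

```python
def find_roots(jobs: dict[str, dict[str, str]]) -> list[str]:
    """Return the jobs that no other job depends on; these are the roots of the job tree."""
    depended_on = set()
    for props in jobs.values():
        for dep in props.get("dependencies", "").split(","):
            if dep.strip():
                depended_on.add(dep.strip())
    return sorted(name for name in jobs if name not in depended_on)


jobs = {
    "foo": {"type": "command", "command": "echo foo"},
    "bar": {"type": "command", "dependencies": "foo", "command": "echo bar"},
}
print(find_roots(jobs))  # ['bar']
```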

Now, running bar first runs foo. If foo completes successfully, bar runs; otherwise, bar is considered failed.
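The success/failure rule can be sketched in Python as follows: jobs run in dependency order, and a job whose dependency did not succeed is not run and is marked failed. This illustrates the rule only, not Azkaban's actual executor; the callables stand in for real commands, and the name run_flow and the status strings are assumptions.

```python
from typing import Callable


def run_flow(order: list[str], deps: dict[str, list[str]],
             actions: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Run jobs in dependency order; a job runs only if every dependency succeeded."""
    status: dict[str, str] = {}
    for name in order:
        if all(status.get(dep) == "succeeded" for dep in deps.get(name, [])):
            # The action returns True on success and False on failure.
            status[name] = "succeeded" if actions[name]() else "failed"
        else:
            # Not run at all: an upstream job failed, so this job is considered failed too.
            status[name] = "failed"
    return status


deps = {"foo": [], "bar": ["foo"]}
print(run_flow(["foo", "bar"], deps, {"foo": lambda: True, "bar": lambda: True}))
# {'foo': 'succeeded', 'bar': 'succeeded'}
print(run_flow(["foo", "bar"], deps, {"foo": lambda: False, "bar": lambda: True}))
# {'foo': 'failed', 'bar': 'failed'}
```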

Running your job

Because these changes were made within the job directory, they are automatically "deployed" and ready to run. To run a job, check it in the user interface and select Run (to run it immediately) or Schedule (to run it in the future). Scheduled jobs can be set to repeat on a predetermined schedule.

If you do not want to use a graphical user interface, you can run your job flow from the command line (or from within an IDE for debugging). To run a job named my-job stored in the root job directory /some-dir/azkaban-jobs, issue the following command:

$ bin/run-job.sh --job-dir /some-dir/azkaban-jobs my-job

Viewing a Job

Selecting a job takes you to the Job Details page. From this page you can redefine job properties or create new jobs. You can also view the job's execution history, including its logs and runtimes.

Flows

Azkaban can display your dependency tree in two ways: hover over a job and click the View Flow link, or click the View/Restart link on the history page.

Use the mouse to move nodes and pan the graph, and use the mouse wheel or the zoom bar to zoom in and out. Right-clicking a node lets you disable it. Disabled nodes appear faded out and act as no-op jobs: they 'run' but do nothing.

Clicking Restart in the history view colors each node by its status: red for failed, green for success, blue for running (or waiting to run), and grey for ready. Clicking Execute runs the flow immediately, running every job that is not disabled. The above image shows a failed flow, with the failure status trickling down to 'commmand_ls'. Because 'java_sleep' is disabled, clicking Execute re-runs the flow, but java_sleep does nothing.
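The coloring and the no-op behavior of disabled nodes can be modeled in a few lines of Python. This sketch assumes that a job runs only when all of its dependencies have succeeded, and that a disabled job counts as satisfied without running its action, so it never blocks downstream jobs. The status names, the STATUS_COLORS mapping (which restates the colors listed above), the function name execute_flow, and the job names in the usage example are all hypothetical.

```python
from typing import Callable

# Node colors as described for the history view; disabled nodes are drawn faded out.
STATUS_COLORS = {
    "ready": "grey",
    "running": "blue",  # shown while a job runs or waits to run
    "succeeded": "green",
    "failed": "red",
    "disabled": "faded",
}


def execute_flow(order: list[str], deps: dict[str, list[str]],
                 actions: dict[str, Callable[[], bool]],
                 disabled: set[str]) -> dict[str, str]:
    """Run every job in order; disabled jobs are no-ops that count as satisfied."""
    status = {name: "ready" for name in order}
    for name in order:
        upstream_ok = all(status[dep] in ("succeeded", "disabled") for dep in deps.get(name, []))
        if not upstream_ok:
            status[name] = "failed"  # failure trickles down to dependents
        elif name in disabled:
            status[name] = "disabled"  # 'runs' but does nothing
        else:
            status[name] = "succeeded" if actions[name]() else "failed"
    return status


order = ["extract", "sleep", "report"]
deps = {"extract": [], "sleep": ["extract"], "report": ["sleep"]}
actions = {"extract": lambda: True, "sleep": lambda: True, "report": lambda: True}
status = execute_flow(order, deps, actions, disabled={"sleep"})
print({name: STATUS_COLORS[s] for name, s in status.items()})
# {'extract': 'green', 'sleep': 'faded', 'report': 'green'}
```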

Hierarchical Configuration

Storing job properties in .job files works for simple cases, but you also need a way to handle shared properties, such as a database connection URL or the default notification email for job failures. To support this, Azkaban allows properties to be separated into shared .properties files. Additionally, jobs inherit any properties defined in their parent directories, allowing for simple namespacing.

Consider the following example job directory layout:

  system.properties
  baz.job
  my_flow/
    my_flow.properties
    foo.job
    bar.job

There are three jobs declared here: baz, foo, and bar. Both foo and bar run with the properties they declare plus the properties defined in both system.properties and my_flow/my_flow.properties. However, baz only has access to the properties defined in its own job file and in system.properties.
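The inheritance rule can be sketched as a merge from the top of the job directory down to the job file. This Python sketch assumes that a .properties file applies to jobs in its own directory and all subdirectories, and that more specific definitions override more general ones (a job file overrides its directory's .properties, which overrides its parents'). That precedence order is an assumption, so treat it as illustrative. The layout is given as a dictionary from relative paths to already-parsed properties; the function name effective_properties and the property names db.url, notify.email, and flow.owner are hypothetical.

```python
from pathlib import PurePosixPath


def effective_properties(files: dict[str, dict[str, str]], job_path: str) -> dict[str, str]:
    """Merge .properties files from the root down to the job's directory, then the job file."""
    job = PurePosixPath(job_path)
    # Directories from the job-directory root down to the job's own, e.g. [".", "my_flow"].
    ancestors = list(reversed(job.parents))
    merged: dict[str, str] = {}
    for directory in ancestors:
        for path in sorted(files):
            p = PurePosixPath(path)
            if p.suffix == ".properties" and p.parent == directory:
                merged.update(files[path])  # deeper directories override shallower ones
    merged.update(files[job_path])  # the job's own properties win
    return merged


files = {
    "system.properties": {"db.url": "jdbc:mysql://db/prod", "notify.email": "oncall@example.com"},
    "baz.job": {"type": "command", "command": "echo baz"},
    "my_flow/my_flow.properties": {"notify.email": "flow-team@example.com", "flow.owner": "data-team"},
    "my_flow/foo.job": {"type": "command", "command": "echo foo"},
    "my_flow/bar.job": {"type": "command", "dependencies": "foo", "command": "echo bar"},
}
print(effective_properties(files, "my_flow/foo.job")["notify.email"])  # flow-team@example.com
print("flow.owner" in effective_properties(files, "baz.job"))  # False
```

Here foo sees the shared db.url from system.properties and the flow-level notify.email override, while baz sees only system.properties and its own file, matching the layout described above.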

Hierarchical configuration lets many people build and deploy job flows to separate deployment paths completely independently while still sharing some common top-level configuration parameters. The hierarchy can have multiple levels, allowing a single workflow to have isolated sub-parts.

Additional Information

Additional information on job types, scheduling jobs, alerting, and so on can be found in the full documentation section.