Documentation

This documentation assumes you have browsed through the Quick Start guide to learn the basics.

Jobs and Configuration

An Azkaban job is code plus configuration values. The configuration is stored as a properties file of key=value pairs. Job files can be created manually in a text editor or through the web interface. Many of the parameters will be custom to your job, but a number of standard parameters activate common job functionality. These standard parameters are described in the following sections.

Job Types

All jobs require a type property specifying how to execute them. Currently, there are four job types: java, command, javaprocess, and pig.
Property | Required? | Meaning
type | required | The job type: java, command, javaprocess, or pig

Each of these types has a variety of options as described in the following sections.

command jobs

Command jobs are essentially Unix commands executed as separate processes. Any output sent to standard out or standard error is redirected to the Azkaban log for the job. The job is considered to have succeeded if it completes with an exit code of zero. A non-zero exit code is treated as a failure.

The following properties are available in command jobs:

Property | Required? | Meaning | Example
command | required | Specifies the command to execute. | ls -lh
command.n | optional | Defines additional commands that are run sequentially after command. | ls -lh
working.dir | optional | Specifies the directory in which the command is invoked. The default working directory is the job's directory. | /home/ejk
env.property | optional | Specifies environment variables that should be set before running the command. property defines the name of the environment variable, so env.VAR_NAME=VALUE creates an environment variable $VAR_NAME and gives it the value of VALUE. | env.VAR_NAME=VALUE
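
As an illustration, the properties in the table above might be combined into a job file like the following sketch. The file name and the second command are hypothetical, and numbering the additional command as command.1 is an assumption about how the n in command.n is filled in.

```properties
# list-files.job (hypothetical file name)
type=command
command=ls -lh
# Assumed numbering for command.n; runs after the first command.
command.1=ls -lh /tmp
# Directory in which the commands are invoked.
working.dir=/home/ejk
# Creates the environment variable $VAR_NAME with the value VALUE.
env.VAR_NAME=VALUE
```

Comments sit on their own lines because a # that follows a value on the same line would be read as part of the value.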

javaprocess jobs

A javaprocess job is a convenient wrapper for kicking off a Java-based program. It is equivalent to running a class with a main method from the command line. The following properties are available in javaprocess jobs:

Property | Required? | Meaning | Example
java.class | required | The class that contains the main method. | azkaban.example.test.HelloWorld
classpath | optional | A comma-delimited list of JAR files and directories to be added to the classpath. If not set, all JARs in the working directory are added to the classpath. | commons-io.jar,helloworld.jar
Xms | optional | The initial memory pool size of the JVM. The default is 64M. | 64M
Xmx | optional | The maximum memory pool size of the JVM. The default is 256M. | 256M
main.args | optional | A comma-delimited list of arguments to pass to the Java main method. | arg1,arg2
jvm.args | optional | Arguments set for the JVM. This is not a list. The entire string is passed intact as a VM argument. | -Dmyprop=test -Dhello=world
working.dir | optional | Inherited from command jobs. | /home/ejk
env.property | optional | Inherited from command jobs. | env.MY_ENV_VARIABLE=testVariable
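
A javaprocess job file built from the example values in the table might look like the sketch below. Only the file name is invented; every property and value comes from the table.

```properties
# hello-world.job (hypothetical file name)
type=javaprocess
java.class=azkaban.example.test.HelloWorld
# Comma-delimited; if omitted, all JARs in the working directory are used.
classpath=commons-io.jar,helloworld.jar
Xms=64M
Xmx=256M
# Comma-delimited list of arguments for the main method.
main.args=arg1,arg2
# Not a list: the whole string is passed intact to the JVM.
jvm.args=-Dmyprop=test -Dhello=world
```

Note the difference in delimiters: main.args is split on commas, while jvm.args is passed through as a single string.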

pig jobs

This job type runs pig scripts through grunt. The following properties are available in pig jobs:

Property | Required? | Meaning | Example
pig.script | optional | Specifies the pig script to run. If not set, it uses the job name to find jobname.pig. | pig-example.pig
udf.import.list | optional | Comma-delimited list of UDF imports. | oink.,linkedin.udf.
param.name | optional | Used for parameter replacement to pass parameters from your job into your pig script. Order is not guaranteed. See the pig documentation for information on using pig parameters in your scripts. | param.variable1=myvalue
paramfile | optional | Comma-delimited list of files used for variable replacement in your pig script. Order is not guaranteed, and param.name takes precedence. | paramfile1,paramfile2
hadoop.job.ugi | optional | Sets the user name and group for Hadoop jobs. | hadoop,group
classpath | optional | Inherited from javaprocess jobs. | commons-io.jar,helloworld.jar
Xms | optional | Inherited from javaprocess jobs. | 64M
Xmx | optional | Inherited from javaprocess jobs. | 256M
jvm.args | optional | Inherited from javaprocess jobs. | -Dmyprop=test -Dhello=world
working.dir | optional | Inherited from command jobs. | /home/ejk
env.property | optional | Inherited from command jobs. |
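
Putting the pig properties together, a job file might look like this sketch. The file name is hypothetical; the values are the examples from the table.

```properties
# pig-example.job (hypothetical file name)
type=pig
# If omitted, Azkaban looks for a script named after the job.
pig.script=pig-example.pig
# Passed to the script as the parameter variable1.
param.variable1=myvalue
# Values set with param.name take precedence over these files.
paramfile=paramfile1,paramfile2
# User name and group for the Hadoop jobs.
hadoop.job.ugi=hadoop,group
```

Inside the script, the parameter would typically be referenced as $variable1, following Pig's own parameter substitution syntax.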

java jobs

Java jobs are any Java classes that have a run() method, such as a java.lang.Runnable. To avoid tying your code to framework-specific interfaces, the Java class does not need to implement any interface; however, Azkaban can make use of all the methods given in the following class (some of which are optional):

Jobs should log to a log4j logger whose name is set to the job name. Azkaban provides a log4j appender that sends these messages to the appropriate job log.

Property | Required? | Meaning | Default
job.class | required | The Java class to run | (none)
method.run | optional | The name of the no-arg method to use for running the job | run
method.cancel | optional | The name of the no-arg method to cancel the job | cancel
method.progress | optional | The name of the no-arg method to use for getting progress from the job | getProgress
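
The method.* properties only matter when a class does not use the default method names. The sketch below assumes a hypothetical class com.example.jobs.NightlyReport whose no-arg methods are named execute, abort, and percentDone. None of these names come from Azkaban.

```properties
# nightly-report.job (hypothetical file name)
type=java
# Hypothetical class; it does not need to implement any interface.
job.class=com.example.jobs.NightlyReport
# Overrides for the defaults run, cancel, and getProgress.
method.run=execute
method.cancel=abort
method.progress=percentDone
```

For a class that already exposes run(), cancel(), and getProgress(), the type and job.class lines alone should be sufficient.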

Job Locking

There are three types of locks in Azkaban: permit, read.lock, and write.lock.

Permits

Permits are locks used to throttle concurrent access to a resource. For example, if you want to guarantee that no more than 4 jobs ever read from a particular database at once, you could set up a pool of 4 permits and have each job require one permit to run. The size of the pool is set using the total.job.permits parameter in a job directory's .properties file.

The number of permits the job must acquire to run is provided in the job parameters by job.permits. All permits are immediately released when the job finishes or fails.

Read and Write Locks

Azkaban supports named read/write locks for resources. A common use case is locking a file in HDFS for modification. For example, when many jobs read a file and one job recreates it, you want to ensure the file is not recreated while others are reading it. Readers do not block other readers, and any number of readers is allowed. However, only a single writer is allowed, and it cannot begin writing until all currently executing readers have completed.

These locks can be set through the read.lock and write.lock parameters as defined in the following table.

Property | Meaning | Example
job.permits | Used to throttle the number of jobs using a particular resource. See the previous locking section. | 3
read.lock | Comma-separated list of resource locks. Used to obtain a read lock on the named resource. See the previous locking section. | /some/resource/name1,/some/resource/name2
write.lock | Comma-separated list of resource locks. Used to obtain a write lock on the named resource. See the previous locking section. | /some/resource/name1,/some/resource/name2
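
The sketch below shows how permits and named locks might be combined for the reader and writer scenario described above. It shows three separate files in one listing. All file names and commands are hypothetical; the resource name is the example from the table.

```properties
# --- a .properties file in the job directory (hypothetical name) ---
# Pool of 4 permits shared by the jobs below.
total.job.permits=4

# --- read-report.job (hypothetical reader) ---
type=command
command=echo reading
# Each reader takes one permit, so at most 4 run at once.
job.permits=1
read.lock=/some/resource/name1

# --- rebuild-report.job (hypothetical writer) ---
type=command
command=echo rebuilding
job.permits=1
# Waits until no reader holds /some/resource/name1.
write.lock=/some/resource/name1
```

Because both jobs name the same resource, the writer runs only when no reader is active, while several readers may run side by side.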

Job Directory Layout

Job files are property files that end in .job. Additional properties can be given in .properties files. A property can refer to other properties, as in the following example:
db.url=${db.host}:${db.port}

A common need is to deploy a single job in many environments (for example, Dev, QA, and Production), each of which differs in some way that requires special configuration. To allow this, Azkaban makes all the configuration for a job hierarchical. A job inherits any properties defined in the local directory to which it is deployed or, if a property is not found there, in the parent directories. To keep environment-specific values (such as a particular host name or port) out of the job itself, use a variable such as ${some.url} that is defined in a global properties file. This global properties file can be set up once in each environment the job runs in and does not need to be redeployed with the job.
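
The following sketch illustrates the hierarchy with a hypothetical production layout. The directory names, file names, and host are invented for illustration. The job file would be deployed unchanged to each environment, and only the parent directory's properties file would differ.

```properties
# Hypothetical layout:
#   prod/global.properties      environment-specific values
#   prod/reports/load-db.job    the job, identical in every environment

# --- prod/global.properties ---
db.host=db.prod.example.com
db.port=3306

# --- prod/reports/load-db.job ---
type=command
# Not defined locally, so db.host and db.port are looked up
# in the parent directory's properties.
db.url=${db.host}:${db.port}
command=echo ${db.url}
```

Under these assumptions, db.url would resolve to db.prod.example.com:3306 in production, and to whatever host and port the Dev or QA properties file defines in those environments.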

Other Standard Job Properties

A number of properties are made available to jobs of all types by the framework. These can be set by adding the given property to any job. The following table lists the available properties and their meanings.

Property | Meaning | Example
dependencies | A comma-separated list of job names, one for each job depended on. Dependencies are always run first, and a job is only started if all its dependencies complete successfully. | foo, bar
notify.emails | A comma-separated list of email addresses to notify upon success and failure of the job. | gwb@whitehouse.gov, barryo@whitehouse.gov
retries | If your job fails, this property instructs Azkaban to run the job again up to the number of retries given. This is useful if you have a job that is unreliable due to circumstances outside your control, and simply trying again is likely to help. | 3
retry.backoff | The time to wait in between attempts when retries is set to a positive number (see retries property). The job waits for this many milliseconds between attempts. | 30000
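
As a sketch of how these properties fit together, the hypothetical job below depends on the jobs foo and bar from the table and retries on failure. The file name, command, and email address are placeholders.

```properties
# baz.job (hypothetical file name)
type=command
command=echo done
# baz starts only after foo and bar have both succeeded.
dependencies=foo, bar
notify.emails=ops@example.com
# Up to 3 further attempts, 30000 ms (30 seconds) apart.
retries=3
retry.backoff=30000
```

With these values, a job that keeps failing would spend at most 3 x 30000 ms = 90 seconds waiting between attempts before it is reported as failed.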

Azkaban System Properties

The following table lists the system-wide properties that can be set for Azkaban itself.
Property | Meaning | Example
mail.host | The hostname of the mail server to which email notifications are sent. | localhost
mail.user | The user name on the mail server. | joebob
mail.password | The password for the mail server user. | password
scheduler.threads | The maximum number of threads that can be used for running jobs. | 50
total.job.permits | The total number of permits available in the system for assignment to jobs that set the job.permits property. | 50