Documentation
This documentation assumes you have browsed through the Quick Start guide to learn the basics.
Jobs and Configuration
Azkaban jobs are basically code plus configuration values. Configuration is stored as a properties file in the format key=value. These job files can be created manually in a text editor or through the web interface. Many of the configuration parameters will be custom configurations for your job, but there are a number of standard parameters that activate common job functionality. These parameters are described in the following sections.
Job Types
All jobs require a type property specifying how to execute them. Currently, there are four job types: java, command, javaprocess, and pig.
| Property | Required? | Meaning |
|---|---|---|
| type | required | The job type: java, command, javaprocess, or pig |
Each of these types has a variety of options as described in the following sections.
command jobs
Command jobs are essentially Unix commands executed as separate processes. Any output sent to standard out or standard error is redirected to the Azkaban log for the job. The job is considered to have succeeded if it completes with an exit code of zero. A non-zero exit code is treated as a failure.
The following properties are available in command jobs:
| Property | Required? | Meaning | Example |
|---|---|---|---|
| command | required | Specifies the command to execute. | ls -lh |
| command.n | optional | Defines additional commands that are run sequentially after command. | ls -lh |
| working.dir | optional | Specifies the directory in which the command is invoked. The default working directory is the job's directory. | /home/ejk |
| env.property | optional | Specifies environment variables that should be set before running the command. property defines the name of the environment variable, so env.VAR_NAME=VALUE creates an environment variable $VAR_NAME and gives it the value of VALUE. | |
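As an illustration of the properties above, a command job file might look like the following sketch. The file name is hypothetical, and the numbering used for command.n (command.1, command.2, and so on) is an assumption; the table only states that the additional commands run sequentially after command.

```properties
# list-files.job (hypothetical file name)
type=command

# The main command, run in a separate process.
command=ls -lh

# An additional command, run after "command" completes.
# Assumption: the "n" in command.n is a number such as 1.
command.1=date

# Directory in which the commands are invoked.
working.dir=/home/ejk

# Creates the environment variable $MY_ENV_VARIABLE with the value testVariable.
env.MY_ENV_VARIABLE=testVariable
```

The job succeeds only if the process exits with code zero, so a failing command marks the job as failed.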
javaprocess jobs
The following properties are available in javaprocess jobs:
| Property | Required? | Meaning | Example |
|---|---|---|---|
| java.class | required | The class that contains the main function. | azkaban.example.test.HelloWorld |
| classpath | optional | A comma-delimited list of JAR files and directories to be added to the classpath. If not set, it adds all JARs in the working directory to the classpath. | commons-io.jar,helloworld.jar |
| Xms | optional | The initial memory pool size to start the JVM. The default is 64M. | 64M |
| Xmx | optional | The maximum memory pool size. The default is 256M. | 256M |
| main.args | optional | List of comma-delimited arguments to pass to the Java main function. | arg1,arg2 |
| jvm.args | optional | Arguments set for the JVM. This is not a list. The entire string is passed intact as a VM argument. | -Dmyprop=test -Dhello=world |
| working.dir | optional | Inherited from command jobs. | /home/ejk |
| env.property | optional | Inherited from command jobs. | env.MY_ENV_VARIABLE=testVariable |
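Putting the example values from this table together, a javaprocess job file could be sketched as follows. The file name is hypothetical; every value is taken from the Example column above.

```properties
# hello-world.job (hypothetical file name)
type=javaprocess

# Class whose main function is invoked.
java.class=azkaban.example.test.HelloWorld

# Comma-delimited JARs and directories. If omitted, all JARs in the
# working directory are added to the classpath.
classpath=commons-io.jar,helloworld.jar

# JVM memory settings (these match the documented defaults).
Xms=64M
Xmx=256M

# Comma-delimited arguments passed to the main function.
main.args=arg1,arg2

# Passed to the JVM as a single intact string, not as a list.
jvm.args=-Dmyprop=test -Dhello=world
```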
pig jobs
This job type runs pig scripts through grunt. The following properties are available in pig jobs:
| Property | Required? | Meaning | Example |
|---|---|---|---|
| pig.script | optional | Specifies the pig script to run. If not set, it uses the job name to find jobname.pig. | pig-example.pig |
| udf.import.list | optional | Comma-delimited list of UDF imports | oink.,linkedin.udf. |
| param.name | optional | Used for parameter replacement to pass parameters from your job into your pig script. Order is not guaranteed. See the pig documentation for information on using pig parameters in your scripts. | param.variable1=myvalue |
| paramfile | optional | Comma-delimited list of files used for variable replacement in your pig script. Order is not guaranteed, and param.name takes precedence. | paramfile1,paramfile2 |
| hadoop.job.ugi | optional | Sets the user name and group for Hadoop jobs. | hadoop,group |
| classpath | optional | Inherited from javaprocess jobs. | commons-io.jar,helloworld.jar |
| Xms | optional | Inherited from javaprocess jobs. | 64M |
| Xmx | optional | Inherited from javaprocess jobs. | 256M |
| jvm.args | optional | Inherited from javaprocess jobs. | -Dmyprop=test -Dhello=world |
| working.dir | optional | Inherited from command jobs. | /home/ejk |
| env.property | optional | Inherited from command jobs. | |
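A pig job file that combines these properties might look like the sketch below. The file name is hypothetical, and the values are the examples from the table above.

```properties
# pig-example.job (hypothetical file name)
type=pig

# Script to run. If omitted, the job name is used to find jobname.pig.
pig.script=pig-example.pig

# Comma-delimited list of UDF imports.
udf.import.list=oink.,linkedin.udf.

# Makes the parameter "variable1" available to the script,
# where it can be referenced as $variable1.
param.variable1=myvalue

# Parameter files. Values set with param.name take precedence over these.
paramfile=paramfile1,paramfile2

# User name and group for the Hadoop jobs launched by the script.
hadoop.job.ugi=hadoop,group
```

Because the order of parameter replacement is not guaranteed, it is safest not to let one parameter's value depend on another being substituted first.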
java jobs
Java jobs are any Java classes that have a run() method, such as a java.lang.Runnable. To avoid tying your code to framework-specific interfaces, the Java class does not need to implement any interface; however, Azkaban can make use of all the methods given in the following class (some of which are optional):
Logging should be to a log4j logger with the logger name set to the job name. Azkaban provides a log4j appender that sends these messages to the appropriate job log.
| Property | Required? | Meaning | Default |
|---|---|---|---|
| job.class | required | The Java class to run | |
| method.run | optional | The name of the no-arg method to use for running the job | run |
| method.cancel | optional | The name of the no-arg method to cancel the job | cancel |
| method.progress | optional | The name of the no-arg method to use for getting progress from the job | getProgress |
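For example, a java job whose class uses non-default method names could be configured along these lines. The file name, the class name com.example.ReportJob, and the method names execute and abort are all hypothetical; only the property names come from the table above.

```properties
# report.job (hypothetical file name)
type=java

# Hypothetical class; it needs no Azkaban-specific interface.
job.class=com.example.ReportJob

# Hypothetical no-arg method names that replace the defaults
# (run and cancel).
method.run=execute
method.cancel=abort

# method.progress is omitted, so the default getProgress is used.
```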
The following properties control job permits and resource locks:
| Property | Meaning | Example |
|---|---|---|
| job.permits | Used to throttle the number of jobs using a particular resource. See the previous locking section. | 3 |
| read.lock | Comma-separated list of resource locks. Used to obtain a read lock on the named resource. See the previous locking section. | /some/resource/name1,/some/resource/name2 |
| write.lock | Comma-separated list of resource locks. Used to obtain a write lock on the named resource. See the previous locking section. | /some/resource/name1,/some/resource/name2 |
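As a rough sketch, a job that reads from two resources and writes to a third might declare its permits and locks as shown here. The file name, the command, and the resource name /some/resource/output are hypothetical; the other values are the examples from the table above.

```properties
# export.job (hypothetical file name)
type=command
command=ls -lh

# Number of permits this job requests.
job.permits=3

# Read locks on two named resources.
read.lock=/some/resource/name1,/some/resource/name2

# Write lock on a different, hypothetical resource.
write.lock=/some/resource/output
```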
Job Directory Layout
Job files are properties files that end in .job. Additional properties can be given in .properties files. A property can refer to other properties, as in the following example:
db.url=${db.host}:${db.port}
A common need is to deploy a single job in many environments (for example, Dev, QA, and Production), each of which differs in ways that require special configuration. To allow this, Azkaban makes all the configuration for a job hierarchical. A job inherits any properties defined in the local directory to which it is deployed or, if a property is not found there, in the parent directories. To avoid adding environment-specific properties to the job (such as a particular host name or port), use a variable such as ${some.url}, which is defined in a global properties file. This global properties file can be set up once in each environment the job needs to run in and does not need to be redeployed with the job.
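The hierarchy described above can be sketched with two files. The file names, directory names, host name, and port are all hypothetical; the point is only that the job file contains no environment-specific values.

```properties
# File 1 (hypothetical): global.properties in a parent directory.
# Each environment (Dev, QA, Production) keeps its own copy with its own values.
db.host=dev-db.example.com
db.port=3306
db.url=${db.host}:${db.port}

# File 2 (hypothetical): jobs/load-data.job, deployed unchanged everywhere.
# ${db.url} is not defined in this directory, so it is looked up in the
# parent directories and resolves to the value from global.properties.
type=command
command=ls -lh
env.DB_URL=${db.url}
```

With this layout, moving the job from Dev to Production means deploying the same .job file under a parent directory whose global.properties holds the production values.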
Other Standard Job Properties
A number of properties are made available to jobs of all types by the framework. These can be set by adding the given property to any job. The following table lists the available properties and their meanings.
| Property | Meaning | Example |
|---|---|---|
| dependencies | A comma-separated list of job names, one for each job depended on. Dependencies are always run first, and a job is only started if all its dependencies complete successfully. | foo, bar |
| notify.emails | A comma-separated list of email addresses to notify upon success and failure of the job | gwb@whitehouse.gov, barryo@whitehouse.gov |
| retries | If your job fails, this property instructs Azkaban to run the job again up to the number of retries given. This is useful if you have a job that is unreliable due to circumstances outside your control, and simply trying again is likely to help. | 3 |
| retry.backoff | The time to wait in between attempts when retries is set to a positive number (see retries property). The job waits for this many milliseconds between attempts. | 30000 |
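To show how these standard properties combine with a job type, here is a sketch of a job that waits for two other jobs and retries on failure. The file name and the email address are hypothetical; foo and bar are the job names from the example column.

```properties
# summary.job (hypothetical file name)
type=command
command=ls -lh

# Runs only after the jobs foo and bar have both completed successfully.
dependencies=foo,bar

# Notified on both success and failure (hypothetical address).
notify.emails=oncall@example.com

# On failure, run again up to 3 times, waiting 30000 ms (30 seconds)
# between attempts.
retries=3
retry.backoff=30000
```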
The following system-level properties are also available:
| Property | Meaning | Example |
|---|---|---|
| mail.host | The hostname of the mail server to which email notifications are sent. | localhost |
| mail.user | The user name on the mail server. | joebob |
| mail.password | The password of the mail server. | password |
| scheduler.threads | The maximum number of threads that can be used for running jobs. | 50 |
| total.job.permits | A number of permits available in the system for assignment to jobs that set the job.permits property. | 50 |
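Taken together, these settings could be written as in the sketch below. This section does not say which file holds them, so the sketch deliberately names none; the values are the examples from the table above.

```properties
# Mail server used for job notifications.
mail.host=localhost
mail.user=joebob
mail.password=password

# At most 50 threads are used for running jobs.
scheduler.threads=50

# Pool of permits shared by all jobs that set job.permits.
total.job.permits=50
```

With total.job.permits=50 and a job that sets job.permits=3, that job would draw 3 of the 50 available permits.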