Run a Job
Job run states
When you trigger a Job run, it moves through one or more of the following states:
- Queued: Each Job run starts in this state when it is triggered, and stays until the Spark driver starts.
- Running: The Job run is actively running on the specified Cluster.
- Completed: The Job run completed without errors.
- Failed: The Job run failed with error(s).
- Canceled: The Job run was manually canceled while it was Queued or Running.
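When scripting against Job runs, it can help to model these states explicitly. The following is a minimal sketch, not part of the Onehouse API: the `JobRunState` enum and the helper names are hypothetical, and it assumes the API reports states using the same labels listed above. It also assumes that Completed, Failed, and Canceled are final states, which the list above implies but does not state outright.

```python
from enum import Enum


class JobRunState(Enum):
    """Job run states as listed above (string values are an assumption)."""
    QUEUED = "Queued"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELED = "Canceled"


# Assumed final states: a run in one of these is not expected to change again.
TERMINAL_STATES = frozenset(
    {JobRunState.COMPLETED, JobRunState.FAILED, JobRunState.CANCELED}
)


def is_terminal(state: JobRunState) -> bool:
    """Return True if the run has finished, successfully or not."""
    return state in TERMINAL_STATES


def is_active(state: JobRunState) -> bool:
    """Return True while the run is still Queued or Running."""
    return not is_terminal(state)
```

For example, `is_active(JobRunState("Queued"))` is true, while `is_terminal(JobRunState("Failed"))` is true.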
Trigger Job runs
Trigger a one-time run
After a Job is created, you can trigger a run.
- In the Onehouse console, navigate to the Jobs page.
- Open the Job you'd like to run, then click Actions > Run. You can also use the RUN JOB API command.
Info
- When you run a Job, Onehouse will pick up the latest version of the JAR or Python script from the cloud storage bucket path in the Job definition. If the path does not exist, the Job run will fail.
- Jobs can only have one concurrent run. While a Job is already Queued or Running, you cannot trigger a new run.
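Because a Job allows only one concurrent run, an external trigger may want to check the state of the latest run before submitting a new one. The sketch below illustrates that guard under stated assumptions: `trigger_if_idle` is a hypothetical helper, `latest_state` is whatever state string you obtained for the most recent run (or `None` if the Job has never run), and `trigger_run` stands in for your own function that issues RUN JOB and returns the Job run ID.

```python
from typing import Callable, Optional

# A new run cannot be triggered while the previous run is in one of these states.
ACTIVE_STATES = {"Queued", "Running"}


def trigger_if_idle(
    latest_state: Optional[str],
    trigger_run: Callable[[], str],
) -> Optional[str]:
    """Trigger a run unless the latest run is still Queued or Running.

    Returns the new Job run ID, or None if the trigger was skipped.
    """
    if latest_state in ACTIVE_STATES:
        return None  # Skip: the Job already has an active run.
    return trigger_run()
```

Skipping rather than raising is a design choice here; a scheduler that should alert on overlapping runs could raise an exception instead.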
Trigger recurring runs
You can set up your own orchestration to trigger a Job run on a recurring basis using the RUN JOB API command.
Onehouse does not yet offer native orchestration for Jobs, but integrates with most orchestration tools. For example, follow this guide to set up orchestration with Apache Airflow.
Create temporary Clusters for Job runs
The Onehouse APIs enable you to spin up a temporary Cluster, run one or more Jobs, then spin down the Cluster. This pattern is sometimes called "Job flows" and can be cost-efficient when you don't need a persistent Cluster.
Using Onehouse API commands, perform the following steps:
- Create a Cluster with CREATE CLUSTER.
- Create Job definitions with CREATE JOB.
- Run the Jobs with RUN JOB.
  - The API will return a Job run ID.
  - If using the Airflow HTTP operator, this response will be captured.
- Check for Job run completion with DESCRIBE JOB_RUN.
  - Poll repeatedly for completion.
  - If using the Airflow HTTP operator, you can use the response_check parameter.
- Delete the Cluster with DELETE CLUSTER.
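The steps above can be sketched as a small orchestration function. This is an illustrative outline under stated assumptions, not the Onehouse client: `run_job_flow` and `wait_for_run` are hypothetical names, and each callable argument stands in for your own code that issues the corresponding API command (CREATE CLUSTER, CREATE JOB, RUN JOB, DESCRIBE JOB_RUN, DELETE CLUSTER). It also assumes that `describe_job_run` returns the state as one of the labels from the Job run states list.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed final states, using the labels from the Job run states list.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_run(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal state, then return that state."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job run still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


def run_job_flow(
    create_cluster: Callable[[], None],
    create_job: Callable[[str], None],
    run_job: Callable[[str], str],           # returns the Job run ID
    describe_job_run: Callable[[str], str],  # returns the run's state
    delete_cluster: Callable[[], None],
    job_names: Iterable[str],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Create a Cluster, run each Job in order, then delete the Cluster."""
    names = list(job_names)
    results: Dict[str, str] = {}
    create_cluster()
    try:
        for name in names:
            create_job(name)
        for name in names:
            run_id = run_job(name)
            results[name] = wait_for_run(
                lambda: describe_job_run(run_id),
                poll_seconds=poll_seconds,
                sleep=sleep,
            )
    finally:
        # Always spin the Cluster down, even if a step above raised.
        delete_cluster()
    return results
```

Placing the Cluster deletion in a `finally` block is a design choice that keeps a failed or timed-out run from leaving the temporary Cluster running. The function returns a mapping from Job name to final state, so the caller can decide how to react to Failed or Canceled runs.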