
Lock Provider

A lock provider ensures safe operations when a table has multiple writers. You may enable one lock provider in each Onehouse project.

Onehouse will automatically use the lock provider you configure for all writers in the project, including Flows, Table Services, Jobs, and SQL.

Onehouse supports the following lock provider types:

| Lock Provider | Description | AWS projects | GCP projects |
|---|---|---|---|
| Onehouse Managed (recommended) | Lock coordination managed by Onehouse directly on object storage; no additional infrastructure required | ✅ Supported | ✅ Supported |
| Amazon DynamoDB | AWS-managed database that can handle lock coordination | ✅ Supported | Not supported |
| Apache Zookeeper | Self-managed, open source lock coordination service | ✅ Supported | ✅ Supported |

Set up your lock provider

Set up a Onehouse Managed lock provider

Onehouse Managed lock providers are deployed directly in your object storage bucket (Amazon S3 or Google Cloud Storage). Onehouse will manage locks without the need to deploy additional infrastructure.

  1. In the Onehouse console, navigate to Settings > Integrations > Lock Provider. Enter the lock provider configurations:
    1. Select 'Onehouse Managed' as the Provider.
  2. All Onehouse writers within the project will now use the lock provider you added.

Set up a DynamoDB lock provider

  1. In your Onehouse Terraform script or CloudFormation template, ensure that enableDynamoDB is set to true. You'll find this under lockProviderConfig in the Terraform script or 'Lock Provider Config' for CloudFormation.
  2. Create a DynamoDB table in the same AWS account as your Onehouse project. Follow AWS docs to create a DynamoDB table.
    1. Include an attribute with the name "key". Set "key" as the partition key of the DynamoDB table.
    2. You do not need to specify a sort key for the table.
  3. In the Onehouse console, navigate to Settings > Integrations > Lock Provider. Enter the lock provider configurations:
    1. Select 'DynamoDB' as the Provider.
    2. Enter the name of the table you created in DynamoDB.
    3. Enter the AWS region of the DynamoDB table (e.g. 'us-east-1').
  4. All Onehouse writers within the project will now use the lock provider you added.
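
As an alternative to creating the table by hand in the AWS console, the table from step 2 could be created with boto3. This is a minimal sketch, not an official Onehouse script: the function names, the string ("S") attribute type, and on-demand billing are assumptions made for illustration, so adjust them to your account's conventions.

```python
def lock_table_spec(table_name: str) -> dict:
    """Build create_table arguments for a lock table whose partition key is "key"."""
    return {
        "TableName": table_name,
        # One attribute named "key". Assumption: string type ("S").
        "AttributeDefinitions": [{"AttributeName": "key", "AttributeType": "S"}],
        # "key" is the partition (HASH) key; no sort (RANGE) key is defined.
        "KeySchema": [{"AttributeName": "key", "KeyType": "HASH"}],
        # Assumption: on-demand billing; use provisioned capacity if you prefer.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_lock_table(table_name: str, region: str) -> None:
    """Create the table in the given region and wait until it exists."""
    import boto3  # imported lazily so the spec can be inspected without AWS access

    client = boto3.client("dynamodb", region_name=region)
    client.create_table(**lock_table_spec(table_name))
    # Wait for the table to exist before entering its name in the Onehouse console.
    client.get_waiter("table_exists").wait(TableName=table_name)
```

Calling `create_lock_table("my-lock-table", "us-east-1")` with credentials for the same AWS account as your Onehouse project would create the table; both argument values here are placeholders.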

Set up an Apache Zookeeper lock provider

  1. Create an Apache Zookeeper server that is accessible from the VPC of your Onehouse project.
  2. In the Onehouse console, navigate to Settings > Integrations > Lock Provider. Enter the lock provider configurations:
    1. Select 'Zookeeper' as the Provider.
    2. Enter the Zookeeper servers as a comma-separated list of host:port.
  3. All Onehouse writers within the project will now use the lock provider you added.
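
Before pasting the server list into the console, it can help to check that it is well formed. The sketch below is a hypothetical helper (not part of Onehouse or Zookeeper) that parses a comma-separated host:port string and rejects malformed entries; the host names in the usage note are placeholders.

```python
def parse_zookeeper_servers(servers: str) -> list[tuple[str, int]]:
    """Parse "host1:port1,host2:port2" into a list of (host, port) pairs."""
    parsed = []
    for entry in servers.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not (port.isascii() and port.isdigit()):
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        parsed.append((host, port_number))
    return parsed
```

For example, `parse_zookeeper_servers("zk1.internal:2181,zk2.internal:2181")` returns two (host, port) pairs, while an entry without a port raises `ValueError`.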

Use the lock provider with external writers

Onehouse can write concurrently with external (non-Onehouse) writers.

Avoid Corrupting Tables

Using multiple writers with inconsistent configurations can corrupt tables. Follow all steps below to ensure tables do not get corrupted.

Requirements

  • The external writer must write to tables in the Apache Hudi table format.
  • The external writer must use the same lock provider configurations as Onehouse.

Configure external writers

Make sure all existing and future writers to this table are configured in multi-writer mode with the lock provider shared by Onehouse and your external services. Follow this Apache Hudi documentation for the configuration details.

  1. Ensure that your writers are using UTC by setting hoodie.table.timeline.timezone=UTC.
  2. In the Onehouse console, navigate to any table. Click Concurrency Control from the 3-dot table menu. This option is unavailable if a lock provider is not configured.
  3. Click Enable Concurrency. If concurrency is already enabled, you can skip this step.
  4. The popup will show all your concurrency configurations. Copy these from the Onehouse console and add them to your external writer.
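
One way to apply step 4 without silently overriding a setting is to merge the copied configurations into the external writer's options and fail on any mismatch. This is a minimal sketch with a hypothetical helper name; the keys in the usage note are illustrative Apache Hudi options, and the authoritative keys and values are the ones shown in the Onehouse console popup.

```python
def merge_concurrency_configs(
    writer_options: dict[str, str], console_configs: dict[str, str]
) -> dict[str, str]:
    """Merge configs copied from the console into an external writer's options."""
    # A key set differently on the writer means inconsistent configuration,
    # which the section above warns can corrupt tables, so fail loudly.
    conflicts = {
        key: (writer_options[key], value)
        for key, value in console_configs.items()
        if key in writer_options and writer_options[key] != value
    }
    if conflicts:
        raise ValueError(f"conflicting writer options: {conflicts}")

    merged = {**writer_options, **console_configs}
    # Step 1: all writers must use UTC for the timeline.
    timezone = merged.setdefault("hoodie.table.timeline.timezone", "UTC")
    if timezone != "UTC":
        raise ValueError(f"timeline timezone must be UTC, got {timezone!r}")
    return merged
```

For instance, passing `{"hoodie.write.concurrency.mode": "optimistic_concurrency_control"}` as `console_configs` yields writer options that include that key plus `hoodie.table.timeline.timezone=UTC`. The resulting dictionary can then be handed to whatever mechanism your writer uses for Hudi options.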

Usage recommendations

  • When concurrent writers attempt to write to the same file group, one writer will fail gracefully (using Apache Hudi's Optimistic Concurrency Control). If your concurrent writers write to the same file group, add failure retry logic in your orchestration and consider temporarily pausing one or more writers if transactions repeatedly fail. This applies only to transactions writing to the same file group.
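
The retry logic recommended above might look like the following sketch. `WriteConflictError` is a hypothetical stand-in for whatever exception your writer surfaces when it loses a conflict, and the attempt count and delays are arbitrary defaults rather than Onehouse recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WriteConflictError(Exception):
    """Hypothetical: raised by your write job when a concurrent write conflicts."""


def retry_on_conflict(
    write: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run write(), retrying with exponential backoff when it hits a conflict."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except WriteConflictError:
            if attempt == max_attempts:
                # Give up so the orchestrator can alert or pause this writer.
                raise
            delay = base_delay_s * 2 ** (attempt - 1)
            # Jitter keeps competing writers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Once the attempts are exhausted the exception propagates, which is the point at which an orchestrator could pause the writer instead of retrying indefinitely.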

Limitations

  • After a lock provider is added in the Onehouse console, it cannot be modified or removed. Contact Onehouse support if you need to change this.
  • Open Engines do not yet integrate with lock providers in Onehouse. You must add your lock provider configurations manually for the Open Engines writer when writing concurrently to a table with other Onehouse writers such as Flows or Table Services.