onehouse_source

Defines a data source for ingestion. A source represents the upstream system that an onehouse_flow reads from — an S3 bucket, a Kafka topic, a Postgres database, and so on.

Canonical reference

This page documents Terraform-specific behavior (HCL syntax, types, mutability, drift, import). For full parameter semantics, valid values, and defaults, see CREATE SOURCE and DELETE SOURCE. The provider does not support source updates — changes force destroy + recreate.

Example Usage

S3 source

resource "onehouse_source" "events" {
  name        = "raw-events-s3"
  source_type = "S3"

  s3 {
    object_storage_bucket_name = "my-raw-events-bucket"
  }
}

object_storage_bucket_name is the bucket name only (not a URI). The bucket must be accessible to the Onehouse control plane (IAM/service-account role configured during cloud-provider connection).

GCS source

resource "onehouse_source" "events_gcs" {
  name        = "raw-events-gcs"
  source_type = "GCS"

  gcs {
    object_storage_bucket_name = "my-gcs-events-bucket"
  }
}

Confluent Kafka with SASL and schema registry

resource "onehouse_source" "kafka" {
  name            = "events-kafka"
  source_type     = "CONFLUENT_KAFKA"
  credential_type = "ONEHOUSE"

  kafka {
    bootstrap_servers     = "pkc-xxxxx.us-west-2.aws.confluent.cloud:9092"
    connection_protocol   = "SASL"
    security_protocol     = "SASL_SSL"
    payload_serialization = "avro"

    sasl {
      mechanism = "scram_sha_256"
      key       = "SASL_KEY"
      secret    = "SASL_SECRET"
    }

    schema_registry {
      type = "confluent"
      confluent {
        servers      = "https://psrc-xxxxx.us-west-2.aws.confluent.cloud"
        subject_name = "events-value"
        key          = "SR_KEY"
        secret       = "SR_SECRET"
      }
    }
  }
}

For production, switch to credential_type = "SECRET_MANAGER" and replace key/secret with key_secret_reference (cloud secret ARN/ID).
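A sketch of the same source in secret-manager mode. This assumes key_secret_reference takes a single cloud-secret reference (the ARNs below are placeholders); confirm the exact secret payload format against CREATE SOURCE.

```hcl
resource "onehouse_source" "kafka_sm" {
  name            = "events-kafka"
  source_type     = "CONFLUENT_KAFKA"
  credential_type = "SECRET_MANAGER"

  kafka {
    bootstrap_servers     = "pkc-xxxxx.us-west-2.aws.confluent.cloud:9092"
    connection_protocol   = "SASL"
    security_protocol     = "SASL_SSL"
    payload_serialization = "avro"

    sasl {
      mechanism = "scram_sha_256"
      # Placeholder ARN; points at a secret holding the SASL credentials.
      key_secret_reference = "arn:aws:secretsmanager:us-west-2:111111111111:secret:kafka-sasl-abc123"
    }

    schema_registry {
      type = "confluent"
      confluent {
        servers      = "https://psrc-xxxxx.us-west-2.aws.confluent.cloud"
        subject_name = "events-value"
        # Placeholder ARN for the schema-registry credentials.
        key_secret_reference = "arn:aws:secretsmanager:us-west-2:111111111111:secret:sr-creds-def456"
      }
    }
  }
}
```

With SECRET_MANAGER, Terraform state stores only the secret reference, not the credential values themselves.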

Postgres CDC source

resource "onehouse_source" "pg" {
  name            = "orders-postgres"
  source_type     = "POSTGRES"
  credential_type = "ONEHOUSE"

  rdbms {
    log_message_bus = "lakelog"

    db_config {
      host          = "pg.internal.example.com"
      port          = "5432"
      database_name = "orders"
      user          = "onehouse_cdc_user"
      password      = "secret"
    }

    lake_log_config {
      intermediate_storage_path = "s3://my-bucket/lakelog/orders-pg/"
    }
  }
}

lake_log_config {} is required when log_message_bus = "lakelog" and not allowed otherwise.
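In secret-manager mode, db_config swaps the user/password literals for user_password_reference. A hedged sketch (the ARN is a placeholder; the expected secret payload format should be confirmed against CREATE SOURCE):

```hcl
resource "onehouse_source" "pg_sm" {
  name            = "orders-postgres"
  source_type     = "POSTGRES"
  credential_type = "SECRET_MANAGER"

  rdbms {
    log_message_bus = "lakelog"

    db_config {
      host          = "pg.internal.example.com"
      port          = "5432"
      database_name = "orders"
      # Placeholder ARN; references a secret containing the DB credentials.
      user_password_reference = "arn:aws:secretsmanager:us-west-2:111111111111:secret:pg-cdc-creds-abc123"
    }

    lake_log_config {
      intermediate_storage_path = "s3://my-bucket/lakelog/orders-pg/"
    }
  }
}
```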

MySQL CDC source via MSK

resource "onehouse_source" "mysql" {
  name            = "users-mysql"
  source_type     = "MY_SQL"
  credential_type = "ONEHOUSE"

  rdbms {
    log_message_bus = "msk"

    db_config {
      host          = "mysql.internal.example.com"
      port          = "3306"
      database_name = "users"
      user          = "onehouse_cdc_user"
      password      = "secret"
    }
  }
}

Onehouse-table source

resource "onehouse_source" "ot" {
  name        = "curated-source"
  source_type = "ONEHOUSE_TABLE"

  onehouse_table {
    lake     = "warehouse"
    database = "events"
    name     = "curated_events"
  }
}

Argument Reference

Top-level

| Argument | Type | Required | Mutability | Description |
| --- | --- | --- | --- | --- |
| name | string | required | Immutable | Source name. Unique within the project. |
| source_type | string | required | Immutable | Source type. → details below |
| credential_type | string | for credential-bearing types | Immutable | ONEHOUSE (credentials in state) or SECRET_MANAGER (cloud-secret reference). → details |

Exactly one type-specific sub-block must be set, matching source_type.

source_type — families and sub-blocks

The provider supports eight source types in four families. Pick the sub-block that matches your source_type.

| Family | source_type values | Sub-block | SQL ref |
| --- | --- | --- | --- |
| Object storage | S3, GCS | s3 {} or gcs {} | S3 · GCS |
| Event streams | APACHE_KAFKA, MSK_KAFKA, CONFLUENT_KAFKA | kafka {} | Kafka types |
| Onehouse tables | ONEHOUSE_TABLE | onehouse_table {} | Onehouse table type |
| Databases (CDC) | POSTGRES, MY_SQL | rdbms {} | Postgres · MySQL |

For credential-bearing types (Kafka, RDBMS, Onehouse-table), both ONEHOUSE and SECRET_MANAGER credential modes are supported via the top-level credential_type attribute. See Secrets Management for the trade-offs.

s3 {} / gcs {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| object_storage_bucket_name | string | required | Bucket name (not a URI). → S3 · GCS |

kafka {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| bootstrap_servers | string | required | Comma-separated host:port list. → details |
| cloud_resource_identifier | string | when MSK_KAFKA | MSK cluster ARN. Required for source_type = "MSK_KAFKA". |
| connection_protocol | string | required | SASL, SSL, or PLAINTEXT. → details |
| security_protocol | string | required | SASL_SSL, SASL_PLAINTEXT, SSL, or PLAINTEXT. → details |
| payload_serialization | string | required | avro, json, proto, confluent_proto, or confluent_json_sr. → details |
| tls {} | block | optional | TLS certs and keys (4 fields). → details |
| sasl {} | block | when SASL | SASL credentials (mechanism, key/secret or key_secret_reference). → details |
| schema_registry {} | block | optional | Schema registry config (see below). → details |
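To illustrate cloud_resource_identifier, a hypothetical MSK source might look like the sketch below. The broker endpoint, cluster ARN, and SASL mechanism are placeholders; the auth settings MSK accepts here should be verified against the Kafka types reference for your cluster configuration.

```hcl
resource "onehouse_source" "msk" {
  name            = "events-msk"
  source_type     = "MSK_KAFKA"
  credential_type = "ONEHOUSE"

  kafka {
    bootstrap_servers = "b-1.mycluster.abc123.c2.kafka.us-west-2.amazonaws.com:9096"
    # MSK_KAFKA additionally requires the cluster ARN (placeholder shown).
    cloud_resource_identifier = "arn:aws:kafka:us-west-2:111111111111:cluster/my-cluster/11111111-2222-3333-4444-555555555555-1"
    connection_protocol       = "SASL"
    security_protocol         = "SASL_SSL"
    payload_serialization     = "json"

    sasl {
      mechanism = "scram_sha_256"
      key       = "MSK_SASL_USER"
      secret    = "MSK_SASL_PASSWORD"
    }
  }
}
```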

onehouse_table {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| lake | string | required | Source lake name. → details |
| database | string | required | Source database name. → details |
| name | string | required | Source table name. → details |
| schema_registry {} | block | optional | Schema registry config. → details |

rdbms {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| log_message_bus | string | required | lakelog, msk, or (Postgres only) google_managed_kafka. → Postgres · MySQL |
| db_config {} | block | required | host, port, database_name, user/password (or user_password_reference for secret-manager mode). → Postgres · MySQL |
| lake_log_config {} | block | when log_message_bus = "lakelog" | intermediate_storage_path for the lakelog buffer. → Postgres · MySQL |
| schema_registry {} | block | optional | Schema registry config. → details |
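For the Postgres-only google_managed_kafka bus, a minimal sketch (hypothetical values; whether this bus needs extra broker configuration is not covered here, so check the Postgres reference):

```hcl
resource "onehouse_source" "pg_gmk" {
  name            = "orders-postgres-gmk"
  source_type     = "POSTGRES"
  credential_type = "ONEHOUSE"

  rdbms {
    log_message_bus = "google_managed_kafka"

    db_config {
      host          = "pg.internal.example.com"
      port          = "5432"
      database_name = "orders"
      user          = "onehouse_cdc_user"
      password      = "secret"
    }
    # No lake_log_config {} here: it is only valid with log_message_bus = "lakelog".
  }
}
```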

schema_registry {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | required | One of glue, confluent, google, jar, file. → details |
| glue {} / confluent {} / google {} / jar {} / file {} | block | required | Type-specific config; exactly one, matching type. → details |

Confluent SR's confluent {} block accepts either key/secret literals (credential_type = "ONEHOUSE") or key_secret_reference (credential_type = "SECRET_MANAGER").

Attribute Reference

| Attribute | Type | Description |
| --- | --- | --- |
| id | string | Onehouse-assigned source UUID. |
| created_at | string | Creation time in RFC3339. |
| created_by | string | Identity that created the source. |

Import

terraform import onehouse_source.events events-kafka

After import, sensitive fields (passwords, tokens, secrets) inside the type-specific block cannot be recovered from SHOW SOURCES. Re-supply them in your .tf to avoid a forced replacement on the first terraform plan.

Data Source

data "onehouse_source" "lookup" {
  name = "events-kafka"
}

output "source_type" {
  value = data.onehouse_source.lookup.source_type
}

Limitations

  • No update. Any argument change forces destroy + recreate.
  • Multi-table on ONEHOUSE_TABLE. Not yet supported (tracked in ENG-41456).
  • Oracle CDC. source_type = "ORACLE" is defined in the proto but not yet supported by the provider.