onehouse_source

Defines a data source for ingestion. A source represents the upstream system that an onehouse_flow reads from — an S3 bucket, a Kafka topic, a Postgres database, and so on.

Canonical reference

This page documents Terraform-specific behavior (HCL syntax, types, mutability, drift, import). For full parameter semantics, valid values, and defaults, see CREATE SOURCE and DELETE SOURCE. The provider does not support source updates — changes force destroy + recreate.

Example Usage

S3 source

resource "onehouse_source" "events" {
  name        = "raw-events-s3"
  source_type = "S3"
  s3 {
    object_storage_bucket_name = "my-raw-events-bucket"
  }
}

object_storage_bucket_name is the bucket name only (not a URI). The bucket must be accessible to the Onehouse control plane (IAM/service-account role configured during cloud-provider connection).

GCS source

resource "onehouse_source" "events_gcs" {
  name        = "raw-events-gcs"
  source_type = "GCS"
  gcs {
    object_storage_bucket_name = "my-gcs-events-bucket"
  }
}

Confluent Kafka with SASL and schema registry

resource "onehouse_source" "kafka" {
  name            = "events-kafka"
  source_type     = "CONFLUENT_KAFKA"
  credential_type = "ONEHOUSE"
  kafka {
    bootstrap_servers     = "pkc-xxxxx.us-west-2.aws.confluent.cloud:9092"
    connection_protocol   = "SASL"
    security_protocol     = "SASL_SSL"
    payload_serialization = "avro"

    sasl {
      mechanism = "scram_sha_256"
      key       = "SASL_KEY"
      secret    = "SASL_SECRET"
    }

    schema_registry {
      type = "confluent"
      confluent {
        servers      = "https://psrc-xxxxx.us-west-2.aws.confluent.cloud"
        subject_name = "events-value"
        key          = "SR_KEY"
        secret       = "SR_SECRET"
      }
    }
  }
}

For production, switch to credential_type = "SECRET_MANAGER" and replace key/secret with key_secret_reference (cloud secret ARN/ID).
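As a rough sketch of that production variant, the Confluent example above might look like the following in SECRET_MANAGER mode. The resource label, source name, and both secret ARNs are hypothetical placeholders, and the expected layout of the secret payload is an assumption to verify against the Secrets Management documentation.

```hcl
resource "onehouse_source" "kafka_prod" {
  name            = "events-kafka-prod" # hypothetical name
  source_type     = "CONFLUENT_KAFKA"
  credential_type = "SECRET_MANAGER"
  kafka {
    bootstrap_servers     = "pkc-xxxxx.us-west-2.aws.confluent.cloud:9092"
    connection_protocol   = "SASL"
    security_protocol     = "SASL_SSL"
    payload_serialization = "avro"

    sasl {
      mechanism = "scram_sha_256"
      # Replaces the literal key/secret pair. Hypothetical ARN.
      key_secret_reference = "arn:aws:secretsmanager:us-west-2:123456789012:secret:example-kafka-sasl"
    }

    schema_registry {
      type = "confluent"
      confluent {
        servers      = "https://psrc-xxxxx.us-west-2.aws.confluent.cloud"
        subject_name = "events-value"
        # Replaces the literal key/secret pair. Hypothetical ARN.
        key_secret_reference = "arn:aws:secretsmanager:us-west-2:123456789012:secret:example-schema-registry"
      }
    }
  }
}
```

Because credential_type is immutable, moving an existing source from ONEHOUSE to SECRET_MANAGER is expected to destroy and recreate it.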

Postgres CDC source

resource "onehouse_source" "pg" {
  name            = "orders-postgres"
  source_type     = "POSTGRES"
  credential_type = "ONEHOUSE"
  rdbms {
    log_message_bus = "lakelog"
    db_config {
      host          = "pg.internal.example.com"
      port          = "5432"
      database_name = "orders"
      user          = "onehouse_cdc_user"
      password      = "secret"
    }
    lake_log_config {
      intermediate_storage_path = "s3://my-bucket/lakelog/orders-pg/"
    }
  }
}

lake_log_config {} is required when log_message_bus = "lakelog" and not allowed otherwise.

MySQL CDC source via MSK

resource "onehouse_source" "mysql" {
  name            = "users-mysql"
  source_type     = "MY_SQL"
  credential_type = "ONEHOUSE"
  rdbms {
    log_message_bus = "msk"
    db_config {
      host          = "mysql.internal.example.com"
      port          = "3306"
      database_name = "users"
      user          = "onehouse_cdc_user"
      password      = "secret"
    }
  }
}

Kinesis source

resource "onehouse_source" "kinesis" {
  name        = "events-kinesis"
  source_type = "KINESIS"
  kinesis {
    region                = "us-west-2"
    payload_serialization = "json"
  }
}

Onehouse-table source

resource "onehouse_source" "ot" {
  name        = "curated-source"
  source_type = "ONEHOUSE_TABLE"
  onehouse_table {
    lake     = "warehouse"
    database = "events"
    name     = "curated_events"
  }
}

Argument Reference

Top-level

| Argument | Type | Required | Mutability | Description |
| --- | --- | --- | --- | --- |
| name | string |  | Immutable | Source name. Unique within the project. |
| source_type | string |  | Immutable | Source type. → details below |
| credential_type | string | for credential-bearing types | Immutable | ONEHOUSE (credentials in state) or SECRET_MANAGER (cloud-secret reference). → details |

Exactly one type-specific sub-block must be set, matching source_type.

source_type — families and sub-blocks

The provider supports nine source types in five families. Pick the sub-block that matches your source_type.

| Family | source_type values | Sub-block | SQL ref |
| --- | --- | --- | --- |
| Object storage | S3, GCS | s3 {} or gcs {} | S3 · GCS |
| Event streams | APACHE_KAFKA, MSK_KAFKA, CONFLUENT_KAFKA | kafka {} | Kafka types |
| AWS Kinesis | KINESIS | kinesis {} | Kinesis type |
| Onehouse tables | ONEHOUSE_TABLE | onehouse_table {} | Onehouse table type |
| Databases (CDC) | POSTGRES, MY_SQL | rdbms {} | Postgres · MySQL |

For credential-bearing types (Kafka, RDBMS, Onehouse-table), both ONEHOUSE and SECRET_MANAGER credential modes are supported via the top-level credential_type attribute. See Secrets Management for the trade-offs.

s3 {} / gcs {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| object_storage_bucket_name | string |  | Bucket name (not a URI). → S3 · GCS |

kafka {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| bootstrap_servers | string |  | Comma-separated host:port list. → details |
| cloud_resource_identifier | string | when MSK_KAFKA | MSK cluster ARN. Required for source_type = "MSK_KAFKA". |
| connection_protocol | string |  | PLAINTEXT, TLS, or SASL. → details |
| security_protocol | string |  | SASL_SSL, SASL_PLAINTEXT, SSL, PLAINTEXT. → details |
| payload_serialization | string |  | avro, json, proto, confluent_proto, confluent_json_sr. → details |
| tls {} | block | when TLS | TLS trust/key stores and passwords. → fields below |
| sasl {} | block | when SASL | SASL credentials and (optional) Confluent keystore/truststore extras. → fields below |
| schema_registry {} | block | optional | Schema registry config. → fields below |
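To illustrate the cloud_resource_identifier requirement, here is a minimal sketch of an MSK source. It assumes an unauthenticated PLAINTEXT listener. The broker hostnames, cluster ARN, and resource label are hypothetical, and whether credential_type must still be set for a PLAINTEXT connection is an assumption to check against CREATE SOURCE.

```hcl
resource "onehouse_source" "msk" {
  name            = "events-msk" # hypothetical name
  source_type     = "MSK_KAFKA"
  credential_type = "ONEHOUSE" # assumed; Kafka is listed as a credential-bearing type
  kafka {
    # Comma-separated host:port list (hypothetical brokers).
    bootstrap_servers = "b-1.example.kafka.us-west-2.amazonaws.com:9092,b-2.example.kafka.us-west-2.amazonaws.com:9092"

    # Required for MSK_KAFKA: the MSK cluster ARN (hypothetical value).
    cloud_resource_identifier = "arn:aws:kafka:us-west-2:123456789012:cluster/example-cluster/abcd1234"

    connection_protocol   = "PLAINTEXT"
    security_protocol     = "PLAINTEXT"
    payload_serialization = "json"
  }
}
```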

tls {} block

Set when connection_protocol = "TLS". Use ONEHOUSE mode (key_store_password + key_password) or SECRET_MANAGER mode (key_store_password_key_password_secret_reference) — not both. All fields are optional in the schema and write-only (sent on CREATE, never stored in state).

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| trust_store_path | string |  | TLS trust store path on the data-plane host. Write-only. |
| key_store_path | string |  | TLS key store path on the data-plane host. Write-only. |
| key_store_password | string | ONEHOUSE mode | TLS key store password. Write-only. Mutex with key_store_password_key_password_secret_reference. |
| key_password | string | ONEHOUSE mode | TLS key password. Write-only. Mutex with key_store_password_key_password_secret_reference. |
| key_store_password_key_password_secret_reference | string | SECRET_MANAGER mode | Cloud-secret reference (e.g. AWS Secrets Manager ARN) holding both TLS passwords. Write-only. Mutex with the literal *_password fields. |
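A minimal sketch of the tls {} block in SECRET_MANAGER mode follows. The broker address, store paths, secret ARN, and resource label are hypothetical, and pairing connection_protocol = "TLS" with security_protocol = "SSL" is an assumption based on the value lists above.

```hcl
resource "onehouse_source" "kafka_tls" {
  name            = "events-kafka-tls" # hypothetical name
  source_type     = "APACHE_KAFKA"
  credential_type = "SECRET_MANAGER"
  kafka {
    bootstrap_servers     = "kafka.internal.example.com:9093"
    connection_protocol   = "TLS"
    security_protocol     = "SSL" # assumed pairing for TLS
    payload_serialization = "json"

    tls {
      # Paths on the data-plane host (hypothetical).
      trust_store_path = "/etc/kafka/certs/truststore.jks"
      key_store_path   = "/etc/kafka/certs/keystore.jks"

      # One reference holds both passwords. Do not also set
      # key_store_password or key_password.
      key_store_password_key_password_secret_reference = "arn:aws:secretsmanager:us-west-2:123456789012:secret:example-kafka-tls"
    }
  }
}
```

In ONEHOUSE mode the reference line would be replaced by literal key_store_password and key_password values.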

sasl {} block

Set when connection_protocol = "SASL". Use ONEHOUSE mode (key + secret) or SECRET_MANAGER mode (key_secret_reference). The keystore_* / trust_store_* / key_password fields are optional Confluent extras and belong to ONEHOUSE mode. All credential fields are write-only.

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| mechanism | string |  | SASL mechanism. One of PLAIN, SCRAM_SHA_256, SCRAM_SHA_512 (case-insensitive). |
| key | string | ONEHOUSE mode | SASL key. Write-only. Mutex with key_secret_reference. |
| secret | string | ONEHOUSE mode | SASL secret. Write-only. Mutex with key_secret_reference. |
| key_secret_reference | string | SECRET_MANAGER mode | Cloud-secret reference holding both SASL key and secret. Write-only. Mutex with key + secret. |
| keystore_path | string |  | Optional Confluent extra: SASL keystore path. Write-only. |
| keystore_password | string |  | Optional Confluent extra: SASL keystore password. Write-only. |
| keystore_type | string |  | Optional Confluent extra: SASL keystore type, e.g. jks or pkcs12. Write-only. |
| trust_store_path | string |  | Optional Confluent extra: SASL trust store path. Write-only. |
| trust_store_password | string |  | Optional Confluent extra: SASL trust store password. Write-only. |
| trust_store_type | string |  | Optional Confluent extra: SASL trust store type, e.g. jks or pkcs12. Write-only. |
| key_password | string |  | Optional Confluent extra: SASL key password. Write-only. |

kinesis {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| region | string |  | AWS region of the Kinesis stream (e.g., us-west-2). → details |
| payload_serialization | string |  | Serialization format. Currently only json is supported. → details |
| schema_registry {} | block | optional | Schema registry config. → fields below |

onehouse_table {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| lake | string |  | Source lake name. → details |
| database | string |  | Source database name. → details |
| name | string |  | Source table name. → details |
| schema_registry {} | block | optional | Schema registry config. → fields below |

rdbms {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| log_message_bus | string |  | lakelog, msk, or (Postgres only) google_managed_kafka. → Postgres · MySQL |
| db_config {} | block |  | host, port, database_name, user/password (or user_password_reference for secret-manager mode). → Postgres · MySQL |
| lake_log_config {} | block | when log_message_bus = "lakelog" | intermediate_storage_path for the lakelog buffer. → Postgres · MySQL |
| schema_registry {} | block | optional | Schema registry config. → fields below |

db_config {} block

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| host | string |  | Database server hostname (no port). |
| port | string |  | Database server port. |
| database_name | string |  | Database name to capture from. |
| user | string | ONEHOUSE mode | Database user. Write-only. Mutex with user_password_reference. |
| password | string | ONEHOUSE mode | Database password. Write-only. Mutex with user_password_reference. |
| user_password_reference | string | SECRET_MANAGER mode | Cloud-secret reference holding both DB user and password. Write-only. |
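The SECRET_MANAGER variant of db_config {} can be sketched as below, reusing the Postgres example from earlier on this page. The resource label, source name, and secret ARN are hypothetical, and the secret is assumed to already contain both the database user and password in whatever layout Onehouse expects.

```hcl
resource "onehouse_source" "pg_prod" {
  name            = "orders-postgres-prod" # hypothetical name
  source_type     = "POSTGRES"
  credential_type = "SECRET_MANAGER"
  rdbms {
    log_message_bus = "lakelog"
    db_config {
      host          = "pg.internal.example.com"
      port          = "5432" # string, per the table above
      database_name = "orders"

      # Replaces the literal user/password pair. Hypothetical ARN.
      user_password_reference = "arn:aws:secretsmanager:us-west-2:123456789012:secret:example-orders-pg"
    }
    # Required because log_message_bus = "lakelog".
    lake_log_config {
      intermediate_storage_path = "s3://my-bucket/lakelog/orders-pg/"
    }
  }
}
```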

lake_log_config {} block

Required when log_message_bus = "lakelog"; not allowed otherwise.

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| intermediate_storage_path | string | ✅ (when lakelog) | Object-storage path used to stage the CDC log buffer. |

schema_registry {} block

Shared by the kafka {}, kinesis {}, onehouse_table {}, and rdbms {} blocks. type selects exactly one child sub-block.

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| type | string |  | One of glue, confluent, google, jar, file. → details |
| glue {} / confluent {} / google {} / jar {} / file {} | block |  | Type-specific config; exactly one matching type. → fields below |

glue {} (when type = "glue")

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| name | string |  | Glue schema registry name. |
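Since schema_registry {} is shared across several parent blocks, a Glue registry could be attached to a Kinesis source roughly as follows. The resource label, source name, and registry name are hypothetical, and this sketch assumes no further Glue fields are needed beyond name.

```hcl
resource "onehouse_source" "kinesis_glue" {
  name        = "events-kinesis-glue" # hypothetical name
  source_type = "KINESIS"
  kinesis {
    region                = "us-west-2"
    payload_serialization = "json"

    schema_registry {
      # type selects the single child block that must be present.
      type = "glue"
      glue {
        name = "example-registry" # hypothetical Glue registry name
      }
    }
  }
}
```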

confluent {} (when type = "confluent")

Accepts either key/secret literals (credential_type = "ONEHOUSE") or key_secret_reference (credential_type = "SECRET_MANAGER").

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| servers | string |  | Confluent Schema Registry server URL(s). |
| subject_name | string |  | Optional. The specific Confluent SR subject to fetch the schema from. If omitted, the Kafka topic name is used as the subject (Confluent's <topic>-value convention). Use this to pin a single schema. |
| subject_prefix | string |  | Optional. A subject-name prefix used to discover/list all matching SR subjects rather than pinning one. Useful for multi-topic discovery. |
| key | string | ONEHOUSE mode | Confluent SR key. Write-only. |
| secret | string | ONEHOUSE mode | Confluent SR secret. Write-only. |
| key_secret_reference | string | SECRET_MANAGER mode | Cloud-secret reference holding both the Confluent SR key and secret. Write-only. |

subject_name and subject_prefix are both optional and serve as alternative ways to locate schemas — provide subject_name to pin one subject, or subject_prefix for prefix-based discovery. Self-hosted registries without auth may omit key/secret/key_secret_reference entirely.
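A sketch of prefix-based discovery against a self-hosted registry without auth might look like this. The hostnames, prefix value, and resource label are hypothetical, and setting credential_type when no credentials are supplied is an assumption to verify.

```hcl
resource "onehouse_source" "kafka_discovery" {
  name            = "events-kafka-discovery" # hypothetical name
  source_type     = "APACHE_KAFKA"
  credential_type = "ONEHOUSE" # assumed; no literal credentials are set below
  kafka {
    bootstrap_servers     = "kafka.internal.example.com:9092"
    connection_protocol   = "PLAINTEXT"
    security_protocol     = "PLAINTEXT"
    payload_serialization = "avro"

    schema_registry {
      type = "confluent"
      confluent {
        servers = "http://schema-registry.internal.example.com:8081"

        # Match every subject that starts with this prefix instead of
        # pinning one subject with subject_name.
        subject_prefix = "events-"

        # key, secret, and key_secret_reference are omitted because this
        # registry is assumed to be unauthenticated.
      }
    }
  }
}
```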

google {} (when type = "google")

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| url | string |  | Google Managed Schema Registry URL. Write-only. |

jar {} (when type = "jar")

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| location | string |  | Jar location for proto schemas. |

file {} (when type = "file")

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| base_path | string |  | File-based schema registry base path. |
| full_path | string |  | File-based schema registry full path. |

Attribute Reference

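| Attribute | Type | Description |
| --- | --- | --- |
| id | string | Onehouse-assigned source UUID. |
| created_at | string | Creation time in RFC3339. |
| created_by | string | Identity that created the source. |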
created_bystringIdentity that created the source.

Import

terraform import onehouse_source.kafka events-kafka

After import, sensitive fields (passwords, tokens, secrets) inside the type-specific block cannot be recovered from SHOW SOURCES. Re-supply them in your .tf to avoid a forced replacement on the first terraform plan.
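On Terraform 1.5 or later, the same import can be declared in configuration with an import block instead of the CLI command. This sketch assumes the target is declared as resource "onehouse_source" "kafka" and that the import ID is the source name, as in the command above.

```hcl
# Declarative import (Terraform 1.5+). Assumes a matching
# resource "onehouse_source" "kafka" block exists in the configuration,
# with its write-only credential fields re-supplied.
import {
  to = onehouse_source.kafka
  id = "events-kafka" # the source name, as used by terraform import
}
```

Running terraform plan then shows the pending import alongside any differences, which makes it easier to spot a forced replacement before applying.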

Data Source

data "onehouse_source" "lookup" {
  name = "events-kafka"
}

output "source_type" {
  value = data.onehouse_source.lookup.source_type
}

Limitations

  • No update. Any argument change forces destroy + recreate.
  • Write-only credentials. Sensitive fields (passwords, tokens, SASL secrets) are write-only — they are sent to the server on CREATE but not returned by DESCRIBE. After import, re-supply them in your .tf file.
  • Multi-table on ONEHOUSE_TABLE. Not yet supported (tracked in ENG-41456).
  • Oracle CDC. source_type = "ORACLE" is defined in the proto but not yet supported by the provider.
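Because any argument change forces destroy + recreate, one optional guard is Terraform's standard lifecycle meta-argument. The sketch below applies it to the S3 example from this page. It is a general Terraform pattern rather than anything specific to this provider.

```hcl
resource "onehouse_source" "events" {
  name        = "raw-events-s3"
  source_type = "S3"
  s3 {
    object_storage_bucket_name = "my-raw-events-bucket"
  }

  lifecycle {
    # Terraform rejects any plan that would destroy this resource,
    # including a replacement triggered by an edited argument.
    prevent_destroy = true
  }
}
```

With this set, an intentional replacement requires removing the guard first, so an accidental edit fails at plan time rather than dropping the source.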