Source spec reference

Coral is extensible through source specs. These are YAML files that describe how to connect to an API or read a local dataset.

For a guided walkthrough, see Write a custom source. Use this page for field-level details, including HTTP authentication, source inputs, OAuth credential methods, response mapping, pagination, search functions, and file-backed tables.

Top-level fields

Every source spec starts with:

name: my_source
version: 0.1.0
dsl_version: 3
backend: http
test_queries:
  - SELECT * FROM my_source.messages LIMIT 1

Field	Description
`name`	Source name, also used as the SQL schema name
`version`	Source spec version string
`dsl_version`	Coral source spec DSL version (currently `3`)
`backend`	How data is fetched: `http`, `file`, or `mcp`
`test_queries`	Optional read-only SQL checks Coral runs during source validation

Test queries

Use the optional top-level test_queries field when you want Coral to run one or more read-only SQL checks after the source validates and its tables become queryable.

test_queries:
  - SELECT * FROM my_source.messages LIMIT 1

Notes:

coral source lint validates the field but does not execute the queries.
Each query result records the SQL text, pass/fail status, and either a row count or an error message.
coral source add, coral source add --file, and coral onboard surface post-install validation issues as warnings; coral source test is the strict pass/fail command.
Prefer cheap read-only checks that validate connectivity and mapping logic. For example, SELECT * FROM schema.table LIMIT 1 is usually a better fit than SELECT COUNT(*), which can force Coral to read every row (and every page) from the provider.
For credentialed sources, validate both a correctly configured source and an intentionally misconfigured one. The configured source should pass; the misconfigured source should still install but make coral source test exit non-zero.
Keep these queries read-only. Statements such as CREATE, COPY, INSERT, UPDATE, DELETE, or SET are reported as failed validation queries.
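As a rough illustration of the read-only rule and the per-query result record described in these notes, here is a minimal Python sketch. The QueryCheckResult shape and the first-keyword check are hypothetical simplifications, not Coral's validator, which may reject statements this check misses.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the statement kinds named in the notes above. Coral's real
# validator may reject statements that this first-keyword check misses.
WRITE_KEYWORDS = {"CREATE", "COPY", "INSERT", "UPDATE", "DELETE", "SET"}


@dataclass
class QueryCheckResult:
    sql: str                          # the SQL text as written in test_queries
    passed: bool
    row_count: Optional[int] = None   # set when the query succeeds
    error: Optional[str] = None       # set when the query fails


def precheck(sql: str) -> Optional[QueryCheckResult]:
    """Return a failed result for a write statement, or None if it may run."""
    words = sql.strip().split(None, 1)
    first = words[0].upper() if words else ""
    if first in WRITE_KEYWORDS:
        return QueryCheckResult(sql, passed=False, error=f"{first} is not read-only")
    return None


print(precheck("SELECT * FROM my_source.messages LIMIT 1"))  # None
print(precheck("DELETE FROM my_source.messages"))
```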

Column types

The type field on each column accepts:

Type	Description
`Utf8`	UTF-8 string
`Int64`	64-bit signed integer
`Float64`	64-bit floating point
`Boolean`	Boolean true/false
`Timestamp`	UTC timestamp (microsecond precision)
`Json`	UTF-8 string containing JSON. Marks the column as queryable with JSON functions; see “Querying JSON columns” below.

Each column also supports:

nullable (boolean, default true): whether the column can contain NULL values
description (string): human-readable description, surfaced through coral.columns

Nested field naming

Coral uses a double underscore (__) naming convention for flattened nested or related fields. Treat the full name as the SQL column name:

assignee__name means the name field inside the assignee object or relation
repository__owner__login means repository.owner.login

This is a naming convention, not special SQL syntax. Query the column exactly as it appears in coral.columns:

SELECT number, title, assignee__name
FROM github.issues
WHERE owner = 'withcoral' AND repo = 'coral'
ORDER BY number DESC
LIMIT 10

Do not rewrite these names with dots such as assignee.name or repository.owner.login. When you are unsure which flattened names a source exposes, inspect them first:

SELECT column_name, description
FROM coral.columns
WHERE schema_name = 'github' AND table_name = 'issues'
ORDER BY ordinal_position

When authoring source specs, prefer the parent__child style for flattened nested fields so related values are recognizable in SQL and metadata tables.
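The mapping from nested paths to flattened names can be sketched in Python. flatten is a hypothetical helper that only illustrates the naming convention; in a real source spec each flattened column is declared explicitly, typically with a path expression.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into parent__child column names (illustration only)."""
    columns = {}
    for key, value in record.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name))  # recurse into nested objects
        else:
            columns[name] = value
    return columns


issue = {"number": 42, "assignee": {"name": "Ada"},
         "repository": {"owner": {"login": "withcoral"}}}
print(flatten(issue))
# {'number': 42, 'assignee__name': 'Ada', 'repository__owner__login': 'withcoral'}
```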

Querying JSON columns

Json-typed columns (and any Utf8 column whose values happen to be JSON) can be queried with the built-in JSON functions:

SELECT id, json_get_str(properties, '$browser') AS browser
FROM posthog.events
WHERE json_get_int(properties, 'count') > 5

Available functions: json_get, json_get_str, json_get_int, json_get_float, json_get_bool, json_get_array, json_get_json, json_contains, json_length, json_as_text. For detailed function behavior, see the datafusion-functions-json reference. These functions take plain object keys or array indexes; they do not accept JSONPath syntax such as $.browser. For example, json_get_str(properties, 'browser') looks up the browser key. Some sources, including PostHog, use literal keys that start with $, so json_get_str(properties, '$browser') looks up the exact key name $browser.
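To make the key-versus-JSONPath distinction concrete, here is a small Python model of a string lookup. get_str is a hypothetical stand-in, not the datafusion-functions-json implementation; it assumes each path step is either an object key (string) or a non-negative array index (integer).

```python
import json
from typing import Optional, Union


def get_str(text: str, *path: Union[str, int]) -> Optional[str]:
    """Hypothetical model: follow plain keys or array indexes, return a string or None."""
    try:
        value = json.loads(text)
    except (TypeError, ValueError):
        return None  # not valid JSON
    for step in path:
        if isinstance(step, str) and isinstance(value, dict) and step in value:
            value = value[step]  # the key is matched literally, "$" included
        elif (isinstance(step, int) and isinstance(value, list)
              and 0 <= step < len(value)):
            value = value[step]
        else:
            return None  # missing key or index; "$.browser" is just a missing key
    return value if isinstance(value, str) else None


props = '{"$browser": "Firefox", "browser": "Chrome", "tags": ["a", "b"]}'
print(get_str(props, "$browser"))   # Firefox
print(get_str(props, "$.browser"))  # None
print(get_str(props, "tags", 1))    # b
```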

Column expressions

The expr field on a column defines how to extract or compute its value from each row. When omitted, Coral uses a path expression matching the column name.

Kind	Description
`path`	Extract a value by JSON path segments
`join_array_path`	Join one nested scalar field from each object in an array
`format_timestamp`	Convert an epoch timestamp to a `Timestamp` column
`base64_decode`	Decode a base64 string expression into UTF-8 text
`replace`	Replace all occurrences of one substring in the rendered value
`template`	Build a string from a template with placeholder substitutions

`join_array_path`

Extracts one nested value from each object in an array and joins the values into one string. Use this when an API returns object arrays such as labels, tags, or owners and the table should expose a compact human-readable column.

- name: label_names
  type: Utf8
  expr:
    kind: join_array_path
    path:
      - labels
      - nodes
    item_path:
      - name
    separator: ","

Field	Type	Default	Description
`path`	array	—	JSON path to the array
`item_path`	array	—	JSON path to extract from each array item
`separator`	string	`,`	Separator used between non-null extracted items
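The extraction and join behavior can be modeled in a few lines of Python. This is a sketch under stated assumptions: join_array_path and get_path are hypothetical helpers, and returning NULL (None) when the array itself is missing is a guess rather than documented behavior.

```python
from typing import Any, Optional, Sequence


def get_path(value: Any, path: Sequence[str]) -> Any:
    """Walk object keys; return None if any segment is missing."""
    for segment in path:
        if not isinstance(value, dict) or segment not in value:
            return None
        value = value[segment]
    return value


def join_array_path(row: dict, path: Sequence[str], item_path: Sequence[str],
                    separator: str = ",") -> Optional[str]:
    items = get_path(row, path)
    if not isinstance(items, list):
        return None  # assumption: no array at `path` yields NULL
    parts = [get_path(item, item_path) for item in items]
    # Only non-null extracted items are joined.
    return separator.join(str(p) for p in parts if p is not None)


row = {"labels": {"nodes": [{"name": "bug"}, {"name": None}, {"name": "p1"}]}}
print(join_array_path(row, ["labels", "nodes"], ["name"]))  # bug,p1
```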

`format_timestamp`

Converts an epoch value (a number or numeric string) or an ISO 8601 string into a Timestamp column. Coral stores Timestamp columns as epoch microseconds internally, so the rendered output has microsecond precision even when the source value is provided in seconds or milliseconds. The input field selects how the raw value is read: seconds or milliseconds for epoch values, or iso8601 for ISO 8601 / RFC 3339 strings. APIs that return only microsecond or nanosecond epochs are not yet supported by this helper.

- name: created_at
  type: Timestamp
  expr:
    kind: format_timestamp
    input: seconds        # or "milliseconds" / "iso8601"
    expr:
      kind: path
      path:
        - ts

Field	Type	Default	Description
`input`	string	`seconds`	Unit/format of the raw value: `seconds`, `milliseconds`, or `iso8601`
`expr`	expr	—	Inner expression that produces the timestamp value
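The unit handling can be sketched as a conversion to epoch microseconds, the internal representation described above. to_epoch_micros is hypothetical; truncating sub-microsecond digits and treating ISO strings without an offset as UTC are assumptions of this sketch. Decimal keeps the epoch arithmetic exact.

```python
from datetime import datetime, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(raw, input_format: str = "seconds") -> int:
    """Hypothetical model of format_timestamp's unit handling."""
    if input_format == "seconds":
        return int(Decimal(str(raw)) * 1_000_000)  # int() truncates extra digits
    if input_format == "milliseconds":
        return int(Decimal(str(raw)) * 1_000)
    if input_format == "iso8601":
        text = str(raw)
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"  # older Pythons reject a bare "Z"
        dt = datetime.fromisoformat(text)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
        delta = dt - EPOCH
        return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    raise ValueError(f"unsupported input: {input_format}")


print(to_epoch_micros("1712345678.123456"))  # 1712345678123456
```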

`base64_decode`

Decodes an inner expression as base64 and returns UTF-8 text. Whitespace in the encoded value is ignored, which is useful for APIs that wrap base64 file contents across lines. Invalid base64 or non-UTF-8 bytes produce NULL.

- name: content_text
  type: Utf8
  expr:
    kind: base64_decode
    expr:
      kind: path
      path:
        - content

Field	Type	Default	Description
`expr`	expr	—	Inner expression that produces base64
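A Python sketch of the decode rules above (whitespace ignored, invalid input becomes NULL). base64_decode here is a hypothetical helper mirroring the documented behavior, not Coral's code.

```python
import base64
import binascii
from typing import Optional


def base64_decode(value: Optional[str]) -> Optional[str]:
    """Hypothetical model: decode base64 to UTF-8 text, or None on bad input."""
    if value is None:
        return None
    compact = "".join(value.split())  # ignore spaces and line breaks
    try:
        return base64.b64decode(compact, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None  # invalid base64 or non-UTF-8 bytes


print(base64_decode("aGVsbG8g\nd29ybGQ="))  # hello world
print(base64_decode("/w=="))                # None: byte 0xFF is not UTF-8
```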

`replace`

Evaluates an inner expression as a string and replaces all occurrences of from with to.

- name: ts_id
  type: Utf8
  expr:
    kind: replace
    expr:
      kind: path
      path:
        - ts
    from: "."
    to: ""

Field	Type	Default	Description
`expr`	expr	—	Inner expression that produces a string
`from`	string	—	Non-empty substring to replace
`to`	string	—	Replacement text
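Applied to a timestamp string such as 1712345678.123456, the configuration above removes the dot. A minimal Python sketch, where replace_expr is hypothetical and passing NULL through unchanged is an assumption:

```python
from typing import Any, Optional


def replace_expr(value: Any, from_: str, to: str) -> Optional[str]:
    """Hypothetical model of kind: replace on one rendered value."""
    if not from_:
        raise ValueError("`from` must be a non-empty substring")
    if value is None:
        return None  # assumption: NULL stays NULL
    return str(value).replace(from_, to)  # replaces every occurrence


print(replace_expr("1712345678.123456", ".", ""))  # 1712345678123456
```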

`template`

Builds a string by substituting parsed template tokens with either:

{{expr.name}}: a named sub-expression from values
{{filter.name}}: a query filter value

Like other Coral templates, |default supplies a fallback when the value is missing at runtime. Column-expression templates do not support secret, variable, or state namespaces.

- name: permalink
  type: Utf8
  expr:
    kind: template
    template: "https://example.com/archives/{{filter.channel}}/p{{expr.ts_id}}"
    values:
      ts_id:
        kind: replace
        expr:
          kind: path
          path:
            - ts
        from: "."
        to: ""

Field	Type	Default	Description
`template`	string	—	Parsed Coral template string using `{{expr.name}}` and `{{filter.name}}` tokens
`values`	object	—	Named sub-expressions referenced from `{{expr.name}}` tokens
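Token substitution for the two supported namespaces can be sketched with a regular expression. render is a hypothetical helper; it does not implement |default, and raising on a missing value is a simplification chosen for the sketch.

```python
import re
from typing import Mapping

# Matches {{namespace.name}}; |default handling is deliberately left out.
TOKEN = re.compile(r"\{\{\s*([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*\}\}")


def render(template: str, exprs: Mapping[str, str], filters: Mapping[str, str]) -> str:
    """Hypothetical model of kind: template substitution."""
    def substitute(match):
        namespace, name = match.group(1), match.group(2)
        if namespace == "expr":
            values = exprs
        elif namespace == "filter":
            values = filters
        else:  # secret, variable, and state are not allowed in column templates
            raise ValueError(f"unsupported namespace: {namespace}")
        if values.get(name) is None:
            raise LookupError(f"no value for {namespace}.{name}")
        return str(values[name])

    return TOKEN.sub(substitute, template)


print(render("https://example.com/archives/{{filter.channel}}/p{{expr.ts_id}}",
             {"ts_id": "1712345678123456"}, {"channel": "C123"}))
# https://example.com/archives/C123/p1712345678123456
```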

Relation names

Tables and source-scoped table functions share one SQL relation namespace within a source. Name comparisons are case-insensitive, so Issues, issues, and a table function named issues conflict. Table names must be non-empty. Prefer plain snake_case table names, but names that require SQL quoting, such as player.stats, remain valid for compatibility and can be queried as quoted identifiers. Table-function names must be ASCII identifiers: start with a letter or underscore, then use only letters, numbers, or underscores.
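The naming rules above can be checked mechanically. check_relations is a hypothetical helper; comparing names with str.lower() is an assumption about how Coral performs case-insensitive comparison.

```python
import re
from typing import List

# ASCII identifier: a letter or underscore, then letters, digits, or underscores.
FUNCTION_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_relations(tables: List[str], functions: List[str]) -> List[str]:
    """Hypothetical checker for one source's shared relation namespace."""
    errors = []
    seen = {}  # lower-cased name -> first relation that used it
    for kind, names in (("table", tables), ("function", functions)):
        for name in names:
            if kind == "table" and not name:
                errors.append("table names must be non-empty")
                continue
            if kind == "function" and not FUNCTION_NAME.fullmatch(name):
                errors.append(f"function {name!r} is not an ASCII identifier")
            key = name.lower()  # assumption: str.lower() models case-insensitivity
            if key in seen:
                errors.append(f"{kind} {name!r} conflicts with {seen[key]}")
            else:
                seen[key] = f"{kind} {name!r}"
    return errors


print(check_relations(["Issues"], ["issues"]))
# ["function 'issues' conflicts with table 'Issues'"]
```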

File-backed tables

For backend: file, each table declares its own format and points at a location:

name: local_messages
version: 0.1.0
dsl_version: 3
backend: file
tables:
  - name: messages
    description: Demo messages
    format: jsonl
    source:
      location: file:///absolute/path/to/demo-data/
      glob: "**/*.jsonl"
    columns:
      - name: type
        type: Utf8
      - name: session_id
        type: Utf8
      - name: text
        type: Utf8

Notes:

Supported formats are parquet, jsonl, json, and csv
Supported transports are file:// and s3:// for every file format
If you omit glob, Coral uses a format-specific default

JSONL and JSON

jsonl reads newline-delimited JSON objects. json reads JSON arrays of objects. Tables using either format must declare columns. Declare nested object or array fields as type: Json; Coral exposes those values as JSON text so they can be queried with json_get_* functions such as json_get_str(payload, 'id').

CSV

CSV tables must declare columns. They support format_options.has_header and format_options.delimiter.

Parquet

Parquet tables can infer columns when columns: [] is used.

Shared file table constraints

File-backed tables do not support declared filters, virtual columns, or columns[*].expr; use SQL projections and predicates instead. For s3:// locations, declare the S3 object-store policy under the table source instead of relying on magic input names:

inputs:
  AWS_REGION:
    kind: variable
    default: us-east-1
  AWS_ACCESS_KEY_ID:
    kind: secret
  AWS_SECRET_ACCESS_KEY:
    kind: secret
tables:
  - name: events
    description: S3 events
    format: jsonl
    source:
      location: s3://example-bucket/events/
      object_store:
        type: s3
        region: "{{input.AWS_REGION}}"
        auth:
          type: access_key
          access_key_id: "{{input.AWS_ACCESS_KEY_ID}}"
          secret_access_key: "{{input.AWS_SECRET_ACCESS_KEY}}"

Use auth: { type: instance_profile } when the runtime should use the host’s AWS instance profile.

File path partitions

File-backed tables can expose partition columns derived from object paths. Partition columns support Utf8, Int64, Boolean, Float64, and Json; Timestamp partitions are not supported. When path is omitted, partitions use Hive-style key=value path segments:

source:
  location: file:///absolute/path/to/events/
  glob: "**/*.jsonl"
  partitions:
    - name: year
      type: Int64
    - name: month
      type: Int64

This expects paths such as:

year=2026/month=05/events.jsonl
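Hive-style segment parsing can be sketched as follows. hive_partitions is hypothetical, partition types are passed as Python callables, and failing when a declared partition is missing follows the JSON/JSONL contract described later in this section.

```python
from typing import Callable, Dict


def hive_partitions(relative_path: str, declared: Dict[str, Callable]) -> dict:
    """Hypothetical parser for key=value directory segments.

    Use int, float, or str as types here; bool("false") is True in Python,
    so Boolean partitions would need a real parser.
    """
    found = {}
    for segment in relative_path.split("/")[:-1]:  # the last segment is the file
        key, sep, raw = segment.partition("=")
        if sep and key in declared:
            found[key] = declared[key](raw)  # e.g. int("05") -> 5
    missing = [name for name in declared if name not in found]
    if missing:
        raise ValueError(f"{relative_path}: missing partitions {missing}")
    return found


print(hive_partitions("year=2026/month=05/events.jsonl", {"year": int, "month": int}))
# {'year': 2026, 'month': 5}
```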

For JSON and JSONL tables, positional path segments are also supported:

source:
  location: file:///Users/james/.codex/sessions/
  glob: "20??/**/*.jsonl"
  partitions:
    - name: year
      type: Int64
      path:
        kind: segment
        index: 0
    - name: month
      type: Int64
      path:
        kind: segment
        index: 1
    - name: day
      type: Int64
      path:
        kind: segment
        index: 2

This maps paths such as 2026/05/14/session.jsonl to partition columns year = 2026, month = 5, and day = 14. SQL predicates on partition columns can prune unrelated files before JSON rows are decoded.

For JSON and JSONL tables, declaring partitions makes the path layout part of the table contract. Files that do not provide every declared partition value in the configured layout fail the query; matching paths with invalid typed partition values also fail the query.

Parquet and CSV tables use DataFusion’s native listing table reader and support Hive-style partitioning. Positional segment partitioning is currently limited to JSON and JSONL tables because those formats already use Coral’s custom JSON row mapper.
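Positional mapping and file pruning can be sketched in Python. segment_partitions and prune are hypothetical helpers; the sketch counts segments from source.location and excludes the file name, which is an assumption consistent with the 2026/05/14/session.jsonl example.

```python
from typing import Callable, Dict, List, Tuple

Spec = Tuple[str, Callable, int]  # (column name, type, segment index)


def segment_partitions(relative_path: str, specs: List[Spec]) -> Dict[str, object]:
    """Hypothetical: read typed values from positional directory segments."""
    segments = relative_path.split("/")[:-1]  # assumption: file name excluded
    values = {}
    for name, cast, index in specs:
        if index >= len(segments):
            raise ValueError(f"{relative_path}: no segment {index} for {name}")
        values[name] = cast(segments[index])  # bad typed values raise ValueError
    return values


def prune(paths: List[str], specs: List[Spec], keep: Callable[[dict], bool]) -> List[str]:
    """Keep only files whose partition values satisfy the predicate."""
    return [p for p in paths if keep(segment_partitions(p, specs))]


specs = [("year", int, 0), ("month", int, 1), ("day", int, 2)]
print(segment_partitions("2026/05/14/session.jsonl", specs))
# {'year': 2026, 'month': 5, 'day': 14}
print(prune(["2026/05/14/a.jsonl", "2026/04/30/b.jsonl"], specs,
            lambda v: v["month"] == 5))
# ['2026/05/14/a.jsonl']
```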

File metadata

File-backed tables can expose metadata columns derived from each scanned file. JSONL tables can also expose row-scoped line numbers. Metadata columns are declared under source.metadata and are appended to the table schema:

source:
  location: file:///Users/james/.codex/sessions/
  glob: "20??/**/*.jsonl"
  metadata:
    - name: session_path
      kind: relative_path
    - name: session_file
      kind: file_stem
    - name: event_index
      kind: line_number

Supported kind values are:

relative_path: the file path relative to source.location; for a single-file source.location, this is the scanned file name (Utf8)
file_name: the final file name, including its extension (Utf8)
file_stem: the final file name without its last extension (Utf8)
line_number: the one-based line number within the JSONL file (Int64); only supported for format: jsonl

Metadata column names must not duplicate declared payload columns or partition columns. For inferred Parquet tables, metadata column names must also not duplicate physical file columns. If an undeclared raw JSON property has the same name as a metadata column, queries see the declared metadata value for that column name. File metadata preserves the object-store path text; literal percent sequences in object keys are not URL-decoded again.
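The three path-derived kinds and JSONL line numbering can be approximated in Python. file_metadata and jsonl_rows are hypothetical helpers; skipping blank lines while still counting them is an assumption. The sketch does no URL decoding, matching the note on percent sequences.

```python
import json
from pathlib import PurePosixPath
from typing import Dict, Iterator, Tuple


def file_metadata(location: str, object_path: str) -> Dict[str, str]:
    """Hypothetical: derive relative_path, file_name, and file_stem for one file."""
    if object_path == location:  # single-file source.location
        relative = PurePosixPath(object_path).name
    else:
        prefix = location if location.endswith("/") else location + "/"
        if not object_path.startswith(prefix):
            raise ValueError(f"{object_path} is outside {location}")
        relative = object_path[len(prefix):]  # no URL decoding of % sequences
    name = PurePosixPath(relative).name
    return {"relative_path": relative, "file_name": name,
            "file_stem": PurePosixPath(name).stem}  # drops only the last extension


def jsonl_rows(text: str) -> Iterator[Tuple[int, dict]]:
    """Yield (one-based line number, record); blank lines are skipped but counted."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            yield line_number, json.loads(line)


print(file_metadata("file:///data/sessions/", "file:///data/sessions/2026/05/14/rollout.jsonl"))
```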

HTTP-backed tables

For http, the source spec declares request and response behavior.

name: demo_api
version: 0.1.0
dsl_version: 3
backend: http
inputs:
  API_BASE:
    kind: variable
    default: https://api.example.com
  API_TOKEN:
    kind: secret
base_url: "{{input.API_BASE}}"
auth:
  type: HeaderAuth
  headers:
    - name: Authorization
      from: bearer
      key: API_TOKEN
tables:
  - name: messages
    description: Messages from the demo API
    request:
      method: GET
      path: /messages
    response: {}
    columns:
      - name: id
        type: Utf8
      - name: text
        type: Utf8

In this minimal example:

response: {} means “use the default response rules” for an HTTP table
with the default response rules, Coral treats the selected response value directly as rows unless you configure a different row_strategy
inputs declares values Coral collects and stores when the source is added
auth declares how stored values are used to authenticate outgoing HTTP requests
the bearer header value is read from the secret store because API_TOKEN is declared as kind: secret; from: bearer then sends it as Authorization: Bearer <token>

Table filters

HTTP tables can declare filters that Coral binds from SQL WHERE predicates into provider requests. MCP tables use the same SQL-facing filter fields and add a required tool_arg field to bind the filter to an MCP tool argument.

Field	Type	Default	Description
`name`	string	required	SQL filter name
`type`	type	`Utf8`	Filter value type
`required`	boolean	`false`	Whether queries must provide this filter
`mode`	string	`equality`	Predicate mode: `equality`, `contains`, or compatibility-only `search`
`description`	string	`""`	Human-readable filter description surfaced through `coral.filters`
`tool_arg`	string	MCP only	MCP table tool argument that receives the filter value

Use mode: equality for exact lookup and scoping filters. Use mode: contains only when the provider supports normal substring matching for a table filter. mode: search remains accepted for compatibility with older table manifests, but new provider-ranked search endpoints should be modeled as kind: search table functions with search_limits.

HTTP table functions

HTTP sources can declare source-scoped table functions for endpoints that return rows and need invocation arguments. A function adds invocation arguments, but it still owns the same execution shape as an HTTP table: request, response mapping, pagination, and result columns. It does not need a backing table. Use the default kind: table for parameterized non-retrieval operations, such as scoped child collections, time-range log queries, metrics queries, or detail operations that do not map cleanly to a stable table. If the endpoint is a provider search surface, such as GitHub issue search, use kind: search and declare search_limits.

functions:
  - name: search_issues
    kind: search
    description: Search GitHub issues
    search_limits:
      default_top_k: 10
      max_top_k: 100
      max_calls_per_query: 1
    args:
      - name: q
        required: true
        bind:
          arg: q
      - name: mode
        values: [lexical, semantic, hybrid]
        bind:
          arg: search_type
    request:
      method: GET
      path: /search/issues
      query:
        - name: q
          from: arg
          key: q
        - name: search_type
          from: arg
          key: search_type
    response:
      rows_path: [items]
      allow_404_empty: false
      row_strategy: direct
    pagination:
      mode: page
      page_size:
        default: 30
        max: 100
        query_param: per_page
      page_param: page
      page_start: 1
      page_step: 1
      offset_start: 0
      link_header_require_results: false
    columns:
      - name: id
        type: Utf8
      - name: title
        type: Utf8
      - name: html_url
        type: Utf8
      - name: score
        type: Float64

Use bind.arg when a SQL argument should populate a differently named request argument. Function request values can then reference those bound arguments with from: arg. kind: search marks a provider-ranked retrieval surface. Search functions return provider-ranked candidate rows, not exhaustive SQL-filtered tables. Other table functions keep the default kind: table. Use named SQL arguments when calling table functions:

SELECT id, title, html_url
FROM github.search_issues(q => 'repo:withcoral/coral source functions')
LIMIT 10;

Search result columns should help users and agents decide which candidate to inspect next. Prefer stable identifiers and follow-up handles such as id, html_url, title, score, rank, or provider timestamps. If a search row is intentionally thin, expose enough stable identifiers for ordinary detail tables to fetch the full record. mode: contains remains available for ordinary table filters whose provider API supports substring matching. It is not a retrieval marker and should not be used as a replacement for kind: search:

filters:
  - name: title
    type: Utf8
    description: Provider-side title substring.
    mode: contains

Provider-ranked retrieval surfaces must be declared as kind: search functions. Table filters are for exact lookup, required scoping, or ordinary provider-side filtering on list/detail tables. For kind: search functions, search_limits is required. default_top_k is used when SQL does not specify a LIMIT, max_top_k caps candidates requested per provider call, and max_calls_per_query caps provider calls for one SQL query. Values must be positive. default_top_k and max_top_k are capped at 1000, max_calls_per_query is capped at 100, and max_top_k * max_calls_per_query cannot exceed 10000 candidates.
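The search_limits rules can be sketched as a small validator. SearchLimits is hypothetical, and it rejects over-cap values at validation time, which is one reading of the caps above; Coral could instead clamp them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchLimits:
    default_top_k: int
    max_top_k: int
    max_calls_per_query: int

    def validate(self) -> None:
        for name in ("default_top_k", "max_top_k", "max_calls_per_query"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.default_top_k > 1000 or self.max_top_k > 1000:
            raise ValueError("default_top_k and max_top_k are capped at 1000")
        if self.max_calls_per_query > 100:
            raise ValueError("max_calls_per_query is capped at 100")
        if self.max_top_k * self.max_calls_per_query > 10_000:
            raise ValueError("max_top_k * max_calls_per_query exceeds 10000")

    def per_call_top_k(self, sql_limit: Optional[int]) -> int:
        """Candidates to request from one provider call."""
        requested = self.default_top_k if sql_limit is None else sql_limit
        return min(requested, self.max_top_k)


limits = SearchLimits(default_top_k=10, max_top_k=100, max_calls_per_query=1)
limits.validate()
print(limits.per_call_top_k(None), limits.per_call_top_k(500))  # 10 100
```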

MCP sources

MCP sources expose Model Context Protocol tools as SQL tables and source-scoped table functions. The manifest remains the contract for SQL names, filters, arguments, response shape, and columns; Coral does not introspect MCP tools at query time. Stdio MCP servers run as child processes:

backend: mcp
server:
  transport: stdio
  command: github-mcp-server
  args: []
  env:
    - name: GITHUB_TOKEN
      from: input
      key: GITHUB_TOKEN

Remote MCP servers that implement Streamable HTTP can be called directly:

backend: mcp
server:
  transport: streamable_http
  url: https://mcp.example.com/mcp

If the Streamable HTTP server needs bearer authentication, declare the token as a secret input and reference it from server.auth. OAuth credential metadata is authored the same way as HTTP sources: under the top-level secret input’s credential.methods. The server.auth block only decides how the stored secret is attached to MCP HTTP requests.

inputs:
  MCP_ACCESS_TOKEN:
    kind: secret
    credential:
      methods:
        - type: oauth
          label: Connect
          oauth:
            flow:
              type: authorization_code
              pkce: required
            redirect_uri: http://127.0.0.1:0/oauth/callback
            redirect_uri_port_mode: random
            endpoints:
              authorization_url: https://provider.example.com/oauth/authorize
              token_url: https://provider.example.com/oauth/token
            client:
              id:
                default: coral-client-id
            scopes:
              scope:
                delimiter: space
                values: [read]

server:
  transport: streamable_http
  url: https://mcp.example.com/mcp
  auth:
    type: bearer
    from: input
    key: MCP_ACCESS_TOKEN

MCP tables call one tool and map the JSON result into rows:

tables:
  - name: issues
    tool: list_issues
    filters:
      - name: state
        type: Utf8
        required: false
        mode: equality
        tool_arg: state
    response:
      rows_path: [items]
    columns:
      - name: id
        type: Utf8
      - name: title
        type: Utf8

MCP table functions use named SQL arguments:

functions:
  - name: run_query
    tool: run_query
    args:
      - name: query
        required: true
        bind:
          arg: query
    response:
      rows_path: [rows]
    columns:
      - name: row
        type: Json
        expr:
          kind: current_row

SELECT * FROM my_mcp.run_query(query => 'SELECT 1');

MCP tables and functions can also declare cursor pagination:

pagination:
  cursor_arg: cursor
  response_cursor_path: [meta, nextCursor]
  max_pages: 5
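Cursor pagination can be sketched as a loop that feeds the cursor read from response_cursor_path back into cursor_arg. paginate, dig, and the call_tool callable are hypothetical; the error string only borrows the MCP_PAGINATION_FAILED reason name.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def dig(value: Any, path: Sequence[str]) -> Any:
    """Follow object keys; None when any key is missing."""
    for key in path:
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def paginate(call_tool: Callable[[Dict[str, Any]], dict], args: Dict[str, Any],
             cursor_arg: str, response_cursor_path: Sequence[str],
             rows_path: Sequence[str], max_pages: int) -> List[Any]:
    """Hypothetical cursor loop for one MCP table scan."""
    rows: List[Any] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        request = dict(args)
        if cursor is not None:
            request[cursor_arg] = cursor  # feed the previous cursor back in
        result = call_tool(request)
        rows.extend(dig(result, rows_path) or [])
        cursor = dig(result, response_cursor_path)
        if not cursor:
            return rows  # no next cursor: pagination finished
    raise RuntimeError("MCP_PAGINATION_FAILED: exceeded max_pages")
```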

Streamable HTTP token refresh is not handled by the MCP transport itself. If a server rejects an expired token, Coral surfaces an auth failure; refresh support is owned by the shared OAuth credential flow. Each tool call opens a fresh MCP session: stdio sources spawn a child process per tools/call, and Streamable HTTP sources run the initialize handshake on every call. A single SQL query that scans multiple MCP tables incurs that handshake once per table scan. Session pooling is a future optimization.

MCP error reasons

Query-time failures surface as one of the following structured error reasons. Every reason carries source and mcp_stage metadata. Tool-scoped reasons (MCP_TOOL_CALL_FAILED, MCP_TOOL_RETURNED_ERROR, MCP_RESULT_DECODE_FAILED, MCP_PAGINATION_FAILED) additionally carry relation and tool. Transport, auth, and server-start reasons do not, since they fire before a specific tool call is identified.

Reason	Meaning
`MCP_SERVER_START_FAILED`	The stdio MCP server process could not be spawned.
`MCP_INITIALIZE_FAILED`	The MCP `initialize` handshake failed. Retryable.
`MCP_AUTH_REQUIRED`	A Streamable HTTP server returned 401 with `WWW-Authenticate`. Reinstall the source with a valid credential.
`MCP_AUTH_FAILED`	A Streamable HTTP server returned 403 with insufficient scope. Update the manifest scopes and reinstall.
`MCP_HTTP_REQUEST_FAILED`	The HTTP request to a Streamable HTTP MCP server failed before a response was received (connect, DNS, TLS). Retryable.
`MCP_HTTP_STATUS_FAILED`	A Streamable HTTP MCP server returned a non-success status that is not an authentication failure. Retryable.
`MCP_HTTP_SSE_DECODE_FAILED`	A Streamable HTTP MCP server returned an undecodable SSE stream or an unexpected content type.
`MCP_SESSION_EXPIRED`	A Streamable HTTP MCP session was rejected (HTTP 404) and the transport could not transparently reinitialize. Retryable.
`MCP_TOOL_CALL_FAILED`	The transport or protocol layer rejected the `tools/call`. Retryable.
`MCP_TOOL_RETURNED_ERROR`	The tool ran and reported a business-logic failure on the manifest’s declared `response.error_path`.
`MCP_RESULT_DECODE_FAILED`	The tool’s structured content did not match the declared `response.rows_path` or column types.
`MCP_PAGINATION_FAILED`	Cursor pagination exceeded `max_pages` without terminating.
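Reasons marked Retryable in the table can drive a simple client-side retry policy. McpError and run_with_retry are hypothetical; a real caller would likely add backoff between attempts.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Reasons the table above marks as Retryable.
RETRYABLE = frozenset({
    "MCP_INITIALIZE_FAILED",
    "MCP_HTTP_REQUEST_FAILED",
    "MCP_HTTP_STATUS_FAILED",
    "MCP_SESSION_EXPIRED",
    "MCP_TOOL_CALL_FAILED",
})


class McpError(Exception):
    """Hypothetical error type carrying a structured reason."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


def run_with_retry(query: Callable[[], T], attempts: int = 3) -> T:
    """Retry retryable reasons up to `attempts` times; re-raise everything else."""
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except McpError as err:
            if err.reason not in RETRYABLE or attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")
```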

HTTP authentication

The top-level auth block declares how Coral authenticates outgoing requests. It does not collect or store credentials; reference values declared under Source inputs when auth needs API keys, bearer tokens, usernames, passwords, or signing material. Pick one of three type values:

`type`	Purpose
`HeaderAuth`	Attach one or more declarative auth headers (Bearer tokens, API keys, pre-signed headers).
`BasicAuth`	HTTP Basic. Coral base64-encodes `username:password` into `Authorization: Basic ...`.
`CustomAuth`	Use a named authenticator for auth schemes that need request signing, such as AWS SigV4.

`HeaderAuth`

auth:
  type: HeaderAuth
  headers:
    - name: Authorization
      from: bearer
      key: API_TOKEN

headers entries share the same value-source shape as request/table headers, but auth is source-scoped: from: literal, from: template, from: input, from: bearer, and from: one_of are meaningful here. Filter, argument, and state values have no value at auth time. Use from: bearer when Coral stores a raw token but the API expects an Authorization: Bearer <token> header. If the token was collected with an OAuth credential method, runtime auth still uses the same stored secret input. Use from: one_of when a source supports multiple credential shapes. Coral uses the first value that is present and non-empty:

inputs:
  API_KEY:
    kind: secret
    required: false
  OAUTH_ACCESS_TOKEN:
    kind: secret
    required: false

auth:
  type: HeaderAuth
  headers:
    - name: Authorization
      from: one_of
      values:
        - from: input
          key: API_KEY
        - from: bearer
          key: OAUTH_ACCESS_TOKEN
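The resolution order can be sketched in Python. resolve is a hypothetical helper covering only the literal, input, bearer, and one_of sources; treating a bearer source with an empty token as absent is an assumption.

```python
from typing import Any, Dict, Optional


def resolve(source: Dict[str, Any], inputs: Dict[str, str]) -> Optional[str]:
    """Hypothetical resolver for literal, input, bearer, and one_of sources."""
    kind = source["from"]
    if kind == "literal":
        value = source.get("value")
    elif kind == "input":
        value = inputs.get(source["key"])
    elif kind == "bearer":
        token = inputs.get(source["key"])
        value = f"Bearer {token}" if token else None  # assumption: empty token = absent
    elif kind == "one_of":
        for candidate in source["values"]:
            value = resolve(candidate, inputs)
            if value:  # first present, non-empty value wins
                return value
        return None
    else:
        raise ValueError(f"unsupported value source: {kind}")
    return value or None  # normalize "" to None


header = {"from": "one_of", "values": [
    {"from": "input", "key": "API_KEY"},
    {"from": "bearer", "key": "OAUTH_ACCESS_TOKEN"},
]}
print(resolve(header, {"API_KEY": "", "OAUTH_ACCESS_TOKEN": "tok"}))  # Bearer tok
```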

`BasicAuth`

auth:
  type: BasicAuth
  username: "{{input.API_USER}}"
  password: "{{input.API_TOKEN}}"

username and password are templates; {{input.KEY}} tokens are honored. Coral encodes them as Authorization: Basic base64(user:pass).
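The encoding step is the standard HTTP Basic scheme (RFC 7617). In this sketch, basic_auth_header is hypothetical and the UTF-8 charset is an assumption; the expected value comes from the RFC's own Aladdin example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an Authorization header value for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")  # assumption: UTF-8
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```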

`CustomAuth`

auth:
  type: CustomAuth
  authenticator: aws_sigv4
  # ...authenticator-specific fields, see below

The authenticator key names a runtime-provided authenticator. All other fields under auth are passed through as that authenticator’s config. Use this type when signing depends on the final request contents.

`aws_sigv4`

Signs outgoing requests with AWS Signature Version 4. Useful for querying AWS services such as CloudWatch and CloudTrail, or any other SigV4-protected API. Example against CloudWatch Logs:

auth:
  type: CustomAuth
  authenticator: aws_sigv4
  service: logs
  region: "{{input.AWS_REGION}}"
  access_key_id: "{{input.AWS_ACCESS_KEY_ID}}"
  secret_access_key: "{{input.AWS_SECRET_ACCESS_KEY}}"

Field	Required	Notes
`service`	yes	AWS service code (`logs`, `monitoring`, `cloudtrail`, `execute-api`, …)
`region`	yes	AWS region; templated from inputs
`access_key_id`	yes	AWS access key ID
`secret_access_key`	yes	AWS secret access key
`session_token`	no	Omit for long-term credentials; include for STS/assumed-role sessions
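The service and region fields feed the SigV4 signing-key derivation. The sketch below follows the derivation published in AWS's Signature Version 4 documentation; it is not Coral's authenticator, and it omits canonical-request construction and the S3-specific rules covered later in this section. The helper names are hypothetical.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_access_key: str, date_stamp: str,
                      region: str, service: str) -> bytes:
    """Derive the SigV4 signing key; date_stamp is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)      # e.g. the templated AWS_REGION
    k_service = _hmac_sha256(k_region, service)  # e.g. "logs" for CloudWatch Logs
    return _hmac_sha256(k_service, "aws4_request")


def sigv4_signature(signing_key: bytes, string_to_sign: str) -> str:
    """Hex HMAC-SHA256 of the string to sign (built from the canonical request)."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the service code is part of the key derivation, a key derived for logs cannot sign requests for monitoring, which is why service must match the API being called.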

For STS or assumed-role temporary credentials, also declare AWS_SESSION_TOKEN in the source’s inputs block and include it in the authenticator config:

inputs:
  # ...existing AWS_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY inputs
  AWS_SESSION_TOKEN:
    kind: secret
    hint: AWS session token for temporary credentials.

auth:
  type: CustomAuth
  authenticator: aws_sigv4
  service: logs
  region: "{{input.AWS_REGION}}"
  access_key_id: "{{input.AWS_ACCESS_KEY_ID}}"
  secret_access_key: "{{input.AWS_SECRET_ACCESS_KEY}}"
  session_token: "{{input.AWS_SESSION_TOKEN}}"

S3 note: service: s3 and service: s3-outposts switch to S3-specific signing (single percent-encoding, path normalization disabled, X-Amz-Content-Sha256 enabled). This is required to avoid SignatureDoesNotMatch on bucket operations and is handled automatically.

Non-auth request headers

Headers required on every request but not used for auth (Accept, API version, etc.) go in a top-level request_headers: block (sibling of auth:), not inside auth.headers:

request_headers:
  - name: Accept
    from: literal
    value: application/json
  - name: X-Api-Version
    from: literal
    value: "2024-10-01"

request_headers resolves before auth on each request. If an auth header has the same name, it overwrites the value from request_headers. Coral sets User-Agent: coral on every outgoing request automatically, so do not declare it in manifests.

HTTP rate limiting

The optional rate_limit object gives Coral provider-specific hints for classifying rate-limit responses on HTTP sources. Coral always treats 429 Too Many Requests as a rate-limit signal. Use extra_statuses when a provider also reports rate limits through another status code such as GitHub’s 403 Forbidden. Extra statuses are only treated as rate limits when the response also carries one of the configured rate-limit headers; otherwise Coral keeps treating them as ordinary provider errors.

Field	Type	Default	Description
`extra_statuses`	list of integers	`[]`	Additional HTTP status codes Coral should consider as possible rate limits
`retry_after_header`	string	`Retry-After`	Header that carries a retry delay in seconds or HTTP-date form
`remaining_header`	string	—	Header whose value `0` means the current quota is exhausted
`reset_header`	string	—	Header with the quota reset time as Unix epoch seconds

Example for a GitHub-style API:

rate_limit:
  extra_statuses: [403]
  remaining_header: X-RateLimit-Remaining
  reset_header: X-RateLimit-Reset
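The classification rules can be sketched as a function that returns a wait time for rate-limited responses and None otherwise. rate_limit_delay is hypothetical; in particular, treating the presence of any configured header as sufficient for an extra status, and falling back to 0.0 seconds when no hint is available, are assumptions.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Collection, Mapping, Optional


def rate_limit_delay(status: int, headers: Mapping[str, str], *,
                     extra_statuses: Collection[int] = (),
                     retry_after_header: str = "Retry-After",
                     remaining_header: Optional[str] = None,
                     reset_header: Optional[str] = None,
                     now: Optional[float] = None) -> Optional[float]:
    """Hypothetical classifier: seconds to wait, or None if not a rate limit."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    configured = [n.lower() for n in (retry_after_header, remaining_header, reset_header) if n]
    if status != 429:
        # Extra statuses count only when a configured rate-limit header is present.
        if status not in extra_statuses or not any(n in h for n in configured):
            return None
    now = time.time() if now is None else now
    retry_after = h.get(retry_after_header.lower())
    if retry_after is not None:
        if retry_after.strip().isdigit():
            return float(retry_after)  # delay in seconds
        return max(0.0, parsedate_to_datetime(retry_after).timestamp() - now)  # HTTP-date
    if remaining_header and reset_header and h.get(remaining_header.lower()) == "0":
        reset = h.get(reset_header.lower())
        if reset is not None:
            return max(0.0, float(reset) - now)  # reset is Unix epoch seconds
    return 0.0  # rate limited, but no usable hint (assumption)
```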

HTTP dependent-join capability metadata

HTTP table filters can opt in to dependent predicate pushdown. This metadata is table-local: it says a filter may be supplied from a runtime join key, not that the table depends on a specific source pair. The manifest contract is deliberately narrow:

Field	Scope	Type	Default	Description
`filters[].lookup_key`	filter	boolean	`false`	Allows the filter to be supplied from a runtime join key

Lookup-key filters must use mode: equality. Do not mark ranked, capped, fuzzy, or otherwise incomplete search-result filters as lookup keys; dependent joins rely on complete filtered result sets to preserve SQL join correctness. Dependent join execution buffers resolver rows and dependent fetch results before producing joined output. Row, binding, and fanout limits are engine runtime policy, not manifest metadata. Example:

tables:
  - name: pull_requests
    description: Pull requests
    request:
      path: /repos/{{filter.owner}}/{{filter.repo}}/pulls/{{filter.number}}
    filters:
      - name: owner
        lookup_key: true
      - name: repo
        lookup_key: true
      - name: number
        type: Int64
        lookup_key: true
    columns:
      - name: number
        type: Int64
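Why complete result sets matter becomes clearer in a sketch of the buffering described above: each resolver row is joined to every dependent row fetched for its key, so a capped or ranked fetch would silently drop matches. dependent_join is hypothetical and assumes the join-key columns share names with the lookup-key filters.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def dependent_join(resolver_rows: Iterable[Row], key_columns: List[str],
                   fetch: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Hypothetical dependent join: buffer, fetch once per distinct key, then join."""
    buffered = list(resolver_rows)  # buffer resolver rows first
    keys = {tuple(row[c] for c in key_columns) for row in buffered}
    # One filtered fetch per distinct key; each result must be complete.
    fetched: Dict[Tuple, List[Row]] = {
        key: list(fetch(dict(zip(key_columns, key)))) for key in keys
    }
    for row in buffered:
        for match in fetched[tuple(row[c] for c in key_columns)]:
            yield {**row, **match}
```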

HTTP value sources

Request query params, body fields, and headers support these from values:

`from` value	Description
`literal`	Use the manifest-authored JSON value directly
`template`	Render a string template with `input`, `filter`, or `state` tokens
`filter`	Read a table filter captured from the query `WHERE` clause and serialize it as a string
`filter_int`	Read a table filter captured from the query `WHERE` clause and serialize it as a JSON integer
`filter_bool`	Read a table filter captured from the query `WHERE` clause and serialize it as a JSON boolean
`filter_split`	Split a table filter string and serialize one part as a string
`filter_split_int`	Split a table filter string and serialize one part as a JSON integer
`arg`	Read a source function argument captured from the function call and serialize it as a string
`arg_int`	Read a source function argument captured from the function call and serialize it as a JSON integer
`arg_bool`	Read a source function argument captured from the function call and serialize it as a JSON boolean
`arg_split`	Split a source function argument string and serialize one part as a string
`arg_split_int`	Split a source function argument string and serialize one part as a JSON integer
`input`	Read a manifest-declared source input (variable/secret) by key
`bearer`	Read a manifest-declared secret input by key and serialize it as `Bearer <value>`
`one_of`	Resolve nested value sources in order and use the first present, non-empty value
`state`	Read pagination/runtime state
`now_epoch_minus_seconds`	Emit the current Unix epoch seconds minus the configured offset
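The following minimal Python sketch shows how a subset of these value sources could resolve against captured filters, function arguments, and inputs. It is illustrative only. The literal field name value and the one_of field name sources are hypothetical. The sketch omits template, state, now_epoch_minus_seconds, and the split variants.

```python
from typing import Any, Dict


def _serialize(value: Any, suffix: str) -> Any:
    """Apply the type suffix: "" -> string, "_int" -> integer, "_bool" -> boolean."""
    if suffix == "":
        return str(value)
    if suffix == "_int":
        return int(value)
    if suffix == "_bool":
        if isinstance(value, bool):
            return value
        text = str(value).lower()
        if text in ("true", "false"):
            return text == "true"
        raise ValueError(f"not a boolean: {value!r}")
    raise ValueError(f"unknown suffix: {suffix!r}")


def resolve(spec: Dict[str, Any], ctx: Dict[str, Dict[str, Any]]) -> Any:
    """Resolve one value source; returns None when the value is absent.

    ctx holds "filters" (from WHERE), "args" (from the function call),
    and "inputs" (stored source variables and secrets).
    """
    kind = spec["from"]
    if kind == "literal":
        return spec["value"]  # hypothetical field name
    for prefix in ("filter", "arg"):
        if kind in (prefix, prefix + "_int", prefix + "_bool"):
            raw = ctx[prefix + "s"].get(spec["key"])
            return None if raw is None else _serialize(raw, kind[len(prefix):])
    if kind == "input":
        return ctx["inputs"].get(spec["key"])
    if kind == "bearer":
        secret = ctx["inputs"].get(spec["key"])
        return None if secret in (None, "") else f"Bearer {secret}"
    if kind == "one_of":
        # The first present, non-empty nested value wins.
        for nested in spec["sources"]:  # hypothetical field name
            value = resolve(nested, ctx)
            if value is not None and value != "":
                return value
        return None
    raise NotImplementedError(f"not covered by this sketch: {kind}")
```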

The filter* and arg* value sources differ in where the value comes from. Table requests use filter value sources because their request values come from SQL predicates such as WHERE id = 123. Source function requests use arg value sources because their request values come from named function arguments such as github.search_issues(q => 'flaky').

The suffix controls the value type sent to the provider. Use the bare form for a string request value, _int for a JSON integer, and _bool for a JSON boolean. Boolean filters used with filter_bool can be written as normal SQL boolean predicates, for example WHERE include_archived IS FALSE or WHERE include_archived = false.

Use filter_split and filter_split_int when a provider requires structured request fields but users naturally have one compound identifier. For example, splitting SOURCE-496 on "-" yields the string team key SOURCE (part 0) and the integer issue number 496 (part 1):

- path: [variables, teamKey]
  from: filter_split
  key: issue_identifier
  separator: "-"
  part: 0
- path: [variables, issueNumber]
  from: filter_split_int
  key: issue_identifier
  separator: "-"
  part: 1
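The following minimal Python sketch shows the split semantics described above and writes each part into the request body at its path. It assumes the value is split on every occurrence of separator and that part is a zero-based index. This page does not specify how Coral handles identifiers that contain extra separators (for example MY-TEAM-12).

```python
from typing import Any, Dict, List


def split_part(value: str, separator: str, part: int, as_int: bool = False) -> Any:
    """Mirror filter_split / filter_split_int (and the arg_* variants)."""
    parts = value.split(separator)
    if not 0 <= part < len(parts):
        raise ValueError(f"{value!r} has no part {part} when split on {separator!r}")
    piece = parts[part]
    return int(piece) if as_int else piece


def set_path(body: Dict[str, Any], path: List[str], value: Any) -> None:
    """Write value at a nested JSON path such as [variables, teamKey]."""
    node = body
    for segment in path[:-1]:
        node = node.setdefault(segment, {})
    node[path[-1]] = value


body: Dict[str, Any] = {}
set_path(body, ["variables", "teamKey"], split_part("SOURCE-496", "-", 0))
set_path(body, ["variables", "issueNumber"], split_part("SOURCE-496", "-", 1, as_int=True))
# body == {"variables": {"teamKey": "SOURCE", "issueNumber": 496}}
```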

Use arg_split and arg_split_int the same way for source-scoped table functions:

args:
  - name: issue
    required: true
    bind:
      arg: issue
request:
  body:
    - path: [variables, teamKey]
      from: arg_split
      key: issue
      separator: "-"
      part: 0
    - path: [variables, issueNumber]
      from: arg_split_int
      key: issue
      separator: "-"
      part: 1

HTTP response fields

The response object controls how Coral extracts rows from an HTTP response body.

Field	Type	Default	Description
`rows_path`	list of strings	`[]`	JSON path segments to the array of rows in the response
`ok_path`	list of strings	`[]`	JSON path to a field indicating success
`error_path`	list of strings	`[]`	JSON path to a field containing an error message
`allow_404_empty`	boolean	`false`	Treat HTTP 404 as an empty result set instead of an error
`row_strategy`	string	`direct`	How to convert the selected value into rows. `direct` uses the array or object as-is. `dict_entries` maps a JSON object’s key-value pairs into rows. `series_point_list` flattens specialized timeseries arrays (e.g., Datadog metrics).

Example with rows_path:

response:
  rows_path:
    - data
    - items
  row_strategy: direct

This extracts rows from response_body.data.items instead of treating the entire response as the row array.
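The following minimal Python sketch shows the direct and dict_entries strategies. It assumes rows_path is walked one key at a time. The key and value column names produced for dict_entries are hypothetical placeholders. The sketch omits series_point_list because this page does not describe its provider-specific shape.

```python
from typing import Any, List


def extract_rows(body: Any, rows_path: List[str], row_strategy: str = "direct") -> List[Any]:
    """Select the value at rows_path, then turn it into a list of rows."""
    selected = body
    for segment in rows_path:
        selected = selected[segment]  # raises KeyError if the path is missing
    if row_strategy == "direct":
        # Use an array as-is; treat a single object as one row (assumption).
        return selected if isinstance(selected, list) else [selected]
    if row_strategy == "dict_entries":
        # One row per key-value pair; column names are hypothetical.
        return [{"key": k, "value": v} for k, v in selected.items()]
    raise NotImplementedError(f"not covered by this sketch: {row_strategy}")
```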

HTTP pagination

The pagination object controls how Coral fetches additional pages of results.

Field	Type	Default	Description
`mode`	string	`none`	Pagination strategy: `none`, `auto`, `cursor_query`, `cursor_body`, `page`, `offset`, or `link_header`
`cursor_param`	string	—	Query parameter name for the cursor (`cursor_query` mode)
`response_cursor_path`	list of strings	`[]`	JSON path in the response to the next cursor value
`cursor_body_path`	list of strings	`[]`	JSON path in the request body for the cursor (`cursor_body` mode)
`page_param`	string	—	Query parameter name for the page number (`page` mode)
`page_start`	integer	`0`	Starting page number
`page_step`	integer	`1`	Amount to increment the page number between requests (`page` mode)
`offset_param`	string	—	Query parameter name for the offset (`offset` mode)
`offset_start`	integer	`0`	Starting offset value
`offset_step`	integer	—	Amount to increment the offset between requests; required for `offset` mode unless `page_size` is set
`page_size.default`	integer	—	Default page size
`page_size.max`	integer	—	Maximum page size
`page_size.query_param`	string	—	Query parameter name for the page size
`page_size.body_path`	list of strings	`[]`	JSON path in the request body for the page size
`max_pages`	integer	—	Maximum number of pages to fetch
`link_header_require_results`	boolean	`false`	For `link_header` mode, require non-empty results to continue
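The page and offset counters in the table can be sketched in Python as generators of query parameters. This sketch assumes offset mode falls back to the page size as its step when offset_step is omitted. The table implies this but does not state it. A real loop would also stop early when a page comes back empty.

```python
from typing import Dict, Iterator, Optional


def page_params(page_param: str, page_start: int = 0, page_step: int = 1,
                max_pages: Optional[int] = None) -> Iterator[Dict[str, int]]:
    """Yield the page-number query parameter for successive requests."""
    page, sent = page_start, 0
    while max_pages is None or sent < max_pages:
        yield {page_param: page}
        page += page_step
        sent += 1


def offset_params(offset_param: str, offset_start: int = 0,
                  offset_step: Optional[int] = None,
                  page_size: Optional[int] = None,
                  max_pages: Optional[int] = None) -> Iterator[Dict[str, int]]:
    """Yield the offset query parameter for successive requests."""
    # Assumption: without offset_step, the offset advances by the page size.
    step = offset_step if offset_step is not None else page_size
    if step is None:
        raise ValueError("offset mode needs offset_step or page_size")
    offset, sent = offset_start, 0
    while max_pages is None or sent < max_pages:
        yield {offset_param: offset}
        offset += step
        sent += 1
```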

Example using cursor-based pagination:

pagination:
  mode: cursor_query
  cursor_param: after
  response_cursor_path:
    - pagination_meta
    - after
  page_size:
    default: 25
    max: 100
    query_param: page_size

Example using body-based cursor pagination:

pagination:
  mode: cursor_body
  cursor_body_path:
    - query
    - cursor
  response_cursor_path:
    - data
    - nextCursor
  page_size:
    default: 50
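Both cursor modes follow the same loop and differ only in where the cursor is written. The following minimal Python sketch assumes a hypothetical send(query, body) callable that performs one request and returns the decoded JSON. It also assumes pagination stops when the cursor at response_cursor_path is missing or empty.

```python
from typing import Any, Callable, Dict, List, Optional

Send = Callable[[Dict[str, Any], Dict[str, Any]], Any]


def get_path(obj: Any, path: List[str]) -> Any:
    """Follow JSON path segments; return None if any segment is missing."""
    for segment in path:
        if not isinstance(obj, dict) or segment not in obj:
            return None
        obj = obj[segment]
    return obj


def set_path(obj: Dict[str, Any], path: List[str], value: Any) -> None:
    """Write value at a nested JSON path, creating objects as needed."""
    for segment in path[:-1]:
        obj = obj.setdefault(segment, {})
    obj[path[-1]] = value


def fetch_all_cursor(
    send: Send,
    rows_path: List[str],
    response_cursor_path: List[str],
    cursor_param: Optional[str] = None,            # cursor_query mode
    cursor_body_path: Optional[List[str]] = None,  # cursor_body mode
    max_pages: Optional[int] = None,
) -> List[Any]:
    if cursor_param is None and not cursor_body_path:
        raise ValueError("set cursor_param or cursor_body_path")
    rows: List[Any] = []
    cursor: Any = None
    pages = 0
    while max_pages is None or pages < max_pages:
        query: Dict[str, Any] = {}
        body: Dict[str, Any] = {}
        if cursor is not None:  # the first request carries no cursor
            if cursor_param is not None:
                query[cursor_param] = cursor
            else:
                set_path(body, cursor_body_path, cursor)
        response = send(query, body)
        pages += 1
        rows.extend(get_path(response, rows_path) or [])
        cursor = get_path(response, response_cursor_path)
        if cursor is None or cursor == "":
            break  # no next cursor: this was the last page
    return rows
```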

Example using link-header pagination (common with GitHub-style APIs):

pagination:
  mode: link_header
  page_size:
    default: 30
    max: 100
    query_param: per_page
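In link_header mode, the next page URL comes from the response's Link header (RFC 8288), which is how GitHub's REST API paginates. The following minimal Python sketch extracts the rel="next" target and applies link_header_require_results. Coral's own parser may handle edge cases differently.

```python
import re
from typing import List, Optional

_LINK = re.compile(r"<([^>]*)>([^<]*)")
_REL = re.compile(r'rel\s*=\s*"?([^";,]+)"?')


def next_link(header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, if any."""
    if not header:
        return None
    for match in _LINK.finditer(header):
        url, params = match.group(1), match.group(2)
        rel = _REL.search(params)
        if rel and "next" in rel.group(1).split():
            return url
    return None


def follow_link_header(header: Optional[str], rows: List[object],
                       require_results: bool = False) -> Optional[str]:
    """Return the next URL to request, or None to stop paginating."""
    url = next_link(header)
    if url is None or (require_results and not rows):
        return None
    return url
```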

Source inputs

Source specs can declare the variables and secrets a source needs under a top-level inputs map. Inputs define how credentials and other configuration values are collected and stored, not where they are sent. Runtime fields such as HTTP auth, request_headers, table request headers, query params, body fields, and MCP server.auth decide where stored inputs are used.

Input map basics

At install time, coral source add collects each input from an environment variable matching the key, or prompts interactively when you pass --interactive.

Variables: non-secret configuration like base URLs or organization IDs
Secrets: API keys or bearer tokens

inputs:
  API_BASE:
    kind: variable
    default: https://api.example.com
    hint: Base URL for the API
  API_TOKEN:
    kind: secret
    hint: API bearer token

base_url: "{{input.API_BASE}}"
auth:
  type: HeaderAuth
  headers:
    - name: Authorization
      from: bearer
      key: API_TOKEN

Input rules

kind: variable values are stored with source variables
kind: secret values are stored with source secrets
default is allowed only for variables
hint is optional and shown alongside the input during coral source add --interactive
references elsewhere in the manifest use {{input.KEY}} templates, from: input, or wrappers such as from: bearer
credential-like inputs such as API keys, tokens, passwords, secrets, private keys, and bearer or authorization values must be kind: secret; Coral rejects those names as variables because variable values are visible through source APIs and coral.inputs
input keys must not start with Coral’s reserved internal prefix __coral
declaring an input does not add it to any request by itself; reference it from auth, headers, query params, body fields, or other runtime request configuration where it should be used
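The declaration rules above can be expressed as a small lint pass. This Python sketch is hypothetical. In particular, the credential-name pattern is a guess at what counts as credential-like, and Coral's real check may use a different list.

```python
import re
from typing import Any, List, Mapping

# Hypothetical heuristic; Coral's real list of credential-like names may differ.
_CREDENTIAL_NAME = re.compile(
    r"API_?KEY|TOKEN|PASSWORD|SECRET|PRIVATE_?KEY|BEARER|AUTHORIZATION",
    re.IGNORECASE,
)


def lint_inputs(inputs: Mapping[str, Mapping[str, Any]]) -> List[str]:
    """Return one message per rule violation in an inputs map."""
    problems: List[str] = []
    for key, decl in inputs.items():
        kind = decl.get("kind")
        if kind not in ("variable", "secret"):
            problems.append(f"{key}: kind must be variable or secret")
        if key.startswith("__coral"):
            problems.append(f"{key}: the __coral prefix is reserved")
        if kind == "secret" and "default" in decl:
            problems.append(f"{key}: default is allowed only for variables")
        if kind == "variable" and _CREDENTIAL_NAME.search(key):
            problems.append(f"{key}: credential-like inputs must be kind: secret")
    return problems
```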

Credential methods

Secret inputs can also declare credential.methods metadata for install-time credential retrieval choices. credential.methods only controls how the secret is collected; runtime request authentication is still defined by auth or request headers that reference the input. Supported credential method types:

source_config: read the secret from an environment variable by default, or from an interactive prompt when coral source add --interactive is used. When credential is omitted, Coral uses this behavior.
oauth: run an OAuth device-code or authorization-code flow and store the returned access token in the same source secret.

Use type: oauth when the provider should issue the source secret through a device-code or browser-based OAuth setup instead of asking the user to paste a token. To offer both OAuth and a pasted token, list the OAuth method first and add a source_config fallback for users who already have a token. Each method accepts three optional display fields, used during interactive install (coral source add --interactive) and in the generated source docs:

label (string): the choice shown in the method picker.
description (string): a short one-line blurb for the method.
hint (string): markdown guidance shown next to the fields that method collects — for example how to register the OAuth app, what the callback URL is, or where to find an existing token. Keep each hint focused on the inputs that method needs rather than restating the other methods. Prefer a per-method hint over a single long input-level hint when the input offers several methods, since the fields below the hint change with the selected method.

inputs:
  GITHUB_TOKEN:
    kind: secret
    credential:
      methods:
        - type: oauth
          label: Connect with GitHub
          description: Use OAuth instead of pasting a token.
          hint: |
            Signs you in through GitHub with no client secret. To use your
            own app, set GITHUB_OAUTH_CLIENT_ID to its Client ID.
          oauth:
            flow:
              type: device_code
            endpoints:
              device_authorization_url: https://github.com/login/device/code
              token_url: https://github.com/login/oauth/access_token
            client:
              id:
                input: GITHUB_OAUTH_CLIENT_ID
            scopes:
              scope:
                delimiter: space
                values:
                  - repo
                  - read:org
        - type: source_config
          label: Paste token

Authorization-code OAuth uses a loopback redirect and can require PKCE:

inputs:
  DEMO_API_TOKEN:
    kind: secret
    credential:
      methods:
        - type: oauth
          label: Connect with Demo
          description: Open a browser and authorize Coral to read Demo API data.
          hint: |
            Signs you in through Demo in your browser and requests the
            `read:data` scope. To use your own app, set DEMO_OAUTH_CLIENT_ID
            to its Client ID.
          oauth:
            flow:
              type: authorization_code
              pkce: required
            redirect_uri: http://127.0.0.1:0/oauth/callback
            redirect_uri_port_mode: random
            endpoints:
              authorization_url: https://demo.example.com/oauth/authorize
              token_url: https://demo.example.com/oauth/token
            client:
              id:
                input: DEMO_OAUTH_CLIENT_ID
            scopes:
              scope:
                delimiter: space
                values:
                  - read:data
        - type: source_config
          label: Paste token
          hint: Paste a Demo API token with read access to the data you query.
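With pkce: required, the client sends a code challenge in the authorization URL and the matching verifier in the token exchange. This page does not name the challenge method. This Python sketch therefore assumes the standard S256 method from RFC 7636 and standard OAuth 2.0 query parameters. The client ID and state values are hypothetical.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_code_verifier(num_bytes: int = 32) -> str:
    """Random verifier; 32..96 bytes encode to RFC 7636's 43..128 characters."""
    if not 32 <= num_bytes <= 96:
        raise ValueError("num_bytes must be between 32 and 96")
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def s256_challenge(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding (the S256 method)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorization_url(authorization_url: str, client_id: str,
                            redirect_uri: str, scope: str, state: str,
                            challenge: str) -> str:
    """Build the authorization request; no client secret is included."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    })
    return f"{authorization_url}?{query}"
```

The verifier stays on the local machine. It is sent only later, as code_verifier, when exchanging the authorization code at the token endpoint.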

OAuth flow and client settings

OAuth supports device-code flow and authorization-code flow.

For device-code flow, declare flow.type: device_code, endpoints.device_authorization_url, endpoints.token_url, and a public client ID through client.id.default, client.id.input, or both. Device-code OAuth uses a public client ID only: client.secret must not be declared, and no callback URL, loopback listener, or PKCE setting is required.

For authorization-code flow, declare flow.type: authorization_code, an explicit pkce value of required or disabled, redirect_uri, endpoints.authorization_url, and endpoints.token_url. Public clients can declare client.id.default, client.id.input, or both; an interactive input value overrides the default. If the token endpoint requires client-secret authentication, the method must prompt for both OAuth client values: set client.id.input, client.secret.input, and a client.secret.transport of basic_auth or request_body. Client secrets are sent only to the token endpoint and are never included in the authorization URL. They may be stored as internal credential metadata so Coral can refresh the access token without prompting the user again.

OAuth endpoint URLs may include {{input.KEY}} templates for declared kind: variable inputs, which Coral renders during OAuth credential setup. Use this for non-secret endpoint components such as a Microsoft tenant ID or site domain. OAuth endpoint templates do not support secret inputs, filters, function arguments, state, or inline defaults.

Scopes are declared under scopes.scope: delimiter: space represents scope=a b, and delimiter: comma represents scope=a,b. Coral stores OAuth metadata alongside the source secret and preserves rotated refresh tokens when the provider returns a new one during refresh.
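The following minimal Python sketch shows the scope delimiter rule and the variable-only restriction on OAuth endpoint templates. The inputs mapping shape and the MS_TENANT key are hypothetical. Coral's renderer may report errors differently.

```python
import re
from typing import List, Mapping

_DELIMITERS = {"space": " ", "comma": ","}
_TOKEN = re.compile(r"\{\{\s*([^}]*?)\s*\}\}")


def scope_param(values: List[str], delimiter: str) -> str:
    """Join scopes with the declared delimiter (space or comma)."""
    return _DELIMITERS[delimiter].join(values)


def render_endpoint(url: str, inputs: Mapping[str, Mapping[str, str]]) -> str:
    """Render {{input.KEY}} in an OAuth endpoint URL, allowing variables only.

    inputs maps each key to {"kind": ..., "value": ...} (hypothetical shape).
    """
    def substitute(match: "re.Match[str]") -> str:
        token = match.group(1)
        if not token.startswith("input."):
            raise ValueError(f"unsupported token in OAuth endpoint: {token}")
        key = token[len("input."):]
        decl = inputs.get(key)
        if decl is None or decl.get("kind") != "variable":
            raise ValueError(f"{key} is not a declared kind: variable input")
        return decl["value"]

    return _TOKEN.sub(substitute, url)
```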

OAuth redirect URIs

Authorization-code flow currently supports loopback HTTP callback URIs: http://127.0.0.1[:<port>]/<path> or http://localhost[:<port>]/<path>. The redirect_uri_port_mode field controls how Coral binds the loopback port. When redirect_uri_port_mode is omitted, Coral uses random if redirect_uri specifies port 0 and fixed otherwise.

fixed binds the exact non-zero port authored in redirect_uri; if that port is busy, install fails.
random binds a free port at install time. In this mode redirect_uri must omit the port or use port 0; Coral sends the effective URI with the assigned port in both the authorization request and the token exchange.
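The following minimal Python sketch implements the port-mode rules with only the standard library. It assumes a fixed-mode URI without an explicit non-zero port is invalid. It also binds 127.0.0.1 even for localhost URIs, which a real implementation may handle differently.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def effective_mode(redirect_uri: str, mode: Optional[str]) -> str:
    """Default to random for port 0 and to fixed otherwise."""
    if mode is not None:
        return mode
    return "random" if urlsplit(redirect_uri).port == 0 else "fixed"


def bind_redirect(redirect_uri: str, mode: Optional[str] = None) -> Tuple[socket.socket, str]:
    """Bind the loopback listener and return it with the effective redirect URI."""
    parts = urlsplit(redirect_uri)
    if parts.scheme != "http" or parts.hostname not in LOOPBACK_HOSTS:
        raise ValueError("redirect_uri must be an http loopback URI")
    mode = effective_mode(redirect_uri, mode)
    if mode == "random":
        if parts.port not in (None, 0):
            raise ValueError("random mode requires no port or port 0")
        bind_port = 0  # let the OS pick a free port
    elif mode == "fixed":
        if not parts.port:
            raise ValueError("fixed mode requires an explicit non-zero port")
        bind_port = parts.port
    else:
        raise ValueError(f"unknown redirect_uri_port_mode: {mode!r}")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", bind_port))  # raises OSError if a fixed port is busy
    except OSError:
        sock.close()
        raise
    assigned = sock.getsockname()[1]
    # The same effective URI goes into the authorization request and token exchange.
    netloc = f"{parts.hostname}:{assigned}"
    effective = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return sock, effective
```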

When users provide their own authorization-code OAuth client ID, they must configure the matching loopback redirect behavior in the OAuth app. Coral starts the loopback flow and returns the provider authorization URL. After the callback, Coral exchanges the code, stores the access token as the source secret, and stores refresh metadata internally. When the provider issues a refresh token, Coral uses it transparently for source validation and queries, so users should not need to reconnect the source just because a short-lived access token expires. If the provider does not issue a refresh token, users may need to reconnect the source after the access token expires. The CLI also accepts the final loopback redirect URL in the terminal while it waits, which lets users complete OAuth when their browser cannot reach the machine running Coral directly.
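Refreshing uses the standard OAuth 2.0 refresh_token grant. This Python sketch builds the refresh request for public and confidential clients and keeps a rotated refresh token when the provider returns one. The helper names are hypothetical. URL-encoding the client credentials before Basic encoding follows RFC 6749 section 2.3.1, which Coral or a given provider may not do.

```python
import base64
from typing import Any, Dict, Mapping, Optional, Tuple
from urllib.parse import quote_plus


def refresh_request(refresh_token: str, client_id: str,
                    client_secret: Optional[str] = None,
                    transport: Optional[str] = None) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, form fields) for a refresh_token grant."""
    headers: Dict[str, str] = {}
    form = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    if client_secret is None:
        form["client_id"] = client_id  # public client
    elif transport == "basic_auth":
        raw = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
        headers["Authorization"] = "Basic " + base64.b64encode(raw.encode()).decode()
    elif transport == "request_body":
        form["client_id"] = client_id
        form["client_secret"] = client_secret
    else:
        raise ValueError("confidential clients need transport basic_auth or request_body")
    return headers, form


def merge_token_response(stored: Mapping[str, Any], response: Mapping[str, Any]) -> Dict[str, Any]:
    """Store the new access token; keep the old refresh token unless rotated."""
    updated = dict(stored)
    updated["access_token"] = response["access_token"]
    if response.get("refresh_token"):
        updated["refresh_token"] = response["refresh_token"]  # provider rotated it
    return updated
```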

Inspecting configured inputs

OAuth stores the token response access_token as the source secret value. If a runtime auth header needs a bearer value, use from: bearer with that input key. If the source also accepts a full pasted API-key header, wrap the API-key input and the OAuth bearer input in from: one_of.

At runtime, installed source inputs are also surfaced through coral.inputs. This is useful when agents or scripts need to inspect non-secret source config, such as a Datadog site or Jira base URL, and compose absolute URLs or account-scoped identifiers from it. Secret values are never exposed there: secret rows always return value IS NULL, while is_set shows whether the secret has been configured. coral.inputs includes these columns:

Column	Description
`schema_name`	Source schema that owns the input
`key`	Input key from the manifest
`kind`	`variable` or `secret`
`value`	Variable value when available; `NULL` for secrets
`default_value`	Manifest-authored default for variables
`hint`	Prompt hint from the manifest, if any
`required`	Whether the input must be configured
`is_set`	Whether Coral has a saved value for that input

Example queries:

-- Look up a variable value
SELECT value FROM coral.inputs
WHERE schema_name = 'datadog' AND kind = 'variable' AND key = 'DD_SITE';

-- Check which secrets are configured without revealing them
SELECT schema_name, key FROM coral.inputs
WHERE kind = 'secret' AND is_set;
