Top-level fields
Every source spec starts with:

| Field | Description |
|---|---|
| name | Source name, also used as the SQL schema name |
| version | Source spec version string |
| dsl_version | Coral source spec DSL version (currently 3) |
| backend | How data is fetched: http, file, or mcp |
| test_queries | Optional read-only SQL checks Coral runs during coral source test |
Test queries
Use the optional top-level test_queries field when you want Coral to run one
or more read-only SQL checks after the source validates and its tables become
queryable.
- Each query result is reported individually during coral source test. coral source add and coral onboard surface post-install validation issues as warnings; coral source test is the strict pass/fail command.
- Prefer cheap read-only checks that validate connectivity and mapping logic. For example, SELECT * FROM schema.table LIMIT 1 is usually a better fit than SELECT COUNT(*).
- Keep these queries read-only. Statements such as CREATE, COPY, INSERT, UPDATE, DELETE, or SET are reported as failed validation queries.
Column types
The type field on each column accepts:

| Type | Description |
|---|---|
| Utf8 | UTF-8 string |
| Int64 | 64-bit signed integer |
| Float64 | 64-bit floating point |
| Boolean | Boolean true/false |
| Timestamp | UTC timestamp (microsecond precision) |
| Json | UTF-8 string containing JSON. Hint that the column is queryable with JSON accessors. See “Querying JSON columns” below. |

Columns also accept:

- nullable (boolean, default true): whether the column can contain NULL values
- description (string): human-readable description, surfaced through coral.columns
Nested field naming
Coral uses a double underscore (__) naming convention for flattened nested or related fields.
Treat the full name as the SQL column name:
- assignee__name means the name field inside the assignee object or relation
- repository__owner__login means repository.owner.login

Do not rewrite these names as assignee.name or repository.owner.login. When you are unsure which flattened names a source exposes, inspect them first through coral.columns.
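The naming convention above can be modeled with a short Python sketch. It is not Coral's mapper; the flatten helper and the sample record are hypothetical and only show how nested keys turn into double-underscore column names.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into double-underscore column names (illustrative only)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so repository.owner.login becomes repository__owner__login.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical sample record, not taken from a real source.
row = flatten({
    "assignee": {"name": "ada"},
    "repository": {"owner": {"login": "octo"}},
})
print(sorted(row))
```

The resulting keys are the names you would use as SQL column names.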
Querying JSON columns
Json-typed columns (and any Utf8 column whose values happen to be JSON) can be queried with the built-in JSON functions:
json_get, json_get_str, json_get_int, json_get_float, json_get_bool, json_get_array, json_get_json, json_contains, json_length, json_as_text. For detailed function behavior, see the datafusion-functions-json reference.
Use plain object keys or array indexes with these functions. They do not accept JSONPath syntax such as $.browser. For example, json_get_str(properties, 'browser') looks up the browser key. Some sources, including PostHog, use literal keys that start with $, so json_get_str(properties, '$browser') looks up the exact key name $browser.
Column expressions
The expr field on a column defines how to extract or compute its value from each row. When omitted, Coral uses a path expression matching the column name.

| Kind | Description |
|---|---|
| path | Extract a value by JSON path segments |
| join_array_path | Join one nested scalar field from each object in an array |
| format_timestamp | Convert an epoch timestamp to a Timestamp column |
| base64_decode | Decode a base64 string expression into UTF-8 text |
| replace | Replace all occurrences of one substring in the rendered value |
| template | Build a string from a template with placeholder substitutions |
join_array_path
Extracts one nested value from each object in an array and joins the values into one string.
Use this when an API returns object arrays such as labels, tags, or owners and the table should expose a compact human-readable column.
| Field | Type | Default | Description |
|---|---|---|---|
| path | array | — | JSON path to the array |
| item_path | array | — | JSON path to extract from each array item |
| separator | string | , | Separator used between non-null extracted items |
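As a rough illustration of the join_array_path semantics described above, the following Python sketch walks a path to an array, extracts one value per item, skips nulls, and joins the rest. The helper names and the sample row are hypothetical, and returning None when the path does not lead to an array is an assumption.

```python
def get_path(value, path):
    """Follow a list of object keys; return None when a segment is missing."""
    for segment in path:
        if not isinstance(value, dict):
            return None
        value = value.get(segment)
    return value


def join_array_path(row, path, item_path, separator=","):
    """Join one nested value from each array item, skipping null items."""
    items = get_path(row, path)
    if not isinstance(items, list):
        return None  # assumption: a missing or non-array value yields NULL
    parts = [get_path(item, item_path) for item in items]
    return separator.join(str(part) for part in parts if part is not None)


# Hypothetical API row with a labels array.
issue = {"labels": [{"name": "bug"}, {"name": None}, {"name": "p1"}]}
print(join_array_path(issue, ["labels"], ["name"]))
```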
format_timestamp
Converts a numeric/string epoch value or an ISO 8601 string into a Timestamp column.
Coral stores Timestamp columns as epoch microseconds internally, so the rendered output has microsecond precision even when the source value is provided in seconds or milliseconds.
format_timestamp currently accepts raw epoch inputs in seconds or milliseconds, plus ISO 8601 / RFC 3339 strings via iso8601. APIs that vend only microseconds or nanoseconds are not supported by this helper yet.
| Field | Type | Default | Description |
|---|---|---|---|
| input | string | seconds | Unit/format of the raw value: seconds, milliseconds, or iso8601 |
| expr | expr | — | Inner expression that produces the timestamp value |
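The unit handling can be pictured with a small Python sketch that normalizes each supported input to epoch microseconds. This is only a model of the behavior described above, not Coral's code: to_epoch_micros is a hypothetical helper, it handles whole-number epoch values only, and treating ISO 8601 strings without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(value, input_kind="seconds"):
    """Normalize a raw timestamp value to epoch microseconds."""
    if input_kind == "seconds":
        return int(value) * 1_000_000
    if input_kind == "milliseconds":
        return int(value) * 1_000
    if input_kind == "iso8601":
        # Older Python versions do not parse a trailing "Z", so rewrite it.
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        delta = parsed - EPOCH
        # Integer arithmetic avoids float rounding at microsecond precision.
        return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    raise ValueError(f"unsupported input kind: {input_kind}")


print(to_epoch_micros(1700000000))
print(to_epoch_micros("2023-11-14T22:13:20Z", "iso8601"))
```

All three inputs land on the same microsecond scale, which is why the rendered column has microsecond precision regardless of the source unit.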
base64_decode
Decodes an inner expression as base64 and returns UTF-8 text. Whitespace in the encoded value is ignored, which is useful for APIs that wrap base64 file contents across lines. Invalid base64 or non-UTF-8 bytes produce NULL.
| Field | Type | Default | Description |
|---|---|---|---|
| expr | expr | — | Inner expression that produces base64 |
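A minimal Python sketch of the decoding rules above: whitespace is dropped before decoding, and invalid base64 or non-UTF-8 bytes yield None, standing in for SQL NULL. The function is a hypothetical model rather than Coral's implementation.

```python
import base64


def base64_decode(encoded):
    """Decode base64 text to a UTF-8 string, or return None on any failure."""
    if encoded is None:
        return None
    # Remove line wraps and other whitespace inside the encoded value.
    compact = "".join(encoded.split())
    try:
        return base64.b64decode(compact, validate=True).decode("utf-8")
    except ValueError:
        # Covers invalid base64 (binascii.Error) and non-UTF-8 bytes (UnicodeDecodeError).
        return None


print(base64_decode("aGVs\nbG8="))
```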
replace
Evaluates an inner expression as a string and replaces all occurrences of from with to.
| Field | Type | Default | Description |
|---|---|---|---|
| expr | expr | — | Inner expression that produces a string |
| from | string | — | Non-empty substring to replace |
| to | string | — | Replacement text |
template
Builds a string by substituting parsed template tokens with either:
- {{expr.name}}: a named sub-expression from values
- {{filter.name}}: a query filter value
|default supplies a fallback when the value is missing at runtime.
Column-expression templates do not support secret, variable, or state namespaces.
| Field | Type | Default | Description |
|---|---|---|---|
| template | string | — | Parsed Coral template string using {{expr.*}} and {{filter.*}} |
| values | object | — | Named sub-expressions referenced from {{expr.name}} tokens |
Relation names
Tables and source-scoped table functions share one SQL relation namespace within a source. Name comparisons are case-insensitive, so Issues, issues, and a
table function named issues conflict.
Table names must be non-empty. Prefer plain snake_case table names, but names
that require SQL quoting, such as player.stats, remain valid for compatibility
and can be queried as quoted identifiers. Table-function names must be ASCII
identifiers: start with a letter or underscore, then use only letters, numbers,
or underscores.
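The naming rules above can be checked with a short Python sketch. The helper names are hypothetical, and using lower() to model case-insensitive comparison is an assumption about how names are folded.

```python
import re

# ASCII identifier: a letter or underscore, then letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_function_name(name):
    """Check the table-function naming rule."""
    return IDENTIFIER.fullmatch(name) is not None


def find_conflicts(tables, functions):
    """Return pairs of relation names that collide case-insensitively."""
    seen = {}
    conflicts = []
    for name in list(tables) + list(functions):
        key = name.lower()  # assumption: simple lowercase folding
        if key in seen:
            conflicts.append((seen[key], name))
        else:
            seen[key] = name
    return conflicts


print(find_conflicts(["Issues", "player.stats"], ["issues", "search_issues"]))
```

Note that player.stats passes as a table name here but would fail the function-name check.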
File-backed tables
For backend: file, each table declares its own format and points at a
location:
- Supported formats are parquet, jsonl, json, and csv
- Supported transports are file:// and s3:// for every file format
- If you omit glob, Coral uses a format-specific default
JSONL and JSON
jsonl reads newline-delimited JSON objects. json reads JSON arrays of
objects.
Tables using either format must declare columns. Declare nested object or array
fields as type: Json; Coral exposes those values as JSON text so they can be
queried with json_get_* functions such as json_get_str(payload, 'id').
CSV
CSV tables must declare columns. They support format_options.has_header and
format_options.delimiter.
Parquet
Parquet tables can infer columns when columns: [] is used.
Shared file table constraints
File-backed tables do not support declared filters, virtual columns, or
columns[*].expr; use SQL projections and predicates instead.
For s3:// locations, declare the S3 object-store policy under the table
source instead of relying on magic input names.
Use auth: { type: instance_profile } when the runtime should use the host’s
AWS instance profile.
File path partitions
File-backed tables can expose partition columns derived from object paths. When path is omitted, partitions use Hive-style key=value path segments.
Partition columns support Utf8, Int64, Boolean, Float64, and Json;
Timestamp partitions are not supported.
Positional path segments can instead map a path such as 2026/05/14/session.jsonl to partition columns
year = 2026, month = 5, and day = 14. SQL predicates on partition columns
can prune unrelated files before JSON rows are decoded.
For JSON and JSONL tables, declaring partitions makes the path layout part of
the table contract. Files that do not provide every declared partition value in
the configured layout fail the query; matching paths with invalid typed
partition values also fail the query.
Parquet and CSV tables use DataFusion’s native listing table reader and support
Hive-style partitioning. Positional segment partitioning is currently limited to
JSON and JSONL tables because those formats already use Coral’s custom JSON row
mapper.
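The two partition layouts can be contrasted with a small Python sketch. Both helpers are hypothetical models: the Hive-style one reads key=value directory segments, and the positional one assigns names to leading segments and parses them as integers, which assumes Int64 partition columns.

```python
def hive_partitions(path):
    """Read key=value directory segments, e.g. year=2026/month=05/file.parquet."""
    parts = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def positional_partitions(path, names):
    """Map leading path segments to named integer partition columns."""
    segments = path.split("/")[:-1]
    if len(segments) < len(names):
        # Mirrors the rule that a file missing a declared partition fails the query.
        raise ValueError(f"path {path!r} does not provide {names}")
    # int() raises ValueError on invalid typed values, which also fails the query.
    return {name: int(segment) for name, segment in zip(names, segments)}


print(positional_partitions("2026/05/14/session.jsonl", ["year", "month", "day"]))
print(hive_partitions("year=2026/month=05/data.parquet"))
```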
HTTP-backed tables
For http, the source spec declares request and response behavior.
- response: {} means “use the default response rules” for an HTTP table
- with the default response rules, Coral treats the selected response value directly as rows unless you configure a different row_strategy
- inputs declares values Coral collects and stores when the source is added
- auth declares how stored values are used to authenticate outgoing HTTP requests
- {{input.API_TOKEN}} is resolved from the secret store because API_TOKEN is declared as kind: secret
HTTP table functions
HTTP sources can declare source-scoped table functions for provider-native operations that return rows and need invocation arguments. A function adds invocation arguments, but it still owns the same execution shape as an HTTP table: request, response mapping, pagination, and result columns. It does not need a backing table.
Table functions are not only for search. Use the default kind: table for
parameterized non-retrieval operations, such as scoped child collections,
time-range log queries, metrics queries, or detail operations that do not map
cleanly to a stable table. Use kind: search only for provider-ranked
retrieval surfaces.
Use bind.arg when a SQL argument should populate a differently named request
argument. Function request values can then reference those bound arguments with
from: arg.
kind: search marks a provider-ranked retrieval surface. Search functions
return provider-ranked candidate rows, not exhaustive SQL-filtered tables.
Other table functions keep the default kind: table. Use named SQL arguments
when calling table functions, for example github.search_issues(q => 'flaky').
mode: contains remains available for ordinary table filters whose provider
API supports substring matching. It is not a retrieval marker and should not be
used as a replacement for kind: search.
Provider-ranked retrieval belongs in kind: search
functions. Table filters are for exact lookup, required scoping, or ordinary
provider-side filtering on list/detail tables.
search_limits values must be positive. default_top_k and max_top_k are
capped at 1000, max_calls_per_query is capped at 100, and
max_top_k * max_calls_per_query cannot exceed 10000 candidates.
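The limits above combine into a simple validation, sketched here in Python. The function is a hypothetical checker written from the stated caps, not Coral's validator, and the error messages are invented for illustration.

```python
def validate_search_limits(default_top_k, max_top_k, max_calls_per_query):
    """Return a list of rule violations for a search_limits block."""
    errors = []
    values = {
        "default_top_k": default_top_k,
        "max_top_k": max_top_k,
        "max_calls_per_query": max_calls_per_query,
    }
    for name, value in values.items():
        if value <= 0:
            errors.append(f"{name} must be positive")
    if default_top_k > 1000:
        errors.append("default_top_k is capped at 1000")
    if max_top_k > 1000:
        errors.append("max_top_k is capped at 1000")
    if max_calls_per_query > 100:
        errors.append("max_calls_per_query is capped at 100")
    if max_top_k * max_calls_per_query > 10000:
        errors.append("max_top_k * max_calls_per_query cannot exceed 10000")
    return errors


print(validate_search_limits(10, 100, 100))
print(validate_search_limits(10, 1000, 100))
```

The second call shows that two values can each sit within their own cap and still break the combined candidate limit.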
MCP sources
MCP sources expose Model Context Protocol tools as SQL tables and source-scoped table functions. The manifest remains the contract for SQL names, filters, arguments, response shape, and columns; Coral does not introspect MCP tools at query time. Stdio MCP servers run as child processes.
Streamable HTTP MCP servers declare request authentication under server.auth. OAuth credential metadata is
authored the same way as HTTP sources: under the top-level secret input’s
credential.methods. The server.auth block only decides how the stored secret
is attached to MCP HTTP requests.
Coral does not pool MCP sessions across tools/call requests, and Streamable HTTP sources run the initialize handshake on
every call. A single SQL query that scans multiple MCP tables incurs that
handshake once per table scan. Session pooling is a future optimization.
MCP error reasons
Query-time failures surface as one of the following structured error reasons. Every reason carries source and mcp_stage metadata. Tool-scoped reasons
(MCP_TOOL_CALL_FAILED, MCP_TOOL_RETURNED_ERROR, MCP_RESULT_DECODE_FAILED,
MCP_PAGINATION_FAILED) additionally carry relation and tool. Transport,
auth, and server-start reasons do not, since they fire before a specific tool
call is identified.
| Reason | Meaning |
|---|---|
| MCP_SERVER_START_FAILED | The stdio MCP server process could not be spawned. |
| MCP_INITIALIZE_FAILED | The MCP initialize handshake failed. Retryable. |
| MCP_AUTH_REQUIRED | A Streamable HTTP server returned 401 with WWW-Authenticate. Reinstall the source with a valid credential. |
| MCP_AUTH_FAILED | A Streamable HTTP server returned 403 with insufficient scope. Update the manifest scopes and reinstall. |
| MCP_HTTP_REQUEST_FAILED | The HTTP request to a Streamable HTTP MCP server failed before a response was received (connect, DNS, TLS). Retryable. |
| MCP_HTTP_STATUS_FAILED | A Streamable HTTP MCP server returned a non-success status that is not an authentication failure. Retryable. |
| MCP_HTTP_SSE_DECODE_FAILED | A Streamable HTTP MCP server returned an undecodable SSE stream or an unexpected content type. |
| MCP_SESSION_EXPIRED | A Streamable HTTP MCP session was rejected (HTTP 404) and the transport could not transparently reinitialize. Retryable. |
| MCP_TOOL_CALL_FAILED | The transport or protocol layer rejected the tools/call. Retryable. |
| MCP_TOOL_RETURNED_ERROR | The tool ran and reported a business-logic failure on the manifest’s declared response.error_path. |
| MCP_RESULT_DECODE_FAILED | The tool’s structured content did not match the declared response.rows_path or column types. |
| MCP_PAGINATION_FAILED | Cursor pagination exceeded max_pages without terminating. |
HTTP authentication
The top-level auth block declares how Coral authenticates outgoing requests. It does not collect or store credentials; reference values declared under Source inputs when auth needs API keys, bearer tokens, usernames, passwords, or signing material. Pick one of three type values:

| type | Purpose |
|---|---|
| HeaderAuth | Attach one or more declarative auth headers (Bearer tokens, API keys, pre-signed headers). |
| BasicAuth | HTTP Basic. Coral base64-encodes username:password into Authorization: Basic .... |
| CustomAuth | Use a named authenticator for auth schemes that need request signing, such as AWS SigV4. |
HeaderAuth
headers entries share the same value-source shape as request/table headers,
but auth is source-scoped: from: literal, from: template, from: input,
from: bearer, and from: one_of are meaningful here. Filter, argument, and
state values have no value at auth time. Use from: bearer when Coral stores a
raw token but the API expects an Authorization: Bearer <token> header.
Use from: one_of when a source supports multiple credential shapes. Coral
uses the first value that is present and non-empty.
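The from: bearer and from: one_of behavior described above can be modeled in a few lines of Python. The input keys API_KEY_HEADER and OAUTH_TOKEN are hypothetical, and the helpers only illustrate the resolution order, not Coral's resolver.

```python
def bearer(token):
    """Wrap a raw token as a Bearer value; missing tokens stay missing."""
    return f"Bearer {token}" if token else None


def resolve_one_of(candidates):
    """Return the first candidate that is present and non-empty."""
    for value in candidates:
        if value is not None and value != "":
            return value
    return None


# Hypothetical stored inputs: the pasted header is empty, the OAuth token is set.
inputs = {"API_KEY_HEADER": "", "OAUTH_TOKEN": "abc123"}
authorization = resolve_one_of([
    inputs.get("API_KEY_HEADER"),
    bearer(inputs.get("OAUTH_TOKEN")),
])
print(authorization)
```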
BasicAuth
username and password are templates; {{input.KEY}} tokens are honored. Coral encodes them as Authorization: Basic base64(user:pass).
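For reference, the Basic header encoding works as in this Python sketch. The credentials are placeholder values, and the helper is an illustration of standard HTTP Basic encoding rather than Coral's code.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header from rendered credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
print(basic_auth_header("user", "pass"))
```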
CustomAuth
The authenticator key names a runtime-provided authenticator. All other fields under auth are passed through as that authenticator’s config. Use this type when signing depends on the final request contents.
aws_sigv4
Signs outgoing requests with AWS Signature Version 4. Useful for querying AWS monitoring services via CloudWatch, CloudTrail, or any other SigV4-protected API.
The aws_sigv4 authenticator config accepts these fields (for example, service: logs targets CloudWatch Logs):

| Field | Required | Notes |
|---|---|---|
| service | yes | AWS service code (logs, monitoring, cloudtrail, execute-api, …) |
| region | yes | AWS region; templated from inputs |
| access_key_id | yes | AWS access key ID |
| secret_access_key | yes | AWS secret access key |
| session_token | no | Omit for long-term credentials; include for STS/assumed-role sessions |

For STS or assumed-role sessions, declare AWS_SESSION_TOKEN in the source’s inputs block and include it in the authenticator config as session_token.
service: s3 and service: s3-outposts switch to S3-specific signing (single percent-encoding, path normalization disabled, X-Amz-Content-Sha256 enabled). This is required to avoid SignatureDoesNotMatch on bucket operations and is handled automatically.
Non-auth request headers
Headers required on every request but not used for auth (Accept, API version, etc.) go in a top-level request_headers: block (sibling of auth:), not inside auth.headers.
request_headers resolves before auth on each request. If an auth header has the same name, it overwrites the value from request_headers.
Coral sets User-Agent: coral on every outgoing request automatically — do not declare it in manifests.
HTTP rate limiting
The optional rate_limit object gives Coral provider-specific hints for classifying rate-limit responses on HTTP sources.
Coral always treats 429 Too Many Requests as a rate-limit signal. Use extra_statuses when a provider also reports rate limits through another status code such as GitHub’s 403 Forbidden. Extra statuses are only treated as rate limits when the response also carries one of the configured rate-limit headers; otherwise Coral keeps treating them as ordinary provider errors.
| Field | Type | Default | Description |
|---|---|---|---|
| extra_statuses | list of integers | [] | Additional HTTP status codes Coral should consider as possible rate limits |
| retry_after_header | string | Retry-After | Header that carries a retry delay in seconds or HTTP-date form |
| remaining_header | string | — | Header whose value 0 means the current quota is exhausted |
| reset_header | string | — | Header with the quota reset time as Unix epoch seconds |
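The classification rule can be sketched in Python as follows. This is a hypothetical model: it assumes the mere presence of any configured rate-limit header is enough for an extra status to count, and the X-RateLimit-Remaining header name is used only as an example.

```python
def is_rate_limited(status, headers, extra_statuses=(),
                    retry_after_header="Retry-After",
                    remaining_header=None, reset_header=None):
    """Classify a response as a rate limit using the documented hints."""
    if status == 429:
        return True  # always a rate-limit signal
    if status not in extra_statuses:
        return False
    configured = [
        name.lower()
        for name in (retry_after_header, remaining_header, reset_header)
        if name
    ]
    present = {name.lower() for name in headers}  # header names are case-insensitive
    # Assumption: any configured header on the response marks it as a rate limit.
    return any(name in present for name in configured)


print(is_rate_limited(403, {"X-RateLimit-Remaining": "0"},
                      extra_statuses=[403],
                      remaining_header="X-RateLimit-Remaining"))
print(is_rate_limited(403, {}, extra_statuses=[403]))
```

A 403 without any of the configured headers stays an ordinary provider error, as the second call shows.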
HTTP value sources
Request query params, body fields, and headers support these from values:

| from value | Description |
|---|---|
| literal | Use the manifest-authored JSON value directly |
| template | Render a string template with input, filter, or state tokens |
| filter | Read a table filter captured from the query WHERE clause and serialize it as a string |
| filter_int | Read a table filter captured from the query WHERE clause and serialize it as a JSON integer |
| filter_bool | Read a table filter captured from the query WHERE clause and serialize it as a JSON boolean |
| filter_split | Split a table filter string and serialize one part as a string |
| filter_split_int | Split a table filter string and serialize one part as a JSON integer |
| arg | Read a source function argument captured from the function call and serialize it as a string |
| arg_int | Read a source function argument captured from the function call and serialize it as a JSON integer |
| arg_bool | Read a source function argument captured from the function call and serialize it as a JSON boolean |
| arg_split | Split a source function argument string and serialize one part as a string |
| arg_split_int | Split a source function argument string and serialize one part as a JSON integer |
| input | Read a manifest-declared source input (variable/secret) by key |
| bearer | Read a manifest-declared secret input by key and serialize it as Bearer <value> |
| one_of | Resolve nested value sources in order and use the first present, non-empty value |
| state | Read pagination/runtime state |
| now_epoch_minus_seconds | Emit the current Unix epoch seconds minus the configured offset |
filter* value sources differ from arg* value sources by where the value
comes from. Table requests use filter value sources because their request values
come from SQL predicates such as WHERE id = 123. Source function requests use
arg value sources because their request values come from named function
arguments such as github.search_issues(q => 'flaky').
The suffix controls the value type sent to the provider. Use the bare form for a
string request value, _int for a JSON integer, and _bool for a JSON boolean.
Boolean filters used with filter_bool can be written as normal SQL boolean
predicates, for example WHERE include_archived IS FALSE or
WHERE include_archived = false.
Use filter_split and filter_split_int when a provider requires structured
request fields but users naturally have one compound identifier. For example,
SOURCE-496 can become a string team key and integer issue number.
Use arg_split and arg_split_int the same way for source-scoped table
functions.
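The split behavior for a compound identifier such as SOURCE-496 can be illustrated in Python. The split_part helper and its delimiter and index parameters are hypothetical; they stand in for whatever options the manifest uses to pick a part.

```python
def split_part(value, delimiter, index, as_int=False):
    """Split a compound identifier and return one part, optionally as an integer."""
    part = value.split(delimiter)[index]
    return int(part) if as_int else part


# String part, as a filter_split or arg_split value would produce.
team_key = split_part("SOURCE-496", "-", 0)
# Integer part, as a filter_split_int or arg_split_int value would produce.
issue_number = split_part("SOURCE-496", "-", 1, as_int=True)
print(team_key, issue_number)
```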
HTTP response fields
The response object controls how Coral extracts rows from an HTTP response body.

| Field | Type | Default | Description |
|---|---|---|---|
| rows_path | list of strings | [] | JSON path segments to the array of rows in the response |
| ok_path | list of strings | [] | JSON path to a field indicating success |
| error_path | list of strings | [] | JSON path to a field containing an error message |
| allow_404_empty | boolean | false | Treat HTTP 404 as an empty result set instead of an error |
| row_strategy | string | direct | How to convert the selected value into rows. direct uses the array or object as-is. dict_entries maps a JSON object’s key-value pairs into rows. series_point_list flattens specialized timeseries arrays (e.g., Datadog metrics). |

For example, rows_path: [data, items] makes Coral read rows from response_body.data.items instead of treating the entire response as the row array.
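A rough Python model of row selection may help when reading the table above. It is not Coral's implementation: the key and value field names produced for dict_entries are assumptions, as is wrapping a single object into one row for direct.

```python
def select_rows(body, rows_path=(), row_strategy="direct"):
    """Pick the value at rows_path and convert it into a list of rows."""
    value = body
    for segment in rows_path:
        value = value[segment]
    if row_strategy == "dict_entries":
        # Assumption: each key-value pair becomes a row with key and value fields.
        return [{"key": key, "value": item} for key, item in value.items()]
    # direct: an array is used as-is; a single object becomes one row (assumption).
    return value if isinstance(value, list) else [value]


# Hypothetical response body.
body = {"data": {"items": [{"id": 1}, {"id": 2}]}, "limits": {"read": 10}}
print(select_rows(body, ["data", "items"]))
print(select_rows(body, ["limits"], row_strategy="dict_entries"))
```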
HTTP pagination
The pagination object controls how Coral fetches additional pages of results.

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | none | Pagination strategy: none, auto, cursor_query, cursor_body, page, offset, or link_header |
| cursor_param | string | — | Query parameter name for the cursor (cursor_query mode) |
| response_cursor_path | list of strings | [] | JSON path in the response to the next cursor value |
| cursor_body_path | list of strings | [] | JSON path in the request body for the cursor (cursor_body mode) |
| page_param | string | — | Query parameter name for the page number (page mode) |
| page_start | integer | 0 | Starting page number |
| offset_param | string | — | Query parameter name for the offset (offset mode) |
| offset_start | integer | 0 | Starting offset value |
| page_size.default | integer | — | Default page size |
| page_size.max | integer | — | Maximum page size |
| page_size.query_param | string | — | Query parameter name for the page size |
| max_pages | integer | — | Maximum number of pages to fetch |
| link_header_require_results | boolean | false | For link_header mode, require non-empty results to continue |
Source inputs
Source specs can declare the variables and secrets this source needs under a top-level inputs map. Inputs are where credentials and other configuration values are set up. They are not where those values are sent. Runtime fields, such as HTTP auth, request_headers, table request headers, query params, body fields, and MCP server.auth, decide where stored inputs are used.
Input map basics
At install time, coral source add collects each input from an environment variable matching the key, or prompts interactively when you pass --interactive.
- Variables: non-secret configuration like base URLs or organization IDs
- Secrets: API keys or bearer tokens
Input rules
- kind: variable values are stored with source variables
- kind: secret values are stored with source secrets
- default is allowed only for variables
- hint is optional and shown alongside the input during coral source add --interactive
- references elsewhere in the manifest use {{input.KEY}} templates, from: input, or wrappers such as from: bearer
- credential-like inputs such as API keys, tokens, passwords, secrets, private keys, and bearer or authorization values must be kind: secret; Coral rejects those names as variables because variable values are visible through source APIs and coral.inputs
- input keys must not start with Coral’s reserved internal prefix __coral
- declaring an input does not add it to any request by itself; reference it from auth, headers, query params, body fields, or other runtime request configuration where it should be used
Credential methods
Secret inputs can also declare credential.methods metadata for install-time
credential retrieval choices. credential.methods only controls how the secret
is collected; runtime request authentication is still defined by auth or
request headers that reference the input.
Supported credential method types:
- source_config: read the secret from an environment variable by default, or from an interactive prompt when coral source add --interactive is used. When credential is omitted, Coral uses this behavior.
- oauth: run an OAuth device-code or authorization-code flow and store the returned access token in the same source secret.

Use type: oauth when the provider should issue the source secret through a
device-code or browser-based OAuth setup instead of asking the user to paste a
token. If you want to support both flows, list the OAuth method first and add a
source_config fallback for users who already have a token.
Each method accepts three optional display fields used during interactive
install (coral source add --interactive) and in the generated source docs:
- label (string): the choice shown in the method picker.
- description (string): a short one-line blurb for the method.
- hint (string): markdown guidance shown next to the fields that method collects, for example how to register the OAuth app, what the callback URL is, or where to find an existing token. Keep each hint focused on the inputs that method needs rather than restating the other methods. Prefer a per-method hint over a single long input-level hint when the input offers several methods, since the fields below the hint change with the selected method.
OAuth flow and client settings
OAuth supports device-code flow and authorization-code flow. For device-code flow, declare flow.type: device_code,
endpoints.device_authorization_url, endpoints.token_url, and a public
client ID through client.id.default, client.id.input, or both. Device-code
OAuth uses a public client ID only: client.secret must not be declared, and no
callback URL, loopback listener, client secret, or PKCE setting is required.
For authorization-code flow, declare flow.type: authorization_code, an
explicit pkce of required or disabled, redirect_uri,
endpoints.authorization_url, and endpoints.token_url. Public clients can
declare client.id.default, client.id.input, or both; an interactive input
value overrides default. Authorization-code OAuth methods whose token
endpoint requires client-secret authentication must prompt for both OAuth
client values: set client.id.input, client.secret.input, and a
client.secret.transport of basic_auth or request_body. Client secrets are
sent only to the token endpoint and are never included in the authorization URL.
They may be stored as internal credential metadata so Coral can refresh the
access token without prompting the user again.
OAuth endpoint URLs may include {{input.KEY}} templates for declared
kind: variable inputs, which Coral renders during OAuth credential setup. Use
this for non-secret endpoint components such as a Microsoft tenant ID or site
domain. OAuth endpoint templates do not support secret inputs, filters,
function arguments, state, or inline defaults.
Scopes are declared under scopes.scope; delimiter: space represents
scope=a b, and delimiter: comma represents scope=a,b. Coral stores OAuth
metadata alongside the source secret and preserves rotated refresh tokens when
the provider returns a new one during refresh.
OAuth redirect URIs
Authorization-code flow currently supports loopback HTTP callback URIs: http://127.0.0.1[:<port>]/<path> or http://localhost[:<port>]/<path>.
redirect_uri_port_mode controls how Coral binds the loopback port. When this
field is omitted, Coral treats redirect_uri port 0 as random; otherwise it
defaults to fixed.
- fixed binds the exact non-zero port authored in redirect_uri; if that port is busy, install fails.
- random binds a free port at install time. In this mode redirect_uri must omit the port or use port 0; Coral sends the effective URI with the assigned port in both the authorization request and the token exchange.
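The port-mode defaulting rule can be modeled with the Python standard library. The effective_port_mode helper is hypothetical; it applies the stated default and the two checks the text spells out, and deliberately leaves the omitted-port case under fixed unvalidated because the text does not define it.

```python
from urllib.parse import urlparse


def effective_port_mode(redirect_uri, port_mode=None):
    """Resolve redirect_uri_port_mode and check it against the authored port."""
    port = urlparse(redirect_uri).port  # None when the URI has no explicit port
    if port_mode is None:
        # Omitted field: port 0 means random, anything else defaults to fixed.
        port_mode = "random" if port == 0 else "fixed"
    if port_mode == "random" and port not in (None, 0):
        raise ValueError("random mode requires an omitted port or port 0")
    if port_mode == "fixed" and port == 0:
        raise ValueError("fixed mode requires a non-zero port")
    return port_mode


print(effective_port_mode("http://127.0.0.1:0/callback"))
print(effective_port_mode("http://localhost:8765/callback"))
```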
OAuth stores the token response access_token as the source secret value. If a
runtime auth header needs a bearer value, use from: bearer with that input
key. If the source also accepts a full pasted API-key header, wrap the API-key
input and OAuth bearer input in from: one_of.
Inspecting configured inputs
At runtime, installed source inputs are also surfaced through coral.inputs.
This is useful when agents or scripts need to inspect non-secret source config
such as a Datadog site or Jira base URL and compose absolute URLs or
account-scoped identifiers from it. Secret values are never exposed there:
secret rows always return value IS NULL, while is_set shows whether the
secret has been configured.
coral.inputs includes these columns:
| Column | Description |
|---|---|
| schema_name | Source schema that owns the input |
| key | Input key from the manifest |
| kind | variable or secret |
| value | Variable value when available; NULL for secrets |
| default_value | Manifest-authored default for variables |
| hint | Prompt hint from the manifest, if any |
| required | Whether the input must be configured |
| is_set | Whether Coral has a saved value for that input |