Skip to main content
Coral ships with popular data sources, but you are not limited to them. Before writing a new spec, check whether a community source already exists. If the data source you need is not available out of the box or in the community catalog, you can write a source spec in YAML and import it. A source spec tells Coral how to connect to an API or read a local dataset, what tables to expose, and what columns each table has. Once added, the source works exactly like a bundled one: same SQL, same coral.tables, coral.columns and coral.inputs, same MCP surface.
Install the Coral source spec authoring skill with npx skills add withcoral/skills to help your coding agent write source specs.

Will this work for my source?

How much effort a custom source takes depends on what you are connecting to. File-based sources (Parquet, JSONL, JSON, CSV) are the simplest case — define the file location and columns, and the source is ready to query. API-backed sources vary in difficulty. The source spec model is designed for APIs that have:
  • HTTP credentials Coral can store locally — an API key, personal access token, service account credential, or OAuth device-code or authorization-code flow that returns an access token for Coral to use on API requests
  • Structured, accessible documentation — an OpenAPI spec, GraphQL schema, or clearly organized REST docs that describe endpoints, response shapes, authentication, scopes, and OAuth client settings
  • Real data to validate against — actual records in the account so you can confirm the source is returning the expected results during development
OAuth-backed services can work well when the provider supports device-code flow, or when it supports an authorization-code loopback redirect URI such as http://127.0.0.1:<port>/oauth/callback or http://localhost:<port>/oauth/callback. Coral can run the OAuth flow during coral source add --interactive, store the returned access token as the source secret, and use that secret through the normal auth or request-header fields. When the provider returns a refresh token, Coral stores the refresh metadata and refreshes short-lived access tokens at runtime. Without a refresh token, the access token remains usable until it expires, then users must reconnect the source. If you are unsure whether your API is a good candidate, ask us in Discord before getting started.

Agent-driven authoring

Source authoring works well as an agent-driven workflow. Point your coding agent at the Coral CLI and the source spec reference, give it the API docs or describe the endpoints you want, and let it iterate:
  1. The agent writes or updates the source spec YAML
  2. Lints the file with coral source lint ./my-source.yaml to catch errors before installing
  3. Adds it with coral source add --file ./my-source.yaml
  4. Validates with coral source test my_source
  5. Inspects coral.tables, coral.table_functions, coral.columns, coral.filters and coral.inputs to check the shape
  6. Refines and repeats until the tables look right

Search and retrieval endpoints

Model provider-native search as a source-scoped table function, not as a normal table filter. If an API endpoint accepts a query string and returns provider-ranked candidates, declare it under functions with kind: search, search_limits, explicit arguments, and result columns. Search functions are called from SQL with named arguments:
SELECT id, title, url
FROM github.search_issues(q => 'repo:withcoral/coral source functions')
LIMIT 10
Search functions are retrieval surfaces, not exhaustive tables. They preserve provider ranking and may return only the best candidates for a query. Use result columns such as id, url, title, score, rank, or provider timestamps so users and agents can decide which candidate to inspect next. If the search row is intentionally thin, expose enough stable identifiers for follow-up queries against ordinary detail tables. Table functions are more general than search. Use the default table function kind for parameterized non-retrieval operations, such as scoped child collections, time-range logs, metrics queries, or detail operations that do not map cleanly to a stable table. Use ordinary table filters for exact lookup, required scoping, or provider-side filtering on list endpoints. Use mode: contains only when the provider has normal substring matching for a table filter. Provider-ranked retrieval belongs in a kind: search function with search_limits; table filters are not retrieval surfaces.

Column naming for nested fields

When a source exposes nested API fields as separate SQL columns, use double underscores (__) to flatten the path into a single column name.
  • assignee__name means “the name field under assignee
  • repository__owner__login means repository.owner.login
This matches the naming convention used across bundled sources and keeps the mapping easy to read in both SQL and coral.columns. Treat the full flattened name as the column name in queries; it is not dot notation.
SELECT title, assignee__name
FROM github.issues
WHERE owner = 'withcoral' AND repo = 'coral'
LIMIT 5
If you are consuming a source, inspect coral.columns to see the exact names Coral exposes. If you are authoring a source, prefer parent__child style names for flattened fields so users can recognize related data consistently.

Example: local JSONL source

The fastest way to see a custom source working is with a local file.

1. Create sample data

mkdir -p demo-data
cat > demo-data/messages.jsonl <<'EOF'
{"type":"user","session_id":"s1","text":"hello"}
{"type":"assistant","session_id":"s1","text":"world"}
EOF

2. Write the source spec

Create local-messages.yaml:
name: local_messages
version: 0.1.0
dsl_version: 3
backend: file
test_queries:
  - SELECT * FROM local_messages.messages LIMIT 1
tables:
  - name: messages
    description: Demo messages
    format: jsonl
    source:
      location: file:///absolute/path/to/demo-data/
      glob: "**/*.jsonl"
    columns:
      - name: type
        type: Utf8
      - name: session_id
        type: Utf8
      - name: text
        type: Utf8
Replace /absolute/path/to/demo-data/ with the real absolute path on your machine. The optional top-level test_queries field lets you declare read-only SQL checks that Coral runs during coral source test. Prefer cheap SELECT queries that confirm the source connects correctly and the table mapping looks right.

3. Add and validate

coral source add --file ./local-messages.yaml
coral source test local_messages
coral source add --file validates immediately after install. Post-install validation issues, including failing test_queries, are printed as warnings so the source still lands in your workspace. Use coral source test <NAME> when you want strict pass/fail verification.

4. Query it

coral sql "
  SELECT type, session_id, text
  FROM local_messages.messages
  ORDER BY session_id, type
"

How to validate

When you add test_queries to a source spec, treat them as a lightweight validation contract for that source.
  1. Run coral source lint ./my-source.yaml first. It catches schema and semantic issues without touching your workspace or asking for secrets.
  2. Add or update the source with coral source add --file ./my-source.yaml.
  3. Confirm the install-time validation output:
    • the source exposes the expected tables
    • each declared test_queries entry runs successfully
    • successful queries show a pass status and row count
    • failing queries surface the SQL text plus an error message
  4. Run real coral sql queries against the new tables to confirm the shape is useful beyond the validation checks.
  5. If you want a strict rerun later, or you are debugging a failure, run coral source test <source_name>.
  6. If a query fails, fix either the source spec or the upstream credentials/configuration, then rerun coral source test <source_name>.
If you are validating a credentialed source end-to-end, it is useful to try both:
  • a correctly configured source, which should pass table validation and all declared query tests
  • an intentionally misconfigured source, which should still install but fail coral source test with a non-zero exit code
This gives you confidence that the source spec works in the happy path and that test_queries catch broken credentials or runtime regressions in a predictable way. Use read-only SQL in test_queries. Prefer bounded checks such as SELECT * FROM schema.table LIMIT 1 over expensive scans like SELECT COUNT(*), especially for HTTP-backed sources or large datasets. Coral rejects DDL, DML, and other non-read-only statements as failed validation queries instead of applying them.

Example: HTTP API source

To connect an HTTP API, the source spec declares the base URL, auth, endpoints, and response shape.

Basic token auth

Authentication has two separate parts. The inputs block defines the values Coral collects and stores when you add the source. The auth block defines how those stored values are used on outgoing HTTP requests. The example below collects API_TOKEN as a secret, then uses it to build the Authorization header with HeaderAuth. For the full auth field reference and additional examples, see HTTP-backed tables.
name: demo_api
version: 0.1.0
dsl_version: 3
backend: http
inputs:
  API_BASE:
    kind: variable
    default: https://api.example.com
    hint: Base URL for the API
  API_TOKEN:
    kind: secret
    hint: Bearer token for the API
base_url: "{{input.API_BASE}}"
auth:
  type: HeaderAuth
  headers:
    - name: Authorization
      from: bearer
      key: API_TOKEN
tables:
  - name: messages
    description: Messages from the API
    request:
      method: GET
      path: /messages
    response:
      rows_path:
        - data
    pagination:
      mode: link_header
      page_size:
        default: 50
        max: 100
        query_param: per_page
    columns:
      - name: id
        type: Utf8
      - name: text
        type: Utf8
      - name: created_at
        type: Utf8

OAuth credential setup

When a source should offer OAuth setup, keep the runtime auth shape the same and add credential.methods to the secret input. The OAuth method controls how Coral obtains the secret during setup; the auth block still decides how that stored secret is sent to the API. Give each method its own hint rather than one input-level hint: the field shown below the hint changes with the selected method, so scope each method’s guidance to the inputs that method collects.
inputs:
  DEMO_API_TOKEN:
    kind: secret
    credential:
      methods:
        - type: oauth
          label: Connect with Demo
          description: Open a browser and authorize Coral to read Demo API data.
          hint: |
            Signs you in through Demo in your browser and requests the
            `read:data` scope. To use your own app, set DEMO_OAUTH_CLIENT_ID
            to its Client ID.
          oauth:
            flow:
              type: authorization_code
              pkce: required
            redirect_uri: http://127.0.0.1:0/oauth/callback
            redirect_uri_port_mode: random
            endpoints:
              authorization_url: https://demo.example.com/oauth/authorize
              token_url: https://demo.example.com/oauth/token
            client:
              id:
                input: DEMO_OAUTH_CLIENT_ID
            scopes:
              scope:
                delimiter: space
                values:
                  - read:data
        - type: source_config
          label: Paste token
          hint: Paste a Demo API token with read access to the data you query.
auth:
  type: HeaderAuth
  headers:
    - name: Authorization
      from: bearer
      key: DEMO_API_TOKEN
When an API supports either a complete pasted API-key header or an OAuth access token, declare both credential inputs as optional secrets, then use from: one_of and put the pasted header first, followed by a from: bearer fallback for the raw OAuth token.
inputs:
  DEMO_API_KEY:
    kind: secret
    required: false
  DEMO_OAUTH_ACCESS_TOKEN:
    kind: secret
    required: false

OAuth client and redirect options

Choose OAuth client settings from the provider’s OAuth app and token endpoint requirements:
  • Use pkce: required for public clients when the provider supports it. Use pkce: disabled only for providers that do not support PKCE.
  • When the token endpoint requires client authentication with a client secret, prompt for both OAuth client values: set client.id.input, client.secret.input, and client.secret.transport (basic_auth or request_body) to match the token endpoint.
  • When the provider requires pre-registered redirect URIs with exact ports, use redirect_uri_port_mode: fixed with a non-zero port and tell users to register that exact URI in the OAuth app.
  • When OAuth endpoint URLs need non-secret install-time configuration, such as a Microsoft tenant ID, declare that value as a kind: variable input and reference it with {{input.KEY}} in endpoints.authorization_url, endpoints.token_url, or endpoints.device_authorization_url. OAuth endpoint templates do not support secrets, filters, function arguments, state, or inline defaults.

Install-time input collection

When you run coral source add --file, Coral reads each declared input from an environment variable of the same name, or prompts for it when you pass --interactive. kind: variable values go to source variables; kind: secret values go to the secret store. API keys, bearer tokens, OAuth access tokens, passwords, private keys, and authorization values must be secrets, even when the credential is read-only. Declaring an input only configures storage; it does not send the value to the API. Runtime request fields such as auth.headers, BasicAuth, CustomAuth, and request headers decide where stored inputs are used. For secret inputs, credential.methods controls how the secret is collected: type: source_config means environment-variable lookup by default or an interactive prompt with --interactive; OAuth device-code and authorization-code methods describe alternative install-time retrieval choices. For OAuth credential methods, Coral stores provider refresh metadata when it is returned and uses it to keep short-lived access tokens usable without asking users to reconnect. Coral can only refresh when the token response includes a refresh token, so choose OAuth scopes accordingly and document any provider or client settings that are required to issue refresh tokens. For the full set of HTTP response, pagination, auth, and column fields, see HTTP-backed tables.

Tips

  • Lint before installing. coral source lint ./my-source.yaml is a fast local precheck because it does not install the source or require credentials.
  • Start small. Begin with one table and a few columns. Validate with coral source test, check coral.columns and coral.inputs, then expand.
  • Use coral.tables, coral.table_functions, coral.columns, coral.filters, and coral.inputs. They show you exactly what Coral sees after adding a source — including required filters, search functions, and column metadata.
  • Look at bundled sources for patterns. The bundled source specs in the repo under sources/core/ are working examples of HTTP pagination, response parsing, auth, and inputs.