VxData API surface

Conventions:

Flat schemas everywhere: vxdata.schemas.{create,update,response} (AnyCreate/AnyUpdate/AnyResponse, discriminated on payload_type).
Resource mutations are batch-native: each verb takes one-or-many; a single item is just a 1-element collection. No per-id REST paths — one POST /resources/{verb}.
Resource writes never return objects (only counts). Reads return resources.
Soft-delete only (tombstone via is_deleted). Reads hide tombstones; restore = POST /resources/restore, or update with is_deleted=false, on an id you already know.
as_of (bitemporal read) is accepted by every read; the SDK injects its client-level default unless overridden per request.
Errors via one app-level handler: ValueError/LookupError -> 400, missing resource -> 404.
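
A minimal sketch of the batch-native and as_of conventions above, from the client side. The helper names (as_batch, build_read_body) and the ISO 8601 serialization of as_of are assumptions for illustration, not the documented SDK surface.

```python
from datetime import datetime, timezone


def as_batch(item_or_items):
    """Wrap a single item in a list; pass lists/tuples through as lists."""
    if isinstance(item_or_items, (list, tuple)):
        return list(item_or_items)
    return [item_or_items]


def build_read_body(identifiers, as_of=None, default_as_of=None):
    """Build a /resources/read body (hypothetical helper).

    A per-request as_of wins over the client-level default; when neither
    is set, the field is omitted entirely.
    """
    body = {"identifiers": as_batch(identifiers)}
    effective = as_of if as_of is not None else default_as_of
    if effective is not None:
        # Assumption: as_of travels as an ISO 8601 string.
        body["as_of"] = effective.isoformat()
    return body
```

With this shape, build_read_body("p1") and build_read_body(["p1"]) produce the same body, which is the point of treating a single item as a 1-element collection.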

Resources (CRUD)

POST /resources/create
  body: { resources: AnyCreate[] }
  -> { created: int }                 # all-or-nothing; raises if any id already active

POST /resources/read
  body: { identifiers: str[], as_of?: datetime, exclude_children?: bool=false }
  -> AnyResponse[]                     # input order preserved; missing/tombstoned skipped
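
Because missing and tombstoned identifiers are skipped, the response can be shorter than the request, so positions do not line up. A sketch of one way to re-key the result, assuming each AnyResponse carries an identifier field (the helper name is hypothetical):

```python
def index_by_identifier(requested, responses):
    """Map each requested identifier to its resource, or None if skipped."""
    # Assumption: every response dict exposes its id under "identifier".
    found = {resource["identifier"]: resource for resource in responses}
    return {identifier: found.get(identifier) for identifier in requested}
```

Identifiers that map to None were either never created or are currently tombstoned; the read alone does not distinguish the two.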

POST /resources/update
  body: { updates: { <identifier>: AnyUpdate } }
  -> { updated: int }                  # all-or-nothing; is_deleted toggles delete(true)/restore(false)

POST /resources/delete
  body: { identifiers: str[], cascade?: bool=true }
  -> { deleted: int }                  # soft tombstone (+ subtree when cascade)

POST /resources/restore
  body: { identifiers: str[] }
  -> { restored: int }                 # clears tombstone (inverse of delete)

AnyCreate carries resource metadata (identifier, parent_identifier, license, access_level, derived_from) + flat payload fields. AnyUpdate is the same minus identifier, all optional, plus is_deleted; only provided fields change (derived_from replaces, not appends).
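
The partial-update semantics above can be modelled in memory as follows. This is a sketch, not the server implementation: it treats "provided" as "key present in the update dict", which is an assumption (the source does not say how an explicit null is handled), and apply_update is a hypothetical name.

```python
from copy import deepcopy


def apply_update(resource, update):
    """Return a copy of resource with only the provided fields changed."""
    if "identifier" in update:
        # AnyUpdate is AnyCreate minus identifier; the id is the map key.
        raise ValueError("AnyUpdate does not carry identifier")
    merged = deepcopy(resource)
    # Plain overwrite: list fields such as derived_from are replaced
    # wholesale, never appended to.
    merged.update(deepcopy(update))
    return merged
```

Under this model, sending {"derived_from": ["b"]} against a resource with derived_from ["a"] leaves ["b"], and sending {"is_deleted": true} tombstones the resource without touching any other field.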

Query

POST /query
  body: QueryRequest { payload_type, filters?, parent_identifier?, include_indirect_parents?,
                       identifiers?, select_columns?, sort_by="created_at", sort_order="desc",
                       cursor?, limit?, as_of? }
  -> { results: AnyResponse[], next_cursor: str|null, limit: int|null }

POST /query/count
  body: QueryRequest                   # pagination fields ignored
  -> { count: int }

POST /query/parquet
  body: QueryRequest
  -> application/octet-stream          # single parquet file
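
A sketch of walking all pages of POST /query. It assumes next_cursor is passed back as cursor on the next request and that a null next_cursor marks the last page; the transport is injected as a post(path, body) callable (hypothetical) so the loop stays independent of any HTTP library.

```python
def iter_query(post, request):
    """Yield every result of a QueryRequest, following next_cursor."""
    body = dict(request)  # do not mutate the caller's request
    while True:
        page = post("/query", body)
        yield from page["results"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break
        # Assumption: the same request is resent with only cursor changed.
        body = {**body, "cursor": cursor}
```

For result sets too large to iterate this way, /query/parquet takes the same QueryRequest and returns a single file.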

Lineage

GET /parents/{identifier}?as_of=
  -> str[]                             # ancestor identifier chain, nearest-first

Patients / Studies (server-side identifier allocation)

POST /patients/register   body: { datasource_id, external_uid }     -> { identifier }   # raises if exists
POST /patients/resolve    body: { datasource_id, external_uid }     -> { identifier }   # raises if missing
POST /studies/register    body: { patient_identifier, external_uid } -> { identifier }
POST /studies/resolve     body: { patient_identifier, external_uid } -> { identifier }
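
Since register raises on an existing patient and resolve raises on a missing one, a caller that wants get-or-create has to combine them. A sketch, assuming a hypothetical ApiError that exposes the HTTP status and an injected post(path, body) callable; the source does not say whether a missing patient surfaces as 400 or 404, so both are accepted here.

```python
class ApiError(Exception):
    """Hypothetical client-side error carrying the HTTP status code."""

    def __init__(self, status, message=""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def resolve_or_register_patient(post, datasource_id, external_uid):
    """Return the patient identifier, registering it on first sight."""
    body = {"datasource_id": datasource_id, "external_uid": external_uid}
    try:
        return post("/patients/resolve", body)["identifier"]
    except ApiError as err:
        # Anything other than "not found"-style errors is re-raised.
        if err.status not in (400, 404):
            raise
    return post("/patients/register", body)["identifier"]
```

Two concurrent callers can both miss on resolve and then race on register; the loser would see register raise and could retry resolve.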

Artefacts

Artefacts are resources: create/read/update/delete them through /resources/* like any other payload_type.

Stats

GET /datasources/stats                          -> per-datasource counts
GET /datasources/detailed-stats?datasource_id=  -> detailed breakdown (all datasources if omitted)

Schemas

GET /schemas
  -> { <PayloadType>: <json-schema> }  # all payload definitions (subsumes the old per-name + payload-types endpoints)

S3 / object storage

GET  /s3/config
  -> { s3_endpoint, s3_bucket }

POST /s3/presign
  body: { direction: "upload"|"download", path?, filename?, group? }
  -> { url, path? }                    # upload needs filename+group; download needs path
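
The per-direction requirements above (upload needs filename+group; download needs path) can be checked client-side before the request is sent. A sketch with a hypothetical helper name; whether the server tolerates extra fields is not stated, so the body carries only the fields relevant to the direction.

```python
def build_presign_body(direction, path=None, filename=None, group=None):
    """Build a /s3/presign body, validating the per-direction fields."""
    if direction == "upload":
        if not filename or not group:
            raise ValueError("upload requires filename and group")
        return {"direction": "upload", "filename": filename, "group": group}
    if direction == "download":
        if not path:
            raise ValueError("download requires path")
        return {"direction": "download", "path": path}
    raise ValueError(f"unknown direction: {direction!r}")
```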

POST /s3/presign/batch
  body: { direction, items: [...] }    # batch of the above
  -> { urls: [...] }

POST /s3/presign/upload-dir
  body: { group, relpaths: str[] }     # one shared {group}/{subdir} prefix, a presigned PUT per relpath
  -> { root, items: [...] }            # root s3:// prefix + { url, path } per input, in order

GET  /s3/list?prefix=
  -> { keys: str[] }                   # object keys under an s3://bucket/key prefix

GET  /s3/parquet?path=&limit=100&offset=0
  -> { columns, schema, total_rows, returned_rows, offset, data }   # parquet preview reader
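
A sketch of paging through the preview reader with limit and offset. It assumes data is a list of rows and that total_rows is the row count of the whole file; get(path, params) is an injected, hypothetical transport callable.

```python
def iter_preview_rows(get, path, page_size=100):
    """Yield rows from /s3/parquet, advancing offset by returned_rows."""
    offset = 0
    while True:
        page = get("/s3/parquet", {"path": path, "limit": page_size, "offset": offset})
        # Assumption: "data" is a list with one entry per returned row.
        yield from page["data"]
        offset += page["returned_rows"]
        # Stop on an empty page or once every row has been seen.
        if page["returned_rows"] == 0 or offset >= page["total_rows"]:
            break
```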
