vxData Refactor June 2026 - Migration Guide

This document outlines the changes deployed as part of the June 2026 refactor of the vxData API and SDK, and walks through the steps needed to migrate existing code so that it stays compatible.

Almost unchanged: Querying

With the refactor, querying remains almost the same:

from vxdata.sdk import F
# instead of client.query("Volumes") we now write client.volumes.query() for better typing!
df_volumes = client.volumes.query().filter(F.b_value > 0).collect()

# to obtain a list of *Response models instead of a dataframe
volumes = client.volumes.query().filter(F.b_value > 0).collect_as_pydantic()

New: better reading and writing patterns

We have refactored CRUD operations and added better typing:

# client.<namespace>.retrieve raises an error if any of the passed resources is not found
patient = client.patients.retrieve("patient/BB12345")  # -> PatientResponse
patients = client.patients.retrieve(["patient/BB12345", "patient/BB12346"])  # -> list[PatientResponse]

# use client.<namespace>.update to selectively update one or more fields
client.patients.update("patient/BB12345", PatientUpdate(year_of_birth=1940))
client.patients.update({"patient/BB12345": ..., ...})

client.volumes.create(VolumeCreateRequest(...))  # see below
...
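
The all-or-nothing behaviour of retrieve described above can be sketched with an in-memory stand-in. This is only an illustration: the `retrieve` function, the `store` dict and the `ResourceNotFound` error type below are hypothetical and are not the SDK's actual implementation or exception class.

```python
from typing import Union


class ResourceNotFound(LookupError):
    """Hypothetical error type; the SDK's real exception may differ."""


def retrieve(store: dict[str, dict], identifiers: Union[str, list[str]]):
    """All-or-nothing lookup: one identifier in, one record out; a list in, a list out."""
    single = isinstance(identifiers, str)
    wanted = [identifiers] if single else list(identifiers)
    missing = [i for i in wanted if i not in store]
    if missing:
        # Fail before returning anything, even if some identifiers exist.
        raise ResourceNotFound(f"not found: {missing}")
    found = [store[i] for i in wanted]
    return found[0] if single else found


store = {"patient/BB12345": {"year_of_birth": 1940}}  # in-memory stand-in for the API
one = retrieve(store, "patient/BB12345")
try:
    retrieve(store, ["patient/BB12345", "patient/BB12346"])
    partial_failed = False
except ResourceNotFound:
    partial_failed = True
```

In this sketch a list that contains one unknown identifier fails as a whole, so callers never have to check for partially filled results.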

New: flat payload structures

Previously, each ResourceCreateRequest nested a Payload, which was unnecessarily complex both to reason about and to type.

With vxData 1.0, we flatten schemas and provide dedicated *Create, *Response, *Update schemas for each payload type.

# old ------------------------------------------------
client.add_resource(ResourceCreateRequest(
  identifier=...,
  parent_identifier=...,
  license=...,
  payload=ImagingStudyPayload(
    type="MRI",
    ...
  )
))

# new ------------------------------------------------
from vxdata.schemas.create import ImagingStudy
client.imaging_studies.create(ImagingStudy(
    identifier=...,
    parent_identifier=...,
    license=...,
    type="MRI",
    ...
))
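
For call sites that still assemble the old nested structure as plain dicts, the flattening can be sketched as follows. `flatten_request` is a hypothetical helper, not part of the SDK; it assumes that the payload fields simply move to the top level, and all values are placeholders.

```python
from typing import Any


def flatten_request(old: dict[str, Any]) -> dict[str, Any]:
    """Lift the fields of the nested "payload" dict to the top level.

    Hypothetical helper for illustration; not part of the vxData SDK.
    """
    flat = {key: value for key, value in old.items() if key != "payload"}
    payload = old.get("payload") or {}
    clashes = flat.keys() & payload.keys()
    if clashes:
        # Refuse to silently overwrite a top-level field with a payload field.
        raise ValueError(f"payload fields clash with top-level fields: {sorted(clashes)}")
    flat.update(payload)
    return flat


old_request = {
    "identifier": "study/example",           # placeholder values
    "parent_identifier": "patient/example",
    "license": "example-license",
    "payload": {"type": "MRI"},
}
flat_request = flatten_request(old_request)
```

The resulting dict has the same keys as the flat schema shown above, so it could be unpacked into the new create schema if the field names match.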

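The create, update and response modules also export each schema under its suffix-less name. These short names are optional; because all three modules export the same names, importing from more than one of them in the same file causes name conflicts: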

from vxdata.schemas.create import Volume      # Volume   is VolumeCreate
from vxdata.schemas.response import Volume    # Volume   is VolumeResponse
from vxdata.schemas.update import Volume      # Volume   is VolumeUpdate

client.volumes.update("volume/vol1", Volume(b_value=2000))

The suffixed names (VolumeCreate, VolumeUpdate, ...) remain available for explicit imports.

New: easy create schemas

Create a *Create schema directly from a *Response schema:

# useful when creating derived resources
from vxdata.schemas.create import HistoScan
original = client.histo_scans.retrieve("histoscan/EE12345/rpe/001")
processed = HistoScan.from_response(original)
# now modify selectively, then upload
processed.identifier += "/processed"
...
client.histo_scans.create(processed)
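
Conceptually, such a conversion copies the fields that both schemas share and drops the response-only ones. The sketch below shows the idea with dataclass stand-ins; `ScanResponse`, `ScanCreate` and the `created_at` field are assumptions made for illustration, and the SDK's actual from_response may work differently.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class ScanResponse:  # stand-in for a *Response schema
    identifier: str
    license: str
    created_at: str  # assumed server-side field, absent from the create schema


@dataclass
class ScanCreate:  # stand-in for a *Create schema
    identifier: str
    license: str

    @classmethod
    def from_response(cls, response: ScanResponse) -> "ScanCreate":
        # Copy only the fields that the create schema declares.
        wanted = {f.name for f in fields(cls)}
        data = {k: v for k, v in asdict(response).items() if k in wanted}
        return cls(**data)


original = ScanResponse("histoscan/example", "example-license", "2026-06-01")
processed = ScanCreate.from_response(original)
processed.identifier += "/processed"  # the original response object is not modified
```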

New: benchmarking upload pattern

Benchmarking results are now plain resources, so there is no dedicated upload helper anymore. Upload any other_data DataFrames with client.storage.upload_dataframes first, then create the resource. The timestamp field defaults to the current time.

# old ------------------------------------------------
client.upload_benchmarking_result(
    clearml_id=task_id,
    eval_task="ve2e-default",
    benchmarked_on=patient_ids,
    summary_metrics={"auc": 0.8},
    other_data={"patient_level_results": df},
)

# new ------------------------------------------------
from vxdata.schemas.create import BenchmarkingResult
client.benchmarking_results.create(BenchmarkingResult(
    identifier=f"benchmarking/{task_id}",
    clearml_id=task_id,
    eval_task="ve2e-default",
    benchmarked_on=patient_ids,
    summary_metrics={"auc": 0.8},
    other_data=client.storage.upload_dataframes(
        {"patient_level_results": df}, group=f"benchmarking/{task_id}"
    ),
))

New: artefact upload pattern

Artefacts are created upstream (e.g. by the Argo run). The client attaches output DataFrames to an existing artefact and reads them back. attach_files refuses to overwrite anything already attached, and artefact_id falls back to the ARTEFACT_RUN_ID environment variable when it is not passed.

# old ------------------------------------------------
client.upload_artefact(df)
df = client.get_artefacts(artefact_id=run_id)

# new ------------------------------------------------
client.artefacts.attach_files(df, artefact_id=run_id)
df = client.artefacts.download(run_id)

New: no more S3 credentials needed

The client no longer holds S3 credentials. All blob transfer goes through client.storage, which uses presigned URLs minted by the API. Whole directories work too: they are uploaded under one shared prefix and downloaded by listing that prefix server-side.

# old ------------------------------------------------
url = client.add_files(path, group="volumes")          # needed S3_ACCESS_KEY/S3_SECRET_KEY
local = client.download_files(url, dest_dir)
df = client.materialize_dataframe(df, dest_dir)

# new ------------------------------------------------
url = client.storage.upload(path, group="volumes")     # no credentials client-side
local = client.storage.download(url, dest_dir)
df = client.storage.materialize(df, dest_dir)
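
To illustrate why no credentials are needed client-side, here is a minimal sketch of what an upload to a presigned URL generally looks like. The guide does not state which HTTP method or URL format vxData uses; the PUT method, the `build_presigned_put` helper and the example URL are assumptions modelled on common S3-style presigned uploads.

```python
import urllib.request


def build_presigned_put(presigned_url: str, data: bytes) -> urllib.request.Request:
    """Build the PUT request for a presigned upload URL.

    The URL itself carries the signature, so no credential headers are added.
    """
    return urllib.request.Request(presigned_url, data=data, method="PUT")


# Placeholder URL for illustration; a real one would be minted by the API.
request = build_presigned_put(
    "https://storage.example.com/volumes/file.bin?X-Signature=abc", b"payload"
)
# urllib.request.urlopen(request) would perform the actual transfer.
```

The signature embedded in the URL authorises exactly this one transfer, which is why the access key and secret key can stay on the server.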

Deprecated: virdx-config file

The .virdx.config file and the vxdata config --url=... CLI command that wrote it are gone. Configuration is now resolved in this order only: explicit argument, then environment variable, then default. Set API_URL via the environment (a project .env file is still loaded automatically).

# old: persisted to ~/.virdx.config via the CLI
#   vxdata config --url=http://...
# new ------------------------------------------------
# export API_URL=http://...   (env or .env)
client = Client()                 # picks up API_URL from the environment
client = Client(base_url="http://...")  # or pass explicitly
