vxData Refactor June 2026 - Migration Guide
This document outlines the changes deployed as part of the June 2026 refactor of the vxData API & SDK and walks through the steps needed to keep existing code compatible.
Almost unchanged: Querying
With the refactor, querying remains almost the same:
from vxdata.sdk import F
# instead of client.query("Volumes") we now write client.volumes.query() for better typing!
df_volumes = client.volumes.query().filter(F.b_value > 0).collect()
# to obtain a list of *Response models instead of a dataframe
volumes = client.volumes.query().filter(F.b_value > 0).collect_as_pydantic()
New: better reading and writing patterns
We have refactored CRUD operations and added better typing:
# client.<namespace>.retrieve raises an error if any requested resource is not found
patient = client.patients.retrieve("patient/BB12345") # -> PatientResponse
patients = client.patients.retrieve(["patient/BB12345", "patient/BB12346"]) # -> list[PatientResponse]
# use client.<namespace>.update to change selected fields of one or more resources
client.patients.update("patient/BB12345", PatientUpdate(year_of_birth=1940))
client.patients.update({"patient/BB12345": ..., ...}) # several resources at once, keyed by identifier
client.volumes.create(VolumeCreate(...)) # see below
...
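The retrieve and update semantics above can be modelled with a small in-memory stand-in. This is a sketch under stated assumptions, not the SDK: InMemoryNamespace, Patient and PatientPatch are hypothetical names, and treating fields left as None as "not set" is an assumption about how partial updates behave.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace
from typing import Optional, Union, overload


@dataclass(frozen=True)
class Patient:
    """Hypothetical stand-in for PatientResponse."""

    identifier: str
    year_of_birth: Optional[int] = None
    sex: Optional[str] = None


@dataclass
class PatientPatch:
    """Hypothetical stand-in for PatientUpdate; None means 'leave unchanged'."""

    year_of_birth: Optional[int] = None
    sex: Optional[str] = None


class InMemoryNamespace:
    """Toy model of client.<namespace> backed by a dict; not the real SDK."""

    def __init__(self, records: dict[str, Patient]) -> None:
        self._records = dict(records)

    @overload
    def retrieve(self, ids: str) -> Patient: ...

    @overload
    def retrieve(self, ids: list[str]) -> list[Patient]: ...

    def retrieve(self, ids: Union[str, list[str]]) -> Union[Patient, list[Patient]]:
        wanted = [ids] if isinstance(ids, str) else list(ids)
        missing = [i for i in wanted if i not in self._records]
        if missing:
            # all-or-nothing: one unknown identifier fails the whole call
            raise KeyError(f"not found: {missing}")
        found = [self._records[i] for i in wanted]
        return found[0] if isinstance(ids, str) else found

    def update(
        self,
        target: Union[str, dict[str, PatientPatch]],
        patch: Optional[PatientPatch] = None,
    ) -> None:
        # accept (identifier, patch) or {identifier: patch, ...}
        batch = target if isinstance(target, dict) else {target: patch}
        self.retrieve(list(batch))  # validate every identifier before writing
        for ident, p in batch.items():
            # copy only the fields that were actually set
            changes = {
                f.name: getattr(p, f.name)
                for f in fields(p)
                if getattr(p, f.name) is not None
            }
            self._records[ident] = replace(self._records[ident], **changes)


patients = InMemoryNamespace({
    "patient/BB12345": Patient("patient/BB12345", year_of_birth=1939, sex="F"),
    "patient/BB12346": Patient("patient/BB12346"),
})
patients.update("patient/BB12345", PatientPatch(year_of_birth=1940))
# year_of_birth is now 1940; sex is untouched
```

The typing.overload pair is what lets a type checker infer a single response for a string argument and a list for a list argument, which mirrors the PatientResponse / list[PatientResponse] split shown above.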
New: flat payload structures
Previously, a ResourceCreateRequest wrapped a nested payload object (e.g. ImagingStudyPayload), which made requests unnecessarily complex both to reason about and to type.
With vxData 1.0, schemas are flat, and each payload type has dedicated *Create, *Response and *Update schemas.
# old ------------------------------------------------
client.add_resource(ResourceCreateRequest(
identifier=...,
parent_identifier=...,
license=...,
payload=ImagingStudyPayload(
type="MRI",
...
)
))
# new ------------------------------------------------
from vxdata.schemas.create import ImagingStudy
client.imaging_studies.create(ImagingStudy(
identifier=...,
parent_identifier=...,
license=...,
type="MRI",
...
))
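When porting many old call sites, it can help to flatten existing request data mechanically. The sketch below assumes the old request is available as a plain dict with a nested "payload" dict; flatten_request is a hypothetical helper, not part of vxData, and the values are placeholders.

```python
from __future__ import annotations

from typing import Any


def flatten_request(old: dict[str, Any]) -> dict[str, Any]:
    """Merge a nested 'payload' dict into the top level (hypothetical helper)."""
    flat = {k: v for k, v in old.items() if k != "payload"}
    payload = old.get("payload") or {}
    clashes = flat.keys() & payload.keys()
    if clashes:
        # a silent overwrite would lose data, so refuse instead
        raise ValueError(f"payload fields clash with top-level fields: {sorted(clashes)}")
    flat.update(payload)
    return flat


old = {
    "identifier": "study/AA00001",         # placeholder values
    "parent_identifier": "patient/AA00001",
    "license": "internal",
    "payload": {"type": "MRI"},
}
new_kwargs = flatten_request(old)
# new_kwargs could then be passed as ImagingStudy(**new_kwargs)
```

Refusing on clashes rather than letting payload fields win keeps the migration explicit: if a name exists at both levels, a person has to decide which value the flat schema should receive.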
The create, update and response modules also export each schema without its suffix. Using these
short names is optional; watch out for name clashes when importing the same name from several modules:
# alternatives: each import rebinds the same name Volume, so the last one wins
from vxdata.schemas.create import Volume # Volume is VolumeCreate
from vxdata.schemas.response import Volume # Volume is VolumeResponse
from vxdata.schemas.update import Volume # Volume is VolumeUpdate
client.volumes.update("volume/vol1", Volume(b_value=2000)) # here Volume is VolumeUpdate
The suffixed names (VolumeCreate, VolumeUpdate, ...) remain available for explicit
imports.
New: easy create schemas
Create a *Create schema directly from a *Response schema:
# useful when creating derived resources
from vxdata.schemas.create import HistoScan
original = client.histo_scans.retrieve("histoscan/EE12345/rpe/001")
processed = HistoScan.from_response(original)
# now modify selectively, then upload
processed.identifier += "/processed"
...
client.histo_scans.create(processed)
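Conceptually, from_response copies the fields that the create schema shares with the response and drops the rest. The sketch below models that with dataclasses; the field names (including stain and the server-assigned created_at) and the placeholder values are hypothetical, and the real schemas appear to be pydantic models with their own copying logic.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class HistoScanResponse:
    identifier: str
    parent_identifier: str
    stain: Optional[str]
    created_at: str  # hypothetical server-assigned field, not part of the create schema


@dataclass
class HistoScanCreate:
    identifier: str
    parent_identifier: str
    stain: Optional[str] = None

    @classmethod
    def from_response(cls, response) -> HistoScanCreate:
        # keep only the fields the create schema knows about
        names = {f.name for f in fields(cls)}
        return cls(**{n: getattr(response, n) for n in names})


original = HistoScanResponse(
    identifier="histoscan/EE12345/rpe/001",
    parent_identifier="patient/EE12345",
    stain="HE",
    created_at="2026-06-01T00:00:00Z",
)
processed = HistoScanCreate.from_response(original)
processed.identifier += "/processed"  # the response object is left untouched
```

Because from_response builds a new object, modifying processed never mutates the retrieved original.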
New: benchmarking upload pattern
Benchmarking results are now plain resources, and there is no bespoke upload helper anymore.
Upload any other_data DataFrames with client.storage.upload_dataframes first, then create the
resource. If timestamp is omitted, it defaults to the current time.
# old ------------------------------------------------
client.upload_benchmarking_result(
clearml_id=task_id,
eval_task="ve2e-default",
benchmarked_on=patient_ids,
summary_metrics={"auc": 0.8},
other_data={"patient_level_results": df},
)
# new ------------------------------------------------
from vxdata.schemas.create import BenchmarkingResult
client.benchmarking_results.create(BenchmarkingResult(
identifier=f"benchmarking/{task_id}",
clearml_id=task_id,
eval_task="ve2e-default",
benchmarked_on=patient_ids,
summary_metrics={"auc": 0.8},
other_data=client.storage.upload_dataframes(
{"patient_level_results": df}, group=f"benchmarking/{task_id}"
),
))
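If many scripts still call the old helper, a thin wrapper can translate its keyword arguments into the new two-step pattern. This is a sketch, not part of vxData: the wrapper takes the client and the schema class (e.g. BenchmarkingResult) as parameters so it can be exercised with fakes, and it assumes upload_dataframes returns a value that can be passed straight to other_data, as in the example above.

```python
from typing import Any, Callable, Mapping, Sequence


def upload_benchmarking_result(
    client: Any,
    schema: Callable[..., Any],
    *,
    clearml_id: str,
    eval_task: str,
    benchmarked_on: Sequence[str],
    summary_metrics: Mapping[str, float],
    other_data: Mapping[str, Any],
) -> Any:
    """Hypothetical compatibility shim mapping the old call onto the new API."""
    identifier = f"benchmarking/{clearml_id}"
    # step 1: stow the DataFrames under the resource's own prefix
    stored = client.storage.upload_dataframes(dict(other_data), group=identifier)
    # step 2: create the resource; timestamp is omitted so it defaults to now
    return client.benchmarking_results.create(schema(
        identifier=identifier,
        clearml_id=clearml_id,
        eval_task=eval_task,
        benchmarked_on=list(benchmarked_on),
        summary_metrics=dict(summary_metrics),
        other_data=stored,
    ))
```

In real code this would be called as upload_benchmarking_result(client, BenchmarkingResult, clearml_id=task_id, ...). Longer term, inlining the two calls at each call site is simpler than keeping the shim around.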
New: artefact upload pattern
Artefacts are created upstream (e.g. by the Argo run). Attach output DataFrames to an
existing artefact with attach_files and read them back with download; attach_files refuses
to overwrite existing files. If artefact_id is omitted, it falls back to the ARTEFACT_RUN_ID
environment variable.
# old ------------------------------------------------
client.upload_artefact(df)
df = client.get_artefacts(artefact_id=run_id)
# new ------------------------------------------------
client.artefacts.attach_files(df, artefact_id=run_id)
df = client.artefacts.download(run_id)
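The artefact_id fallback and the no-overwrite rule can be pictured as two small checks. Both helpers are hypothetical, not the SDK's code; raising ValueError when neither the argument nor ARTEFACT_RUN_ID is set, and FileExistsError on a name clash, are assumptions about the failure modes.

```python
import os
from typing import Iterable, Mapping, Optional


def resolve_artefact_id(
    artefact_id: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Explicit argument wins; otherwise fall back to ARTEFACT_RUN_ID."""
    if artefact_id:
        return artefact_id
    env = os.environ if env is None else env
    run_id = env.get("ARTEFACT_RUN_ID")
    if not run_id:
        raise ValueError("pass artefact_id or set ARTEFACT_RUN_ID")
    return run_id


def ensure_no_overwrite(existing_names: Iterable[str], new_names: Iterable[str]) -> None:
    """Refuse to attach files whose names are already attached to the artefact."""
    clash = set(existing_names) & set(new_names)
    if clash:
        raise FileExistsError(f"already attached: {sorted(clash)}")
```

Inside an Argo step the environment variable is typically what supplies the id, so artefact_id only needs to be passed explicitly when running outside such a run.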
New: no more S3 credentials needed
The client no longer holds S3 credentials. All blob transfers go through client.storage,
which uses presigned URLs minted by the API. Whole directories are supported too: they are
uploaded under one shared prefix and downloaded by listing that prefix server-side.
# old ------------------------------------------------
url = client.add_files(path, group="volumes") # needed S3_ACCESS_KEY/S3_SECRET_KEY
local = client.download_files(url, dest_dir)
df = client.materialize_dataframe(df, dest_dir)
# new ------------------------------------------------
url = client.storage.upload(path, group="volumes") # no credentials client-side
local = client.storage.download(url, dest_dir)
df = client.storage.materialize(df, dest_dir)
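Uploading a directory under one shared prefix amounts to mapping every file to a key of the form prefix/relative-path, and downloading reverses that mapping for whatever the prefix listing returns. The sketch below computes both mappings locally; the key layout and the helper names are assumptions for illustration, since the guide does not say how vxData names objects, and the actual transfer via presigned URLs is not shown.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def planned_keys(directory: str, prefix: str) -> dict[str, Path]:
    """Map object keys under one shared prefix to local files (hypothetical layout)."""
    root = Path(directory)
    base = prefix.rstrip("/")
    keys = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # POSIX-style relative path so keys look the same on every OS
            keys[f"{base}/{path.relative_to(root).as_posix()}"] = path
    return keys


def local_destinations(listed_keys: Iterable[str], prefix: str, dest_dir: str) -> dict[str, Path]:
    """Map keys returned by a prefix listing back to local paths under dest_dir."""
    base = prefix.rstrip("/") + "/"
    out = {}
    for key in listed_keys:
        if not key.startswith(base):
            continue  # not part of this directory upload
        out[key] = Path(dest_dir, *key[len(base):].split("/"))
    return out
```

Keeping one prefix per directory is what makes the download side possible without a manifest: listing the prefix recovers every file, and stripping it recovers the relative layout.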
Removed: .virdx.config file
The .virdx.config file (and the vxdata config --url=... CLI command that wrote it) are gone.
Configuration now resolves in this order only: explicit argument, then environment variable,
then default. Set API_URL via the environment (a project .env is still loaded automatically).
# old: persisted to ~/.virdx.config via the CLI
# vxdata config --url=http://...
# new ------------------------------------------------
# export API_URL=http://... (env or .env)
client = Client() # picks up API_URL from the environment
client = Client(base_url="http://...") # or pass explicitly
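The resolution order (explicit argument, then environment variable, then default) fits in a single helper. resolve_setting is hypothetical, the default URL is a placeholder, and the real Client also loads a project .env, which this sketch leaves out.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return explicit > environment variable > default; no config file is read."""
    if explicit is not None:
        return explicit
    env = os.environ if env is None else env
    value = env.get(env_var)
    if value:  # an empty variable is treated as unset
        return value
    return default


base_url = resolve_setting(None, "API_URL", default="http://localhost:8000")  # placeholder default
```

Passing env explicitly makes the lookup easy to test without touching the real process environment.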