Using the vxp_client package
The data platform is best accessed programmatically through the vxp_client Python package.
It makes use of vxp_schemas for data validation.
Any schema defined in vxp_schemas can be retrieved from or pushed to the data platform.
Installation
Assuming you are authenticated with our Artifactory:
pixi global install vxp_client
Usage
import vxp_schemas as schemas
from vxp_client import PlatformClient, handlers
# configure your connection accordingly
client = PlatformClient(base_url="http://192.168.10.101:2701/")
# pull a dataset
bb_train = client.get_dataset("bamberg-train", version="latest")
print(bb_train.resources)
# > [ study/bb01, series/bb01/t2, clinical/bb01/psa01, ... ]
# filter for resources of schema DICOMStudy
study_resources: list[schemas.DICOMStudy] = bb_train.filter_for_schema(schemas.DICOMStudy)
# and pass them into the relevant handler method, which will download and convert them
volumes = handlers.load_dicom_studies(study_resources)
handlers.store_dicom_studies(study_resources, path="...")
clinicals = bb_train.filter_for_schema([
    schemas.PIRADSScoring, schemas.PSAMeasurement, schemas.BiopsyResult
])
resources = client.query(
    schema=schemas.PIRADSScoring,
    parent="patient/bb01",
    only_immediate_parent=False,
    # the filter is evaluated in-memory after filtering for parent and schema
    # (yes, suboptimal, but easiest for now)
    filter=lambda x: x.pirads_score >= 4,
)
# the resource class has helpful methods to navigate the resource tree
r = resources[0]
r.children
r.parent
r.parents
r.siblings
class Resource:
    @computed
    def children(self) -> list["Resource"]:
        """List of all immediate children of this resource."""
        return self._client.list_resources(parent=self.identifier)

    @computed
    def parent(self) -> "Resource | None":
        """The immediate parent of this resource, or None if this is a root resource."""
        # a root resource has no parent identifier to look up
        if self._parent is None:
            return None
        return self._client.get_resource(self._parent)

    @computed
    def parents(self) -> list["Resource"]:
        """List of all parents of this resource, starting from the immediate parent up to the root."""
        return ...

    @computed
    def siblings(self) -> list["Resource"]:
        """List of all resources that share the same immediate parent as this resource, excluding this resource itself."""
        return ...
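The bodies of `parents` and `siblings` are left open above. A minimal sketch of how they could be computed follows. It makes three assumptions: a hypothetical in-memory `FakeClient` stands in for the real `PlatformClient`, plain `@property` stands in for `@computed`, and a root resource is treated as having no siblings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeClient:
    """Hypothetical in-memory stand-in for the platform client."""

    resources: dict[str, "Resource"] = field(default_factory=dict)

    def get_resource(self, identifier: str) -> "Resource":
        return self.resources[identifier]

    def list_resources(self, parent: str) -> list["Resource"]:
        return [r for r in self.resources.values() if r._parent == parent]


@dataclass(eq=False)
class Resource:
    identifier: str
    _parent: str | None
    _client: FakeClient

    @property
    def parent(self) -> "Resource | None":
        if self._parent is None:
            return None
        return self._client.get_resource(self._parent)

    @property
    def parents(self) -> list["Resource"]:
        # Follow parent links until a root (parent is None) is reached.
        chain = []
        current = self.parent
        while current is not None:
            chain.append(current)
            current = current.parent
        return chain

    @property
    def siblings(self) -> list["Resource"]:
        # Assumption: a root resource has no siblings.
        if self._parent is None:
            return []
        return [
            r
            for r in self._client.list_resources(parent=self._parent)
            if r.identifier != self.identifier
        ]


def build_tree(edges: dict[str, str | None]) -> FakeClient:
    """Create a client from a mapping of identifier -> parent identifier."""
    client = FakeClient()
    for identifier, parent in edges.items():
        client.resources[identifier] = Resource(identifier, parent, client)
    return client
```

In this sketch `parents` costs one lookup per tree level, so against a real backend it may be worth resolving the whole chain in a single request.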
Requirements list
Dataset-related features:
- A user can list all available datasets
- A user can list all available lockfiles for a given dataset
- A user can load a dataset's resources a) in-memory and b) to a single JSON file
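As an illustration of the "single JSON file" requirement, the sketch below writes a list of resources to one file and reads it back. The `ResourceRecord` class and its fields are hypothetical stand-ins for the real vxp_schemas models, and the file layout (a JSON array of objects) is an assumption, not a specified format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ResourceRecord:
    """Hypothetical flat resource; the real models live in vxp_schemas."""

    identifier: str
    schema: str
    payload: dict = field(default_factory=dict)


def dump_resources(resources: list[ResourceRecord], path: str | Path) -> Path:
    """Write all resources of a dataset to a single JSON file."""
    path = Path(path)
    serialised = [asdict(resource) for resource in resources]
    path.write_text(json.dumps(serialised, indent=2), encoding="utf-8")
    return path


def read_resources(path: str | Path) -> list[ResourceRecord]:
    """Load the resources back in-memory from the JSON file."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return [ResourceRecord(**item) for item in raw]
```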
Resource-CRUD-related features:
- A user can add new resources to the platform
- A user can update an existing resource's payload
- A user can delete a resource
- A user can change a resource's parent
- A user can retrieve resource details by the resource identifier, optionally with its child resources
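The CRUD requirements above can be sketched against an in-memory store to make the expected behaviour concrete. Everything here is hypothetical (`InMemoryStore`, `StoredResource`, the method names); the real client would issue requests to the platform instead. Two behaviours are assumptions rather than stated requirements: deleting a resource that still has children is refused, and re-parenting may not create a cycle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredResource:
    identifier: str
    payload: dict
    parent: str | None = None


class InMemoryStore:
    """Hypothetical stand-in for the platform's resource storage."""

    def __init__(self) -> None:
        self._items: dict[str, StoredResource] = {}

    def _children(self, identifier: str) -> list[StoredResource]:
        return [r for r in self._items.values() if r.parent == identifier]

    def add(self, resource: StoredResource) -> None:
        if resource.identifier in self._items:
            raise ValueError(f"{resource.identifier} already exists")
        if resource.parent is not None and resource.parent not in self._items:
            raise KeyError(f"unknown parent {resource.parent}")
        self._items[resource.identifier] = resource

    def update_payload(self, identifier: str, payload: dict) -> None:
        self._items[identifier].payload = payload

    def delete(self, identifier: str) -> None:
        # Assumption: resources with children cannot be deleted.
        if self._children(identifier):
            raise ValueError(f"{identifier} still has children")
        del self._items[identifier]

    def reparent(self, identifier: str, new_parent: str | None) -> None:
        # Walk up from the new parent; meeting `identifier` means a cycle.
        current = new_parent
        while current is not None:
            if current == identifier:
                raise ValueError("re-parenting would create a cycle")
            current = self._items[current].parent
        self._items[identifier].parent = new_parent

    def get(
        self, identifier: str, with_children: bool = False
    ) -> tuple[StoredResource, list[StoredResource]]:
        resource = self._items[identifier]
        children = self._children(identifier) if with_children else []
        return resource, children
```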
Querying features:
- A user can query resources of a specified schema by:
  - defining the immediate parent
  - (defining any parent)
  - constraining by a field's value (comparison operators)
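The three query constraints can be sketched as an in-memory filter, mirroring the approach described in the usage example (filter by schema and parent first, then evaluate the field constraint). The `where` tuple of `(field, operator, value)`, the `parent_of` mapping, and the `PIRADSScoring` stand-in class are all hypothetical and only serve the illustration.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass
from typing import Any

# Comparison operators a field constraint may use.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class PIRADSScoring:
    """Hypothetical stand-in for schemas.PIRADSScoring."""

    identifier: str
    pirads_score: int


def ancestors(identifier: str, parent_of: dict[str, str | None]) -> list[str]:
    """Parent identifiers from the immediate parent up to the root."""
    chain = []
    current = parent_of.get(identifier)
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return chain


def query(
    resources: list[Any],
    parent_of: dict[str, str | None],
    schema: type,
    parent: str | None = None,
    only_immediate_parent: bool = True,
    where: tuple[str, str, Any] | None = None,
) -> list[Any]:
    matches = []
    for resource in resources:
        if not isinstance(resource, schema):
            continue
        if parent is not None:
            chain = ancestors(resource.identifier, parent_of)
            allowed = chain[:1] if only_immediate_parent else chain
            if parent not in allowed:
                continue
        if where is not None:
            field_name, op, value = where
            if not OPERATORS[op](getattr(resource, field_name), value):
                continue
        matches.append(resource)
    return matches
```

Expressing the constraint as data rather than as a lambda is one possible design choice: it could later be sent to the server instead of being evaluated client-side.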
Interface specification
client
def load_resource(identifier: str, version: str = "latest") -> Resource:
    ...

def load_resources(identifiers: list[str | tuple[str, str]]) -> list[Resource]:
    """Runs the POST request for retrieving multiple resources at once."""

def load_dataset(dataset: str, lockfile: str = "latest") -> list[Resource] | Dataset:
    ...
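`load_resources` accepts a mix of plain strings and two-element tuples. The sketch below shows one way such input could be normalised before building the POST body. It assumes that a tuple means `(identifier, version)` and that a bare string means the `"latest"` version; neither is stated in the specification. The JSON body layout is likewise hypothetical.

```python
from __future__ import annotations

import json


def normalize_identifiers(
    identifiers: list[str | tuple[str, str]],
    default_version: str = "latest",
) -> list[tuple[str, str]]:
    """Turn mixed input into uniform (identifier, version) pairs."""
    pairs = []
    for item in identifiers:
        if isinstance(item, str):
            # Assumption: a bare identifier refers to the latest version.
            pairs.append((item, default_version))
        else:
            identifier, version = item
            pairs.append((identifier, version))
    return pairs


def build_request_body(identifiers: list[str | tuple[str, str]]) -> str:
    """Serialise the pairs into a (hypothetical) JSON body for one POST."""
    pairs = normalize_identifiers(identifiers)
    body = {"resources": [{"identifier": i, "version": v} for i, v in pairs]}
    return json.dumps(body)
```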
def query(
    schema: type,
    parent: str | None = None,
) -> list[str]:
    """Performs"""
handlers
def load_dicom_studies(studies: list[schemas.DICOMStudy]) -> list[Volume]:
    ...

def store_dicom_studies(studies: list[schemas.DICOMStudy], path: str) -> None:
    ...

def load_dicom_series(series: list[schemas.DICOMSeries]) -> list[Volume]:
    ...

def store_dicom_series(series: list[schemas.DICOMSeries], path: str) -> None:
    ...

def convert_to_dataframe(resource: list[schemas.BaseModel]) -> pl.DataFrame:
    ...
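A rough idea of what `convert_to_dataframe` might do internally is to flatten the records into columns, one per field. The sketch uses a hypothetical `PSAMeasurement` dataclass in place of the real schema and stops at a plain column dictionary so that it runs with the standard library only. Passing that dictionary to `pl.DataFrame(...)` would be the remaining step in the real handler.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class PSAMeasurement:
    """Hypothetical stand-in for schemas.PSAMeasurement."""

    identifier: str
    psa_ng_ml: float


def to_columns(records: list[Any]) -> dict[str, list[Any]]:
    """Flatten dataclass records into a column-oriented dictionary.

    Fields missing on some records are filled with None so that all
    columns end up with the same length.
    """
    rows = [asdict(record) for record in records]
    keys: list[str] = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    return {key: [row.get(key) for row in rows] for key in keys}
```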