Accessing Data

How to query resources and download files from the data platform.

Querying Resources

from vxp_client import PlatformClient, F

client = PlatformClient()

# Query a specific resource type with filters
volumes = (
    client.query("Volume")
    .filter(F.volume_type == "T2", F.image_plane == "transversal")
    .limit(100)
    .collect()
)

# For convenience, you can pull all tables in full at once
dfs = client.get_all_dataframes()
patients = dfs["Patient"]
volumes = dfs["Volume"]

Explore the available resource types and their fields in the frontend query tab or in the payload schema docs.

For the full querying API (lineage filters, subqueries, ANY/ALL, joins), see the querying tutorial.

Downloading Files

Table data is returned directly as Polars DataFrames. Blob data (MRIs, DICOMs, PDFs) is stored on S3.

Tables may contain URLs of blob files stored on S3. For example, the Volume table has a column path_nii that points to the location of the corresponding NIfTI file. There are two ways to fetch these files: download them individually by passing their S3 URLs, or pull every file referenced in a Polars DataFrame at once.
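To make the shape of these URLs concrete, here is a minimal sketch that splits an S3 URL, such as a value from path_nii, into its bucket and object key using only the Python standard library. The helper name split_s3_url is hypothetical and not part of vxp_client; the client handles this internally, so this is only useful if you want to inspect or group URLs yourself.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key).

    Hypothetical helper for illustration; not part of vxp_client.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # The bucket is the network location; the key is the path without
    # its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url("s3://bucket/path/to/file.nii.gz")
```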

Download individual S3 URLs

from pathlib import Path

# Single file
local_path = client.download_files(
    "s3://bucket/path/to/file.nii.gz",
    dest_dir=Path("./downloads"),
)

# Multiple files
local_paths = client.download_files(
    ["s3://bucket/file1.nii.gz", "s3://bucket/file2.nii.gz"],
    dest_dir=Path("./downloads"),
)

Download all S3 URLs in a DataFrame

from pathlib import Path

# Build your dataframe of interest here
volumes = client.query("Volume").filter(...).collect()

# Automatically detects S3 URL columns, downloads the files to disk in parallel,
# and replaces the S3 URLs in the dataframe with local paths
volumes_local = client.materialize_dataframe(volumes, dest_dir=Path("./downloads"))
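The three steps described above (detect S3 URL columns, download in parallel, replace URLs with local paths) can be sketched roughly as follows. This is not the vxp_client implementation: it uses a plain dict of column lists as a stand-in for a DataFrame, and the names find_s3_columns, local_path_for, materialize and the caller-supplied fetch callable are all assumptions made for illustration. The demo passes a stub fetch that only records the URLs, so nothing is downloaded.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse


def find_s3_columns(table):
    """Return names of columns whose non-null values are all S3 URLs."""
    cols = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if present and all(
            isinstance(v, str) and v.startswith("s3://") for v in present
        ):
            cols.append(name)
    return cols


def local_path_for(url, dest_dir):
    """Map s3://bucket/a/b to dest_dir/bucket/a/b (an assumed layout)."""
    parsed = urlparse(url)
    return dest_dir.joinpath(parsed.netloc, *parsed.path.strip("/").split("/"))


def materialize(table, dest_dir, fetch, max_workers=8):
    """Return a copy of table with S3 URLs replaced by local paths.

    fetch(url, target) is a hypothetical callable that performs the download.
    """
    s3_cols = find_s3_columns(table)
    # Deduplicate so each file is fetched only once.
    urls = sorted({v for c in s3_cols for v in table[c] if v is not None})
    targets = {u: local_path_for(u, dest_dir) for u in urls}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so exceptions from fetch surface here.
        list(pool.map(lambda u: fetch(u, targets[u]), urls))
    out = dict(table)
    for c in s3_cols:
        out[c] = [None if v is None else str(targets[v]) for v in table[c]]
    return out


# Demo with a stub fetch that records URLs instead of downloading them.
downloaded = []
volumes = {
    "volume_type": ["T2", "T2"],
    "path_nii": ["s3://bucket/file1.nii.gz", "s3://bucket/file2.nii.gz"],
}
volumes_local = materialize(
    volumes, Path("downloads"), lambda url, target: downloaded.append(url)
)
```

Only path_nii is rewritten here; volume_type is left untouched because its values are not S3 URLs.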