Skip to main content

Querying in Detail

A detailed walkthrough of the data platform's query capabilities.

Available Resources

The platform contains various resource types such as patients, imaging studies, volumes, and clinical assessments. To see what resources are available and explore their data, visit the Query tab in the frontend.

Python Client

The Python client is the recommended way to query data programmatically. Install it and create a client instance:

from vxp_client import PlatformClient, F

client = PlatformClient()


## Querying Data

The client uses a chainable query builder pattern. You construct a query by chaining methods like `.filter()`, `.select()`, and `.limit()`. The query is only executed when you call `.collect()`, which returns a Polars DataFrame with the results.

```python
# Get all volumes from the platform
volumes = client.query("Volume").collect()



# Chainable filtering
volumes = (
client.query("Volume")
.filter(F.volume_type == "T2",
F.image_plane == "transversal")
.limit(100)
.collect()
)

Field Accessor (F)

The F accessor provides operator overloading for filter syntax:

from vxp_client import F

# Comparison operators
F.volume_type == "T2" # equals
F.volume_type != "T1" # not equals
F.slice_thickness > 2.5 # greater than
F.slice_thickness >= 2.5 # greater or equal
F.slice_thickness < 5.0 # less than
F.slice_thickness <= 5.0 # less or equal

# Methods
F.series_description.contains("T2")
F.patient_id.startswith("DA")
F.patient_id.is_in(["DA001", "DA002", "DA003"])
F.slice_thickness.between(2.0, 4.0)
F.acquisition_datetime.after("2024-01-01")
F.acquisition_datetime.before("2025-01-01")

Metadata Fields

These metadata fields exist for every resource type:

FieldDescription
identifierResource identifier (e.g., volume/DA00001/...)
parent_identifierParent resource identifier
valid_fromVersion valid from timestamp
valid_toVersion valid to timestamp
created_atCreation timestamp
licenseLicense string

Additional fields vary by resource type and can be seen in the frontend on /query.

# Get specific volumes by identifier
volumes = (
client.query("Volume")
.filter(F.identifier.is_in([
"volume/DA00001/study-1/series-1",
"volume/DA00002/study-1/series-1",
]))
.collect()
)

Note: Don't use F.parent_identifier to filter by datasource or patient as it only matches direct parents. Use Lineage Filters instead to query all descendants.

Lineage Filters

Resources form a hierarchy: Datasource → Patient → ImagingStudy → Volume. Each resource automatically has lineage fields (datasource_id, patient_id, study_id) derived from its ancestry. Use these to filter resources at any level:

from vxp_client import PlatformClient, F, has_ancestor

client = PlatformClient()

# All volumes from a datasource
volumes = client.query("Volume").filter(F.datasource_id == "datasource/dasa").collect()

# Multiple datasources
volumes = client.query("Volume").filter(F.datasource_id.is_in(["datasource/dasa", "datasource/basel"])).collect()

# All resources under a patient
volumes = client.query("Volume").filter(F.patient_id == "patient/DA00001").collect()

# All resources under a study
volumes = client.query("Volume").filter(F.study_id == "imagingstudy/DA00001/study-1").collect()

# Generic ancestor filter (for other ancestor types)
volumes = client.query("Volume").filter(has_ancestor("datasource/dasa")).collect()

Combine lineage filters with other filters:

# Axial T2 volumes from DASA
volumes = (
client.query("Volume")
.filter(
F.datasource_id == "datasource/dasa",
F.volume_type == "T2",
F.image_plane == "transversal"
)
.collect()
)
FilterDescriptionExample
F.datasource_idResources under datasourceF.datasource_id == "datasource/dasa"
F.patient_idResources under patientF.patient_id == "patient/DA001"
F.study_idResources under studyF.study_id == "imagingstudy/DA001/123"
has_ancestor(value)Generic ancestor filterhas_ancestor("datasource/dasa")

Subqueries

Embed one query inside another using .select() and .is_in():

# Get volumes linked to submitted annotation sessions
# Note: derived_from is an array, so we check if any element matches
session_ids = (
client.query("AnnotationSession")
.filter(F.source == "manual-masks", F.status == "submitted")
.select("identifier")
.collect()["identifier"]
.to_list()
)

masks = (
client.query("Volume")
.filter(F.volume_type == "SEG")
.collect()
.filter(
pl.col("derived_from").list.eval(pl.element().is_in(session_ids)).list.any()
)
)

This first fetches the session IDs, then filters volumes where any derived_from element matches.

ANY/ALL Comparisons

For numeric comparisons against subquery results, use ANY() and ALL():

from vxp_client import F, ANY, ALL

# Volumes thicker than any T1 volume
thick_volumes = (
client.query("Volume")
.filter(
F.slice_thickness > ANY(
client.query("Volume")
.filter(F.volume_type == "T1")
.select("slice_thickness")
)
)
.collect()
)

# Volumes thinner than all T2 volumes
thin_volumes = (
client.query("Volume")
.filter(
F.slice_thickness < ALL(
client.query("Volume")
.filter(F.volume_type == "T2")
.select("slice_thickness")
)
)
.collect()
)

Supported operators: >, >=, <, <=

Column Selection

Use .select() to fetch only specific columns:

# Only fetch the columns you need
masks = (
client.query("Volume")
.filter(F.volume_type == "SEG")
.select("path_nii", "segmentation_reference_uid")
.collect()
)

Note: When .select() is used inside .is_in(), it creates a subquery. When used before .collect(), it limits the returned columns.

Combining with Polars

Since .collect() returns a Polars DataFrame, you can use Polars operations to join and aggregate data from multiple queries:

import polars as pl
from vxp_client import PlatformClient, F

client = PlatformClient()

# Get axial T2 volumes for patients from bamberg
volumes = (
client.query("Volume")
.filter(F.volume_type == "T2",
F.image_plane == "transversal",
F.datasource_id == "datasource/bamberg")
.select("identifier", "patient_id")
.collect()
)

# Get highest ISUP per patient
pathology = (
client.query("PathologyAssessment")
.filter(F.datasource_id == "datasource/bamberg")
.select("patient_id", "isup")
.collect()
.group_by("patient_id")
.agg(pl.col("isup").max().alias("max_isup"))
)

# Get highest PIRADS per patient
pirads = (
client.query("PIRADSAssessment")
.filter(F.datasource_id == "datasource/bamberg")
.select("patient_id", "score")
.collect()
.group_by("patient_id")
.agg(pl.col("score").max().alias("max_pirads"))
)

# Join everything together
result = (
volumes
.join(pathology, on="patient_id", how="left")
.join(pirads, on="patient_id", how="left")
)

Next Steps