Querying in Detail
A detailed walkthrough of the data platform's query capabilities.
Available Resources
The platform contains various resource types such as patients, imaging studies, volumes, and clinical assessments. To see what resources are available and explore their data, visit the Query tab in the frontend.
Python Client
The Python client is the recommended way to query data programmatically. Install it and create a client instance:
from vxp_client import PlatformClient, F
client = PlatformClient()
## Querying Data
The client uses a chainable query builder pattern. You construct a query by chaining methods like `.filter()`, `.select()`, and `.limit()`. The query is only executed when you call `.collect()`, which returns a Polars DataFrame with the results.
```python
# Get all volumes from the platform
volumes = client.query("Volume").collect()
# Chainable filtering
volumes = (
client.query("Volume")
.filter(F.volume_type == "T2",
F.image_plane == "transversal")
.limit(100)
.collect()
)
Field Accessor (F)
The F accessor provides operator overloading for filter syntax:
from vxp_client import F
# Comparison operators
F.volume_type == "T2" # equals
F.volume_type != "T1" # not equals
F.slice_thickness > 2.5 # greater than
F.slice_thickness >= 2.5 # greater or equal
F.slice_thickness < 5.0 # less than
F.slice_thickness <= 5.0 # less or equal
# Methods
F.series_description.contains("T2")
F.patient_id.startswith("DA")
F.patient_id.is_in(["DA001", "DA002", "DA003"])
F.slice_thickness.between(2.0, 4.0)
F.acquisition_datetime.after("2024-01-01")
F.acquisition_datetime.before("2025-01-01")
Metadata Fields
These metadata fields exist for every resource type:
| Field | Description |
|---|---|
identifier | Resource identifier (e.g., volume/DA00001/...) |
parent_identifier | Parent resource identifier |
valid_from | Version valid from timestamp |
valid_to | Version valid to timestamp |
created_at | Creation timestamp |
license | License string |
Additional fields vary by resource type and can be seen in the frontend on /query.
# Get specific volumes by identifier
volumes = (
client.query("Volume")
.filter(F.identifier.is_in([
"volume/DA00001/study-1/series-1",
"volume/DA00002/study-1/series-1",
]))
.collect()
)
Note: Don't use
F.parent_identifierto filter by datasource or patient as it only matches direct parents. Use Lineage Filters instead to query all descendants.
Lineage Filters
Resources form a hierarchy: Datasource → Patient → ImagingStudy → Volume. Each resource automatically has lineage fields (datasource_id, patient_id, study_id) derived from its ancestry. Use these to filter resources at any level:
from vxp_client import PlatformClient, F, has_ancestor
client = PlatformClient()
# All volumes from a datasource
volumes = client.query("Volume").filter(F.datasource_id == "datasource/dasa").collect()
# Multiple datasources
volumes = client.query("Volume").filter(F.datasource_id.is_in(["datasource/dasa", "datasource/basel"])).collect()
# All resources under a patient
volumes = client.query("Volume").filter(F.patient_id == "patient/DA00001").collect()
# All resources under a study
volumes = client.query("Volume").filter(F.study_id == "imagingstudy/DA00001/study-1").collect()
# Generic ancestor filter (for other ancestor types)
volumes = client.query("Volume").filter(has_ancestor("datasource/dasa")).collect()
Combine lineage filters with other filters:
# Axial T2 volumes from DASA
volumes = (
client.query("Volume")
.filter(
F.datasource_id == "datasource/dasa",
F.volume_type == "T2",
F.image_plane == "transversal"
)
.collect()
)
| Filter | Description | Example |
|---|---|---|
F.datasource_id | Resources under datasource | F.datasource_id == "datasource/dasa" |
F.patient_id | Resources under patient | F.patient_id == "patient/DA001" |
F.study_id | Resources under study | F.study_id == "imagingstudy/DA001/123" |
has_ancestor(value) | Generic ancestor filter | has_ancestor("datasource/dasa") |
Subqueries
Embed one query inside another using .select() and .is_in():
# Get volumes linked to submitted annotation sessions
# Note: derived_from is an array, so we check if any element matches
session_ids = (
client.query("AnnotationSession")
.filter(F.source == "manual-masks", F.status == "submitted")
.select("identifier")
.collect()["identifier"]
.to_list()
)
masks = (
client.query("Volume")
.filter(F.volume_type == "SEG")
.collect()
.filter(
pl.col("derived_from").list.eval(pl.element().is_in(session_ids)).list.any()
)
)
This first fetches the session IDs, then filters volumes where any derived_from element matches.
ANY/ALL Comparisons
For numeric comparisons against subquery results, use ANY() and ALL():
from vxp_client import F, ANY, ALL
# Volumes thicker than any T1 volume
thick_volumes = (
client.query("Volume")
.filter(
F.slice_thickness > ANY(
client.query("Volume")
.filter(F.volume_type == "T1")
.select("slice_thickness")
)
)
.collect()
)
# Volumes thinner than all T2 volumes
thin_volumes = (
client.query("Volume")
.filter(
F.slice_thickness < ALL(
client.query("Volume")
.filter(F.volume_type == "T2")
.select("slice_thickness")
)
)
.collect()
)
Supported operators: >, >=, <, <=
Column Selection
Use .select() to fetch only specific columns:
# Only fetch the columns you need
masks = (
client.query("Volume")
.filter(F.volume_type == "SEG")
.select("path_nii", "segmentation_reference_uid")
.collect()
)
Note: When .select() is used inside .is_in(), it creates a subquery. When used before .collect(), it limits the returned columns.
Combining with Polars
Since .collect() returns a Polars DataFrame, you can use Polars operations to join and aggregate data from multiple queries:
import polars as pl
from vxp_client import PlatformClient, F
client = PlatformClient()
# Get axial T2 volumes for patients from bamberg
volumes = (
client.query("Volume")
.filter(F.volume_type == "T2",
F.image_plane == "transversal",
F.datasource_id == "datasource/bamberg")
.select("identifier", "patient_id")
.collect()
)
# Get highest ISUP per patient
pathology = (
client.query("PathologyAssessment")
.filter(F.datasource_id == "datasource/bamberg")
.select("patient_id", "isup")
.collect()
.group_by("patient_id")
.agg(pl.col("isup").max().alias("max_isup"))
)
# Get highest PIRADS per patient
pirads = (
client.query("PIRADSAssessment")
.filter(F.datasource_id == "datasource/bamberg")
.select("patient_id", "score")
.collect()
.group_by("patient_id")
.agg(pl.col("score").max().alias("max_pirads"))
)
# Join everything together
result = (
volumes
.join(pathology, on="patient_id", how="left")
.join(pirads, on="patient_id", how="left")
)
Next Steps
- Client API reference
- Payload schemas for available fields per resource type