Querying Resources
This guide explains how to query data from the VXPlatform using the Python client.
Overview
There are two main ways to get data from the platform:
- Query specific resource types - Fetch only the payload types you need (recommended for efficiency)
- Get all data at once - Fetch all payload types in one call (convenient for exploration)
Both methods return Polars DataFrames, allowing you to use powerful Polars operations for filtering, joining, and transforming your data.
Query Specific Resource Types
Use query_resources() to fetch only the data you need. This is more efficient as it reduces the amount of data transferred and memory used.
Basic Queries
from vxp_client import PlatformClient
client = PlatformClient(url="http://localhost:18000")
# Query only volumes
volumes_df = client.query_resources("Volume", as_json=False)
# Query only patients
patients_df = client.query_resources("Patient", as_json=False)
# Query imaging studies
imaging_studies_df = client.query_resources("ImagingStudy", as_json=False)
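When a handful of resource types are needed together, the per-type calls above can be wrapped in a small loop. The following is a minimal sketch: `fetch_selected` is a hypothetical helper (not part of vxp_client), and `_StubClient` is a stand-in for `PlatformClient` so the example runs without a server.

```python
from typing import Any, Dict, Iterable


def fetch_selected(client: Any, resource_types: Iterable[str]) -> Dict[str, Any]:
    """Query each named resource type separately and key the results by type name."""
    return {
        name: client.query_resources(name, as_json=False)
        for name in resource_types
    }


class _StubClient:
    """Hypothetical stand-in for PlatformClient; records which types were requested."""

    def __init__(self) -> None:
        self.calls = []

    def query_resources(self, resource_type: str, as_json: bool = False) -> str:
        self.calls.append(resource_type)
        # A real client would return a Polars DataFrame here.
        return f"<{resource_type} frame>"


stub = _StubClient()
frames = fetch_selected(stub, ["Volume", "Patient"])
print(list(frames))
```

With a real client, the result would be a dictionary of DataFrames limited to the types you asked for.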
Server-Side Filtering
You can use server-side filters to reduce the amount of data transferred. The platform supports various filter operators:
from vxp_client import PlatformClient
from vxp_client.filters import Equals, GreaterThan, LessThan, Contains, Between, After, Before
client = PlatformClient(url="http://localhost:18000")
# Filter volumes by type
t2_volumes = client.query_resources(
    "Volume",
    filters=[Equals("volume_type", "T2")],
    as_json=False,
)
# Combine multiple filters
filtered_volumes = client.query_resources(
    "Volume",
    filters=[
        Equals("volume_type", "T2"),
        Equals("image_plane", "transversal"),
    ],
    as_json=False,
)
# Filter patients by birth date
recent_patients = client.query_resources(
    "Patient",
    filters=[After("birth_date", "1980-01-01")],
    as_json=False,
)
# Use range filters
age_range_patients = client.query_resources(
    "Patient",
    filters=[Between("birth_date", "1970-01-01", "1990-12-31")],
    as_json=False,
)
# String matching
volumes_with_keyword = client.query_resources(
    "Volume",
    filters=[Contains("series_description", "T2")],
    as_json=False,
)
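The range filter above selects patients by birth date, so an age range has to be converted to a pair of dates first. One possible way to do that with the standard library is sketched here; `years_before` and `birth_date_bounds` are hypothetical helpers, not part of vxp_client.

```python
from datetime import date, timedelta


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier, mapping Feb 29 to Feb 28 if needed."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return day.replace(year=day.year - years, day=28)


def birth_date_bounds(min_age: int, max_age: int, today: date) -> tuple:
    """Return (earliest, latest) ISO birth dates for people aged min_age..max_age on `today`."""
    # Born on or before this date: already reached min_age.
    latest = years_before(today, min_age)
    # Born after the (max_age + 1)th-birthday cutoff: has not yet exceeded max_age.
    earliest = years_before(today, max_age + 1) + timedelta(days=1)
    return earliest.isoformat(), latest.isoformat()


lower, upper = birth_date_bounds(35, 55, today=date(2025, 6, 15))
print(lower, upper)  # 1969-06-16 1990-06-15
```

The two strings can then be passed as the lower and upper arguments of `Between("birth_date", lower, upper)`.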
Available Filter Functions
Import from vxp_client.filters:
- Equals(field, value) - Exact match
- NotEquals(field, value) - Not equal to value
- GreaterThan(field, value) - Greater than comparison
- LessThan(field, value) - Less than comparison
- Between(field, lower, upper) - Value in range (inclusive)
- Contains(field, value) - String contains substring
- StartsWith(field, value) - String starts with
- EndsWith(field, value) - String ends with
- Before(field, value) - Date/time before
- After(field, value) - Date/time after
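To sanity-check a filter against a small local sample, it can help to mirror the described semantics in plain Python. The sketch below is not the vxp_client.filters implementation: the lowercase functions are local stand-ins, the sample records and their fields are invented, and it assumes that multiple filters combine with AND and that string matching is case-sensitive.

```python
def equals(field, value):
    return lambda rec: rec[field] == value


def between(field, lower, upper):
    # Inclusive on both ends, matching the description of Between.
    return lambda rec: lower <= rec[field] <= upper


def contains(field, value):
    # Assumption: case-sensitive substring match.
    return lambda rec: value in rec[field]


def apply_all(records, predicates):
    """Keep records that satisfy every predicate (assumed AND semantics)."""
    return [rec for rec in records if all(p(rec) for p in predicates)]


# Invented sample records; ISO 8601 date strings sort in chronological order,
# so plain string comparison is enough for the range check.
records = [
    {"id": "a", "birth_date": "1970-01-01", "series_description": "T2 tra"},
    {"id": "b", "birth_date": "1990-12-31", "series_description": "ADC map"},
    {"id": "c", "birth_date": "1991-01-01", "series_description": "T2 sag"},
]

in_range = apply_all(records, [between("birth_date", "1970-01-01", "1990-12-31")])
both = apply_all(
    records,
    [
        between("birth_date", "1970-01-01", "1990-12-31"),
        contains("series_description", "T2"),
    ],
)
print([rec["id"] for rec in in_range], [rec["id"] for rec in both])
```

Both boundary dates are kept by the range check, which is what "inclusive" means for `Between`.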
Query Parameters
The query_resources() method supports several parameters:
# Query with parent filter
child_volumes = client.query_resources(
    "Volume",
    parent="imaging-study-123",  # Only volumes under this imaging study
    as_json=False,
)
# Limit results
top_100 = client.query_resources(
    "Volume",
    limit=100,
    as_json=False,
)
# Query as of a specific timestamp
historical_data = client.query_resources(
    "Patient",
    as_of="2025-01-01T00:00:00Z",
    as_json=False,
)
# Exclude non-commercial licenses
commercial_only = client.query_resources(
    "Volume",
    exclude_non_commercial_licenses=True,
    as_json=False,
)
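The `as_of` value above is a UTC timestamp string. If you start from a Python `datetime`, a string in the same shape can be produced with the standard library. This is a sketch: `to_as_of` is a hypothetical helper, and it assumes the platform expects exactly the format shown in the example above.

```python
from datetime import datetime, timedelta, timezone


def to_as_of(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC string like 2025-01-01T00:00:00Z."""
    if moment.tzinfo is None:
        # A naive datetime is ambiguous, so refuse it instead of guessing a zone.
        raise ValueError("as_of needs a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Midnight UTC, written directly in UTC.
snapshot = to_as_of(datetime(2025, 1, 1, tzinfo=timezone.utc))

# The same instant expressed in a UTC+01:00 zone normalizes to the same string.
same_instant = to_as_of(
    datetime(2025, 1, 1, 1, 0, tzinfo=timezone(timedelta(hours=1)))
)
print(snapshot, same_instant)
```

Normalizing to UTC before formatting avoids passing a local time that the server might interpret differently.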
Available Resource Types
Common resource types include:
- Patient - Patient demographic and clinical data
- Trial - Clinical trial information
- ImagingStudy - Imaging study metadata
- Volume - Individual imaging volumes (sequences)
- PathologyAssessment - Pathology findings and assessments
- PIRADSAssessment - PI-RADS scoring for prostate imaging
- GleasonGradeGroup - Gleason grading information
See the payload schemas documentation for the complete list of available types and their fields.
Working with Query Results
Query results are Polars DataFrames, so you can use all Polars operations for further client-side filtering and transformation:
import polars as pl
# Server-side filter, then client-side operations
volumes = client.query_resources(
    "Volume",
    filters=[Equals("volume_type", "T2")],
    as_json=False,
)
# Further refine with Polars
transversal_t2 = (
    volumes
    .filter(pl.col("image_plane") == "transversal")
    .sort("_created_at", descending=True)
)
print(f"Found {len(transversal_t2)} T2 transversal volumes")
Best Practice: Use server-side filters when possible to reduce data transfer, then use Polars for complex transformations.
Get All Data at Once
Use get_all_dataframes() to fetch all payload types in a single call. This returns a dictionary where keys are resource type names and values are Polars DataFrames.
from vxp_client import PlatformClient
client = PlatformClient(url="http://localhost:18000")
# Get all data
dfs = client.get_all_dataframes()
# Access individual DataFrames
volumes = dfs["Volume"]
patients = dfs["Patient"]
imaging_studies = dfs["ImagingStudy"]
pathology = dfs["PathologyAssessment"]
print(f"Loaded {len(dfs)} payload types")
for payload_type, df in dfs.items():
    print(f"  {payload_type}: {len(df)} records")
When to Use get_all_dataframes()
This method is useful when:
- You're exploring the data and don't know exactly what you need yet
- You need multiple payload types for complex queries with joins
- You're building a cohort that spans multiple resource types
- The dataset is small enough to fit in memory comfortably
Client-Side Processing with Polars
Once you have the data, you can use Polars operations for filtering, joining, sorting, and transforming. All query methods return Polars DataFrames.
import polars as pl
dfs = client.get_all_dataframes()
# Use standard Polars operations
filtered = dfs["Volume"].filter(pl.col("volume_type") == "T2")
sorted_data = filtered.sort("_created_at", descending=True)
print(f"Found {len(sorted_data)} T2 volumes")
Refer to the Polars documentation for details on available operations.
Performance Considerations
Use Server-Side Filters First
Server-side filters reduce the amount of data transferred from the platform:
# ❌ Inefficient: Fetches all volumes, then filters client-side
volumes = client.query_resources("Volume", as_json=False)
t2_volumes = volumes.filter(pl.col("volume_type") == "T2")
# ✅ Efficient: Filters on the server, transfers less data
t2_volumes = client.query_resources(
    "Volume",
    filters=[Equals("volume_type", "T2")],
    as_json=False,
)
Use query_resources() Over get_all_dataframes()
When you only need specific resource types:
# ❌ Inefficient: Fetches all data when you only need volumes
dfs = client.get_all_dataframes()
volumes = dfs["Volume"]
# ✅ Efficient: Fetches only volumes
volumes = client.query_resources("Volume", as_json=False)
Next Steps
- Learn about creating cohorts to organize your data into training/validation/test splits
- Explore the client API reference for more querying options
- Review payload schemas to understand available fields