Querying Resources

This guide explains how to query data from the VXPlatform using the Python client.

Overview

There are two main ways to get data from the platform:

  1. Query specific resource types - Fetch only the payload types you need (recommended for efficiency)
  2. Get all data at once - Fetch all payload types in one call (convenient for exploration)

Both methods return Polars DataFrames, allowing you to use powerful Polars operations for filtering, joining, and transforming your data.

Query Specific Resource Types

Use query_resources() to fetch only the data you need. This is more efficient than fetching everything because it reduces both the amount of data transferred and the memory used.

Basic Queries

from vxp_client import PlatformClient

client = PlatformClient(url="http://localhost:18000")

# Query only volumes
volumes_df = client.query_resources("Volume", as_json=False)

# Query only patients
patients_df = client.query_resources("Patient", as_json=False)

# Query imaging studies
imaging_studies_df = client.query_resources("ImagingStudy", as_json=False)

Server-Side Filtering

You can use server-side filters to reduce the amount of data transferred. The platform supports various filter operators:

from vxp_client import PlatformClient
from vxp_client.filters import Equals, GreaterThan, LessThan, Contains, Between, After, Before

client = PlatformClient(url="http://localhost:18000")

# Filter volumes by type
t2_volumes = client.query_resources(
    "Volume",
    filters=[Equals("volume_type", "T2")],
    as_json=False,
)

# Combine multiple filters
filtered_volumes = client.query_resources(
    "Volume",
    filters=[
        Equals("volume_type", "T2"),
        Equals("image_plane", "transversal"),
    ],
    as_json=False,
)

# Filter patients by birth date
recent_patients = client.query_resources(
    "Patient",
    filters=[After("birth_date", "1980-01-01")],
    as_json=False,
)

# Use range filters
age_range_patients = client.query_resources(
    "Patient",
    filters=[Between("birth_date", "1970-01-01", "1990-12-31")],
    as_json=False,
)

# String matching
volumes_with_keyword = client.query_resources(
    "Volume",
    filters=[Contains("series_description", "T2")],
    as_json=False,
)

Available Filter Functions

Import from vxp_client.filters:

  • Equals(field, value) - Exact match
  • NotEquals(field, value) - Not equal to value
  • GreaterThan(field, value) - Greater than comparison
  • LessThan(field, value) - Less than comparison
  • Between(field, lower, upper) - Value in range (inclusive)
  • Contains(field, value) - String contains substring
  • StartsWith(field, value) - String starts with
  • EndsWith(field, value) - String ends with
  • Before(field, value) - Date/time before
  • After(field, value) - Date/time after

Query Parameters

The query_resources() method supports several parameters:

# Query with parent filter
child_volumes = client.query_resources(
    "Volume",
    parent="imaging-study-123",  # Only volumes under this imaging study
    as_json=False,
)

# Limit results
top_100 = client.query_resources(
    "Volume",
    limit=100,
    as_json=False,
)

# Query as of a specific timestamp
historical_data = client.query_resources(
    "Patient",
    as_of="2025-01-01T00:00:00Z",
    as_json=False,
)

# Exclude non-commercial licenses
commercial_only = client.query_resources(
    "Volume",
    exclude_non_commercial_licenses=True,
    as_json=False,
)
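
The as_of value in the example above is an ISO 8601 timestamp in UTC. As a small sketch, such a string can be built from a timezone-aware datetime using only the standard library. The helper name to_as_of is hypothetical, and the output format is simply matched to the example; whether the platform accepts other timestamp formats or offsets is not covered here.

```python
from datetime import datetime, timedelta, timezone

def to_as_of(moment: datetime) -> str:
    """Format an aware datetime as a UTC string like 2025-01-01T00:00:00Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

as_of = to_as_of(datetime(2025, 1, 1, tzinfo=timezone.utc))
print(as_of)  # 2025-01-01T00:00:00Z

# A local time with a +01:00 offset is converted to UTC before formatting.
cet = timezone(timedelta(hours=1))
as_of_from_local = to_as_of(datetime(2025, 1, 1, 1, 0, tzinfo=cet))
print(as_of_from_local)  # 2025-01-01T00:00:00Z
```

Rejecting naive datetimes avoids silently interpreting a local wall-clock time as UTC.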

Available Resource Types

Common resource types include:

  • Patient - Patient demographic and clinical data
  • Trial - Clinical trial information
  • ImagingStudy - Imaging study metadata
  • Volume - Individual imaging volumes (sequences)
  • PathologyAssessment - Pathology findings and assessments
  • PIRADSAssessment - PI-RADS scoring for prostate imaging
  • GleasonGradeGroup - Gleason grading information

See the payload schemas documentation for the complete list of available types and their fields.

Working with Query Results

Query results are Polars DataFrames, so you can use all Polars operations for further client-side filtering and transformation:

import polars as pl
from vxp_client.filters import Equals

# Server-side filter, then client-side operations
volumes = client.query_resources(
    "Volume",
    filters=[Equals("volume_type", "T2")],
    as_json=False,
)

# Further refine with Polars
transversal_t2 = (
    volumes
    .filter(pl.col("image_plane") == "transversal")
    .sort("_created_at", descending=True)
)

print(f"Found {len(transversal_t2)} T2 transversal volumes")

Best Practice: Use server-side filters when possible to reduce data transfer, then use Polars for complex transformations.

Get All Data at Once

Use get_all_dataframes() to fetch all payload types in a single call. This returns a dictionary where keys are resource type names and values are Polars DataFrames.

from vxp_client import PlatformClient

client = PlatformClient(url="http://localhost:18000")

# Get all data
dfs = client.get_all_dataframes()

# Access individual DataFrames
volumes = dfs["Volume"]
patients = dfs["Patient"]
imaging_studies = dfs["ImagingStudy"]
pathology = dfs["PathologyAssessment"]

print(f"Loaded {len(dfs)} payload types")
for payload_type, df in dfs.items():
    print(f"  {payload_type}: {len(df)} records")

When to Use get_all_dataframes()

This method is useful when:

  • You're exploring the data and don't know exactly what you need yet
  • You need multiple payload types for complex queries with joins
  • You're building a cohort that spans multiple resource types
  • The dataset is small enough to fit in memory comfortably

Client-Side Processing with Polars

Once you have the data, you can use Polars operations for filtering, joining, sorting, and transforming. All query methods return Polars DataFrames.

import polars as pl

dfs = client.get_all_dataframes()

# Use standard Polars operations
filtered = dfs["Volume"].filter(pl.col("volume_type") == "T2")
sorted_data = filtered.sort("_created_at", descending=True)

print(f"Found {len(sorted_data)} T2 volumes")

Refer to the Polars documentation for details on available operations.

Performance Considerations

Use Server-Side Filters First

Server-side filters reduce the amount of data transferred from the platform:

# ❌ Inefficient: Fetches all volumes, then filters client-side
volumes = client.query_resources("Volume", as_json=False)
t2_volumes = volumes.filter(pl.col("volume_type") == "T2")

# ✅ Efficient: Filters on the server, transfers less data
t2_volumes = client.query_resources(
    "Volume",
    filters=[Equals("volume_type", "T2")],
    as_json=False,
)

Use query_resources() Over get_all_dataframes()

When you only need specific resource types:

# ❌ Inefficient: Fetches all data when you only need volumes
dfs = client.get_all_dataframes()
volumes = dfs["Volume"]

# ✅ Efficient: Fetches only volumes
volumes = client.query_resources("Volume", as_json=False)
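
When a few resource types are needed, one option is to loop over their names and collect the results into a dictionary shaped like the return value of get_all_dataframes(). The sketch below runs without a server because it uses a hypothetical StubClient that stands in for PlatformClient and returns plain lists instead of DataFrames. Only the query_resources(resource_type, as_json=False) call shape is taken from the examples above; fetch_selected is an illustrative helper, not part of the client.

```python
class StubClient:
    """Hypothetical stand-in for PlatformClient so the sketch runs offline."""

    def __init__(self, data):
        self._data = data
        self.calls = []  # records which resource types were requested

    def query_resources(self, resource_type, as_json=False):
        self.calls.append(resource_type)
        return self._data[resource_type]

def fetch_selected(client, resource_types):
    # One request per requested type; other types are never fetched.
    return {
        name: client.query_resources(name, as_json=False)
        for name in resource_types
    }

client = StubClient({
    "Patient": [{"id": "p1"}],
    "Volume": [{"id": "v1"}, {"id": "v2"}],
    "Trial": [{"id": "t1"}],
})
selected = fetch_selected(client, ["Patient", "Volume"])
print(sorted(selected))  # ['Patient', 'Volume']
print(client.calls)      # ['Patient', 'Volume']
```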
