How does versioning work?

The data-platform is centered around strict versioning of stored data so that experiments stay reproducible. We accomplish this through an SCD-2-style (slowly changing dimension, type 2) approach to versioning.

The resources table holds an index over all logical resources that exist. The resource_versions table holds all versions of all resources.

Say we store a resource psa/pat001/2025-10-10 that holds a patient's PSA value. Then resources contains a single row with the identifier psa/pat001/2025-10-10. For each version of this PSA measurement, resource_versions contains one row that holds the detailed information. If the recorded PSA value needs to be corrected, we add a new row. Importantly, each row in resource_versions has a valid_from and a valid_to field. Say we push a new version of the resource at timestamp T: we set the previous row's valid_to=T and create the new row with valid_from=T and valid_to=null.
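
The push step described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the platform's implementation: the helper name push_version, the reduced column set, the payload uids m1 and m2, the example timestamps, and the use of SQLite with ISO-8601 text timestamps are all assumptions made for the example.

```python
import sqlite3

def push_version(conn, identifier, payload_uid, ts):
    """Hypothetical helper: close the open version (if any), open a new one at ts."""
    with conn:  # one transaction: all three statements commit, or none does
        # Register the logical resource once; later pushes leave it untouched.
        conn.execute(
            "INSERT OR IGNORE INTO resources (identifier) VALUES (?)",
            (identifier,),
        )
        # Close the currently open row by setting valid_to = T.
        conn.execute(
            "UPDATE resource_versions SET valid_to = ? "
            "WHERE resource_identifier = ? AND valid_to IS NULL",
            (ts, identifier),
        )
        # Open the new row with valid_from = T and valid_to = NULL.
        conn.execute(
            "INSERT INTO resource_versions "
            "(resource_identifier, payload_uid, valid_from, valid_to) "
            "VALUES (?, ?, ?, NULL)",
            (identifier, payload_uid, ts),
        )

# Assumed, simplified schema; the real tables carry more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources (identifier TEXT PRIMARY KEY);
    CREATE TABLE resource_versions (
        resource_identifier TEXT NOT NULL REFERENCES resources(identifier),
        payload_uid TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to TEXT
    );
""")

push_version(conn, "psa/pat001/2025-10-10", "m1", "2025-10-10T09:00:00Z")
push_version(conn, "psa/pat001/2025-10-10", "m2", "2025-10-12T14:30:00Z")

rows = conn.execute(
    "SELECT payload_uid, valid_from, valid_to "
    "FROM resource_versions ORDER BY valid_from"
).fetchall()
```

After the second push, the row for m1 is closed at the timestamp where the row for m2 opens, so the two validity intervals meet without a gap or an overlap. Running the update and the insert in one transaction avoids a state in which a resource has either no open row or two.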

How does this solve our versioning needs?

  • We want to retrieve all patients that were part of the Bamberg trial at timepoint T. We can query:
    SELECT * FROM resource_versions
    WHERE
    parent_identifier = 'trial/bamberg'
    AND payload_table = 'resource_patients'
    AND valid_from <= T
    AND (valid_to IS NULL OR valid_to > T);
  • Any older version of a resource can be retrieved by querying as of the timestamp T at which it was valid:
    SELECT m.psa_value
    FROM resource_versions rv
    JOIN payload_measurements m ON rv.payload_uid = m.uid
    WHERE
    rv.resource_identifier = 'psa/pat001/2025-10-10'
    AND rv.valid_from <= T
    AND (rv.valid_to IS NULL OR rv.valid_to > T);
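
To see how the point-in-time filter behaves around a version boundary, the second query can be run against a small in-memory database. This is a sketch under assumptions: the helper name psa_as_of, the reduced column set, the PSA values 4.2 and 2.4, and the timestamps are invented for illustration, and timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed, simplified versions of the tables used in the query above.
    CREATE TABLE payload_measurements (uid TEXT PRIMARY KEY, psa_value REAL);
    CREATE TABLE resource_versions (
        resource_identifier TEXT, payload_uid TEXT,
        valid_from TEXT, valid_to TEXT
    );
    INSERT INTO payload_measurements VALUES ('m1', 4.2), ('m2', 2.4);
    INSERT INTO resource_versions VALUES
        ('psa/pat001/2025-10-10', 'm1',
         '2025-10-10T09:00:00Z', '2025-10-12T14:30:00Z'),
        ('psa/pat001/2025-10-10', 'm2',
         '2025-10-12T14:30:00Z', NULL);
""")

def psa_as_of(conn, identifier, ts):
    """Hypothetical helper: PSA value valid at ts, or None if no version was."""
    row = conn.execute(
        """
        SELECT m.psa_value
        FROM resource_versions rv
        JOIN payload_measurements m ON rv.payload_uid = m.uid
        WHERE rv.resource_identifier = :id
          AND rv.valid_from <= :ts
          AND (rv.valid_to IS NULL OR rv.valid_to > :ts)
        """,
        {"id": identifier, "ts": ts},
    ).fetchone()
    return row[0] if row else None

rid = "psa/pat001/2025-10-10"
before_correction = psa_as_of(conn, rid, "2025-10-11T00:00:00Z")  # first version
at_boundary = psa_as_of(conn, rid, "2025-10-12T14:30:00Z")        # second version
before_creation = psa_as_of(conn, rid, "2025-10-09T00:00:00Z")    # None
```

Because the filter is inclusive on valid_from and exclusive on valid_to, each interval is half-open, and a query at exactly the boundary timestamp matches only the newer version.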

Deleting Resources

We can soft-delete a resource by setting the valid_to field of its current version without creating a new, open-ended row. Queries as of any later timestamp then no longer return the resource, while queries as of an earlier timestamp still return it exactly as before.
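
A soft delete can be sketched as follows. The helper names soft_delete and is_visible, the reduced schema, and the timestamps are assumptions for illustration only; SQLite with ISO-8601 text timestamps stands in for the actual database.

```python
import sqlite3

def soft_delete(conn, identifier, ts):
    """Hypothetical helper: close the open version at ts without opening a new one."""
    with conn:
        cur = conn.execute(
            "UPDATE resource_versions SET valid_to = ? "
            "WHERE resource_identifier = ? AND valid_to IS NULL",
            (ts, identifier),
        )
    return cur.rowcount  # number of rows closed; 0 if nothing was open

def is_visible(conn, identifier, ts):
    """True if some version of the resource is valid at ts."""
    row = conn.execute(
        "SELECT 1 FROM resource_versions "
        "WHERE resource_identifier = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (identifier, ts, ts),
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed, simplified schema with a single open version.
    CREATE TABLE resource_versions (
        resource_identifier TEXT, payload_uid TEXT,
        valid_from TEXT, valid_to TEXT
    );
    INSERT INTO resource_versions VALUES
        ('psa/pat001/2025-10-10', 'm1', '2025-10-10T09:00:00Z', NULL);
""")

rid = "psa/pat001/2025-10-10"
closed = soft_delete(conn, rid, "2025-10-20T00:00:00Z")
visible_before = is_visible(conn, rid, "2025-10-15T00:00:00Z")  # still in history
visible_after = is_visible(conn, rid, "2025-10-21T00:00:00Z")   # excluded
```

The row itself is never removed, so a historical query for a timestamp before the deletion still finds the resource, while any query at or after the deletion timestamp does not.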

Hard deletion is not yet implemented. Two approaches are possible: we could remove all relevant rows and make sure that any child resources are assigned new parent resources, or we could rename the identifiers and wipe the payload data.