Working with Artifacts

Artifacts are a way to represent larger amounts of data that may be impractical to store individually in vxData. The primary motivation behind artifacts is the need to store and represent compiled, processed data sets that are specific to an experiment.

An artifact is just a payload like any other, but it records fields relevant for experiments:

  • Provenance: git identifiers for the artifact-producing code.
  • Files: additional dataframes or other files on disk/S3.
  • Hardware resources used to compute the artifact, as well as a free-text description.
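
To make the three groups of fields concrete, here is a minimal sketch of how such a record could be modelled with plain dataclasses. The class names (`Provenance`, `ArtifactSketch`), the field names, and the S3 path are hypothetical and only illustrate the list above; they are not the vxData schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    """Git identifiers for the code that produced the artifact (hypothetical shape)."""
    repo: str
    branch: str
    commit: str
    path: str = "/"


@dataclass
class ArtifactSketch:
    """Illustrative stand-in for an artifact record; not the real vxData model."""
    identifier: str
    description: str                                 # free-text description
    provenance: Provenance                           # where the producing code lives
    files: List[str] = field(default_factory=list)   # paths on disk or S3
    hardware: str = ""                               # resources used for the computation


example = ArtifactSketch(
    identifier="artefact/example",
    description="PSA features + ISUP labels for ISUP classification",
    provenance=Provenance(
        repo="virdx/vxdata-workshop",
        branch="main",
        commit="<commit-sha>",  # placeholder, not a real commit
    ),
)

# Files are appended after the record exists, mirroring the create-then-attach flow.
example.files.append("s3://example-bucket/features.parquet")  # made-up path
```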

Storing Artifacts

Artifacts are usually created by triggering the artifact generation job from the VirDx Dashboard frontend. This operation creates the initial artifact resource; your code then only attaches the generated data and files to that artifact.

# this initial artefact creation is usually done automatically for you
client.artefacts.create(
    ArtefactCreate(
        identifier=f"artefact/{artefact_id}",
        github_repo="virdx/vxdata-workshop",
        github_branch=git["branch"],
        github_path="/",
        github_commit=git["commit"],
        created_by="vxdata-workshop",
        description="PSA features + ISUP labels for ISUP classification",
    )
)

# your code then only runs this:
client.artefacts.attach_files(result, artefact_id=artefact_id)
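
The creation call above reads `git["branch"]` and `git["commit"]`, but the page does not show how that `git` dictionary is built. One possible way, sketched here under the assumption that the code runs inside a git checkout with the `git` command-line tool installed, is to ask git directly. The helper name `read_git_info` and its parameters are hypothetical and not part of vxData.

```python
import subprocess


def read_git_info(repo_dir=".", run=subprocess.run):
    """Return {"branch": ..., "commit": ...} for the checkout at repo_dir.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """

    def git(*args):
        # check=True raises CalledProcessError if git fails,
        # for example when repo_dir is not a git repository.
        completed = run(
            ["git", *args],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout.strip()

    return {
        # Prints the branch name, or the literal "HEAD" when HEAD is detached.
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        # Full SHA of the currently checked-out commit.
        "commit": git("rev-parse", "HEAD"),
    }


if __name__ == "__main__":
    git = read_git_info()
    print(git["branch"], git["commit"])
```

In CI environments that check out a detached HEAD, the branch comes back as `HEAD`, so the branch name may need to come from the CI system's own environment variables instead.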