Working with Artifacts
Artifacts represent larger amounts of data that would be impractical to store individually in vxData. Their primary purpose is to store and represent compiled, processed datasets that are specific to an experiment.
An artifact is just a payload like any other, but it records fields relevant for experiments:
- Provenance: git identifiers for the artifact-producing code.
- Files: additional dataframes or other files on disk/S3.
- Resources: the hardware resources used to compute the artifact, together with a free-text description.
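To make the list above concrete, here is a minimal sketch of what such a record could look like as a plain Python dataclass. It is an illustration only, not the vxData schema: the class name `ArtifactSketch` and the `files` and `resources` fields are hypothetical, and the branch, commit, and S3 path are placeholder values. The provenance field names mirror the `ArtefactCreate` call shown later in this section.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArtifactSketch:
    """Illustrative stand-in for an artifact record; not the real vxData model."""

    identifier: str
    # Provenance: git identifiers for the artifact-producing code.
    github_repo: str
    github_branch: str
    github_commit: str
    # Free-text description of what the artifact contains.
    description: str
    # Files: disk paths or S3 URIs of attached dataframes and other files
    # (hypothetical field name).
    files: list[str] = field(default_factory=list)
    # Hardware resources used to compute the artifact (hypothetical field name).
    resources: dict[str, str] = field(default_factory=dict)


example = ArtifactSketch(
    identifier="artefact/demo",
    github_repo="virdx/vxdata-workshop",
    github_branch="main",       # placeholder value
    github_commit="0000000",    # placeholder value
    description="PSA features + ISUP labels for ISUP classification",
)
# Files are added after the record exists, mirroring the create-then-attach flow.
example.files.append("s3://example-bucket/features.parquet")
```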
Storing Artifacts
Artifacts are usually created by triggering the artifact generation job from the VirDx Dashboard frontend. This creates the initial artifact resource; your code then only needs to attach the generated data and files to it.
# this initial artefact creation is usually done automatically for you
client.artefacts.create(
    ArtefactCreate(
        identifier=f"artefact/{artefact_id}",
        github_repo="virdx/vxdata-workshop",
        github_branch=git["branch"],
        github_path="/",
        github_commit=git["commit"],
        created_by="vxdata-workshop",
        description="PSA features + ISUP labels for ISUP classification",
    )
)
# your code then only runs this:
client.artefacts.attach_files(result, artefact_id=artefact_id)
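The example above reads the branch and commit from a `git` dictionary without showing where it comes from. One possible way to build it, assuming the code runs inside a git checkout with the `git` CLI available, is sketched below. The helper names `read_git_info` and `_run_git` are hypothetical and not part of vxData.

```python
import subprocess
from typing import Callable, Dict, List


def _run_git(args: List[str]) -> str:
    """Run a git command in the current directory and return its trimmed stdout."""
    return subprocess.check_output(["git", *args], text=True).strip()


def read_git_info(run: Callable[[List[str]], str] = _run_git) -> Dict[str, str]:
    """Collect the branch name and full commit hash of the current checkout.

    `run` is injectable so the function can be exercised without a repository.
    """
    return {
        "branch": run(["rev-parse", "--abbrev-ref", "HEAD"]),
        "commit": run(["rev-parse", "HEAD"]),
    }
```

Calling `git = read_git_info()` then yields a dictionary with the `"branch"` and `"commit"` keys used above. Note that on a detached HEAD, `git rev-parse --abbrev-ref HEAD` returns the literal string `HEAD` rather than a branch name.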