Provenance
Core Principles
Version-only provenance - Provenance is recorded only for immutable version snapshots (like _v47
), not for moving targets like _current
or _next
.
Meta-flow storage - Semantic Flow-specific provenance lives in meta-flows, referencing version snapshots in other flows. Domain-specific provenance can live in datasets themselves.
Current snapshot duplication - _current
meta snapshots contain identical copies of the latest version's provenance with base URI pointing to the version snapshot for stable fragment resolution.
Architecture
Version Snapshot Provenance
# In my-dataset/_meta-flow/_v47/my-dataset_meta.trig
@base <../_v47/> .
# Weave activity with PROV standard properties
:configUpdateActivity a meta:ConfigWeave ;
prov:startedAtTime "2025-07-20T14:30:00Z" ;
prov:endedAtTime "2025-07-20T14:30:15Z" ;
prov:used <../../_config-flow/_v46/config.jsonld> ;
prov:generated <../../_config-flow/_v47/config.jsonld> ;
prov:wasAssociatedWith <https://semantic-flow.org/agents/flow-service-bot> .
# Rights and licensing at snapshot level
<../../_config-flow/_v47> dcterms:rightsHolder <https://orcid.org/0000-0002-1825-0097> ;
dcterms:license <https://creativecommons.org/licenses/by-sa/4.0/> ;
prov:has_provenance :configProvenance .
# Delegation chain (step 1 = top authority, gets copyright by default)
:configProvenance a meta:ProvenanceContext ;
meta:forActivity :configUpdateActivity ;
meta:forSnapshot <../../_config-flow/_v47> ;
prov:wasAttributedTo <https://acme-corp.com/org> ; # Primary attribution
meta:delegationChain :delegationChain_001 .
:delegationChain_001 meta:hasStep :step1, :step2, :step3 .
:step1 a meta:DelegationStep ;
meta:stepOrder 1 ;
prov:agent <https://acme-corp.com/org> . # Prime mover, no actedOnBehalfOf
:step2 a meta:DelegationStep ;
meta:stepOrder 2 ;
prov:agent <https://orcid.org/0000-0002-1825-0097> ;
prov:actedOnBehalfOf <https://acme-corp.com/org> .
:step3 a meta:DelegationStep ;
meta:stepOrder 3 ;
prov:agent <https://semantic-flow.org/agents/flow-service-bot> ;
prov:actedOnBehalfOf <https://orcid.org/0000-0002-1825-0097> .
Current Snapshot Copy
# In my-dataset/_meta-flow/_current/my-dataset_meta.trig
@base <../_v47/> .
# Identical content to version snapshot - all URIs resolve to stable version
# (same provenance content as above)
Unversioned Flow Accumulation
For flows without versioning, activities accumulate in _next
with unique timestamps:
# In my-dataset/_meta-flow/_next/my-dataset_meta.trig
:dataActivity_2025-07-20_14-30 a meta:DataWeave ;
prov:startedAtTime "2025-07-20T14:30:00Z" ;
prov:generated <../../_data-flow/_current/data.trig> .
:dataActivity_2025-07-20_16-45 a meta:DataWeave ;
prov:startedAtTime "2025-07-20T16:45:00Z" ;
prov:used <../../_data-flow/_current/data.trig> ;
prov:generated <../../_data-flow/_current/data.trig> .
Key Components
Activity Types (subclass prov:Activity
)
meta:ConfigWeave
,meta:ReferenceWeave
,meta:DataWeave
,meta:MetaWeave
meta:NodeWeave
(entire node),meta:NodeTreeWeave
(recursive)
Provenance Entities (subclass meta:ProvenanceEntity
)
meta:ProvenanceContext
- Relator for complex authorship scenariosmeta:DelegationChain
/meta:DelegationStep
- Authorization chainsmeta:AgentRoleCollection
/meta:AgentRole
- Collaborative role assignments
Standard Properties Used
prov:agent
,prov:actedOnBehalfOf
,prov:wasAttributedTo
(instead of custom properties)dcterms:rightsHolder
,dcterms:license
(rights at snapshot level)prov:has_provenance
(link snapshots to provenance contexts)
Delegation Chain Pattern
Step ordering: Lower numbers = higher authority
- Step 1: Prime mover (organization) - gets copyright by default, no
prov:actedOnBehalfOf
- Step 2+: Each agent acts on behalf of the previous step's agent
- Tools/software agents typically at the end of the chain
Configuration
Copyright assignment: Configurable in node-config-defaults, defaults to first agent in delegation chain (step 1).
External vocabulary tracking: Use SHACL to declare recommended external properties like prov:wasInfluencedBy
, dcterms:license
.
Implementation Notes
- Fragment URIs: Use
<#step1>
etc. within version snapshots for stable addressability - Base URI: All snapshots use
@base <../_vN/>
pattern for consistent resolution - Rights inheritance: Capture previous version rights holders in provenance contexts when content is derived
- Static site friendly: Documentation approach for external references since no server-side redirects available
Fragment Identifier Naming Scheme
To ensure that every RDF node within a _meta
distribution has a unique and dereferenceable URI, the following naming scheme for fragment identifiers MUST be used. This allows the index.html
file for a given snapshot version to correctly link to all provenance entities.
The structure is as follows:
<{flow-slug}-{version}-{entity-type}[-{unique-part}]>
{flow-slug}
: The slug of the flow this provenance describes (e.g.,config-flow
,data-flow
). This provides the primary namespace for the identifier.{version}
: The version of the snapshot (e.g.,v47
). This scopes the provenance to a specific point in time.{entity-type}
: The type of the entity, using a consistent camelCase or kebab-case convention (e.g.,activity
,context
,delegationChain
,delegationStep
).{unique-part}
: (Optional) A unique suffix, such as a step number or a timestamp, used when multiple entities of the same type exist for the same flow and version.
Example
For a config-flow
at version v47
, the identifiers would be:
- Activity:
<#config-flow-v47-activity>
- Provenance Context:
<#config-flow-v47-context>
- Delegation Chain:
<#config-flow-v47-delegationChain>
- Delegation Steps:
<#config-flow-v47-delegationStep-1>
<#config-flow-v47-delegationStep-2>
Backlinks