Development Patterns
This document captures recurring architectural and code patterns used throughout the Semantic Flow project.
Architectural Patterns
Comunica SPARQL vs Quadstore Primitives for data access
Both can win. Pick by query shape.
Use Quadstore primitives (get, getStream, match) when:
- One or two triple patterns with fixed IRIs/literals.
- You can drive lookups by known keys and stop early.
- You need strict control over streaming, batching, or a read-modify-write cycle.
- You want zero SPARQL parse/plan overhead.
Use SPARQL via Comunica when:
- Three or more patterns with joins, OPTIONAL/UNION, FILTER, ORDER BY, GROUP BY, LIMIT.
- You’d benefit from join reordering, filter/projection pushdown, and early result streaming.
- You might federate later or swap sources without rewriting app code.
Why primitives can be faster on “simple” queries:
- Direct index hits with no planner cost.
- Tight loops with for await...of and immediate early exit.
- You can pre-narrow with exact keys or prefix scans and avoid any join at all.
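As a minimal sketch of the early-exit point, the following uses a hypothetical in-memory `matchQuads` generator as a stand-in for a store's pattern-matching primitive (it is not the Quadstore API, and the `Quad` shape is simplified to plain strings). It shows how returning from inside a `for await...of` loop stops the scan as soon as the first match is found.

```typescript
// Simplified quad shape for illustration; real code would use RDF/JS terms.
interface Quad { subject: string; predicate: string; object: string; graph: string }
type Pattern = Partial<Quad>;

// True when every bound field of the pattern equals the quad's field.
function matches(quad: Quad, pattern: Pattern): boolean {
  return (Object.keys(pattern) as (keyof Quad)[])
    .every((key) => pattern[key] === undefined || quad[key] === pattern[key]);
}

// Hypothetical stand-in for a store primitive: lazily yields matching quads
// and counts how many quads it had to inspect.
async function* matchQuads(
  quads: readonly Quad[],
  pattern: Pattern,
  stats: { scanned: number },
): AsyncGenerator<Quad> {
  for (const quad of quads) {
    stats.scanned++;
    if (matches(quad, pattern)) yield quad;
  }
}

// "Path to one thing": return the first object for a subject/predicate pair.
async function firstObject(
  quads: readonly Quad[],
  subject: string,
  predicate: string,
  stats: { scanned: number } = { scanned: 0 },
): Promise<string | undefined> {
  for await (const quad of matchQuads(quads, { subject, predicate }, stats)) {
    return quad.object; // early exit: the generator is closed, the scan stops
  }
  return undefined;
}
```

With a real store the same loop shape applies; only the source of the async iterable changes.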
A practical split for internal data access:
- “Path to one thing” lookups (by IRI, by type, by id): primitives.
- Graph navigation with 3+ hops or any aggregation/sorting: SPARQL.
Hybrid patterns that work well:
- Use primitives to fetch candidate IRIs, then pass them into a SPARQL VALUES clause.
- Pre-materialize small “views” (denormalized quads) you hit often, then query them with SPARQL.
- Keep SPARQL templates for common shapes; fall back to primitives for hot key-lookups.
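The first hybrid pattern could look roughly like this. The helper names (`toIriRef`, `valuesClause`, `labelsQuery`) and the `rdfs:label` query are assumptions made for illustration, not project code. The sketch rejects strings that contain characters the SPARQL grammar does not allow inside `<...>`, so candidate IRIs cannot break out of the clause.

```typescript
// Characters the SPARQL grammar forbids inside an IRIREF (<...>).
const FORBIDDEN_IN_IRI = /[\u0000-\u0020<>"{}|^`\\]/;

// Wraps an IRI in angle brackets, refusing anything that could escape them.
function toIriRef(iri: string): string {
  if (FORBIDDEN_IN_IRI.test(iri)) {
    throw new Error(`Not a valid IRI for SPARQL: ${JSON.stringify(iri)}`);
  }
  return `<${iri}>`;
}

// Builds a single-variable VALUES clause from candidate IRIs.
function valuesClause(variable: string, iris: readonly string[]): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(variable)) {
    throw new Error(`Invalid variable name: ${variable}`);
  }
  return `VALUES ?${variable} { ${iris.map(toIriRef).join(' ')} }`;
}

// Hypothetical template: labels for a set of candidates found via primitives.
function labelsQuery(candidates: readonly string[]): string {
  return [
    'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>',
    'SELECT ?s ?label WHERE {',
    `  ${valuesClause('s', candidates)}`,
    '  ?s rdfs:label ?label .',
    '}',
    'ORDER BY ?label',
  ].join('\n');
}
```

The resulting string can be handed to whichever SPARQL engine the application already uses.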
Implementation notes:
- Consume streams with for await (const q of stream); await completion at the boundary with stream/promises finished() or pipeline().
- Reuse a single Comunica engine instance to amortize init cost.
- With Quadstore, structure data so frequent lookups align with available index permutations; primitives shine when you can select by the leading fields.
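To make the point about leading fields concrete, here is a small sketch that checks how well a lookup pattern lines up with a set of index permutations. The three permutations listed are illustrative assumptions only, not necessarily Quadstore's defaults; check the store's actual index configuration. A pattern is served well when its bound terms form a prefix of some index.

```typescript
type TermName = 'subject' | 'predicate' | 'object' | 'graph';

// Illustrative permutations only (an assumption, not a store default).
const indexes: TermName[][] = [
  ['subject', 'predicate', 'object', 'graph'],
  ['object', 'graph', 'subject', 'predicate'],
  ['graph', 'subject', 'predicate', 'object'],
];

// Number of leading index fields that are bound in the pattern.
function boundPrefixLength(index: readonly TermName[], bound: ReadonlySet<TermName>): number {
  let length = 0;
  while (length < index.length && bound.has(index[length])) length++;
  return length;
}

// Picks the index whose leading fields cover the most bound terms.
// A prefix of 0 means no index helps and the lookup degrades to a scan.
function bestIndex(boundTerms: readonly TermName[]): { index: TermName[]; prefix: number } {
  const bound = new Set(boundTerms);
  let best = { index: indexes[0], prefix: boundPrefixLength(indexes[0], bound) };
  for (const index of indexes.slice(1)) {
    const prefix = boundPrefixLength(index, bound);
    if (prefix > best.prefix) best = { index, prefix };
  }
  return best;
}
```

For example, a pattern that binds only the predicate matches no leading field in these permutations, which is a hint to add an index or restructure the lookup.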
Rule of thumb:
- Simple, key-oriented, latency-sensitive ⇒ primitives.
- Anything with joins, OPTIONAL/UNION, ordering, or aggregation ⇒ SPARQL.
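The rule of thumb can be written down as a tiny decision helper. This is only a sketch of the heuristic described in this section; the `QueryShape` fields and the `chooseAccessPath` name are hypothetical, and the thresholds follow the "one or two patterns" versus "three or more" split given earlier.

```typescript
// Hypothetical description of a query's shape.
interface QueryShape {
  triplePatterns: number;
  hasOptionalOrUnion: boolean;
  hasOrderingOrAggregation: boolean;
}

type AccessPath = 'primitives' | 'sparql';

// Simple, key-oriented lookups go to primitives; everything else to SPARQL.
function chooseAccessPath(shape: QueryShape): AccessPath {
  if (shape.hasOptionalOrUnion || shape.hasOrderingOrAggregation) return 'sparql';
  return shape.triplePatterns <= 2 ? 'primitives' : 'sparql';
}
```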
Stream Patterns
Use async/await for boundaries (start/finish), and use async iteration for the stream body.
- Promises: use await for file I/O (fs/promises), HTTP fetches, initialization, and “collect-all” helpers that intentionally materialize results.
- Streaming RDF (RDF/JS, Comunica, rdf-parse/serialize, rdf-ext):
  - Prefer async iterators:

        // quadStream implements AsyncIterable<Quad>
        for await (const quad of quadStream) {
          // process quad
        }

    This gives proper backpressure. Do not attach .on('data', ...) and await inside the handler: the stream keeps emitting while the handler is suspended.
  - If a sink uses RDF/JS Sink#import(source), await completion with Node’s stream utilities:

        import { finished } from 'stream/promises';

        // Assumes this serializer's import() returns a Node stream;
        // the RDF/JS spec itself only requires an EventEmitter.
        const writer = serializer.import(quadStream);
        await finished(writer); // resolves on 'end'/'finish', rejects on error

    If the returned stream is readable (for example, serializer output), it must also be consumed or piped somewhere, or it never ends.
  - For stream pipelines, use pipeline:

        import { pipeline } from 'stream/promises';

        await pipeline(sourceStream, transformA, transformB, destStream);

  - Comunica result streams (bindings/quad streams) can be consumed the same way, provided the stream returned by your Comunica version is async-iterable:

        const quadStream = await engine.queryQuads('CONSTRUCT {...}', { sources });
        for await (const quad of quadStream) { /* ... */ }
- Collecting small results only:

      import arrayifyStream from 'arrayify-stream';

      const quads = await arrayifyStream(quadStream); // OK for small datasets

  Avoid this for large data.
- Writing to stores:

      import { once } from 'events';

      // RDF/JS Store#import() returns an EventEmitter that emits 'end' or 'error'
      const importing = store.import(quadStream);
      await once(importing, 'end'); // rejects if 'error' is emitted first
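For reference, a self-contained sketch of the pipeline boundary using only Node's built-in streams. The `SimpleQuad` shape and the helper names are assumptions for illustration; real code would use RDF/JS quads and the project's own transforms. It shows object-mode stages wired together with `pipeline()`, which propagates backpressure and errors and resolves once the destination has finished.

```typescript
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Minimal quad shape for illustration; real code would use RDF/JS terms.
interface SimpleQuad { subject: string; predicate: string; object: string }

// Hypothetical source: any AsyncIterable of quads works with Readable.from().
async function* sourceQuads(): AsyncGenerator<SimpleQuad> {
  yield { subject: 'ex:a', predicate: 'rdfs:label', object: 'A' };
  yield { subject: 'ex:a', predicate: 'rdf:type', object: 'ex:Thing' };
  yield { subject: 'ex:b', predicate: 'rdfs:label', object: 'B' };
}

// Transform stage: passes through only quads with the given predicate.
function keepPredicate(predicate: string): Transform {
  return new Transform({
    objectMode: true,
    transform(quad: SimpleQuad, _encoding, callback) {
      if (quad.predicate === predicate) callback(null, quad);
      else callback();
    },
  });
}

// Destination stage: appends each quad's object value to the given array.
function collectObjects(into: string[]): Writable {
  return new Writable({
    objectMode: true,
    write(quad: SimpleQuad, _encoding, callback) {
      into.push(quad.object);
      callback();
    },
  });
}

// Boundary: await the whole pipeline; rejects if any stage errors.
async function collectLabels(): Promise<string[]> {
  const labels: string[] = [];
  await pipeline(
    Readable.from(sourceQuads()),
    keepPredicate('rdfs:label'),
    collectObjects(labels),
  );
  return labels;
}
```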
Rule of thumb:
Use await for Promises and stream completion. Use for await...of to consume streaming RDF. Avoid await inside 'data' listeners and avoid buffering everything unless you explicitly need it.
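The difference between the two consumption styles can be observed with a small experiment. This is a sketch using a plain number stream rather than RDF, and `makeTracker`, `withForAwait`, and `withDataListener` are hypothetical names. Each variant runs simulated slow work per item and reports the highest number of handlers that were running at the same time.

```typescript
import { Readable } from 'node:stream';
import { setTimeout as sleep } from 'node:timers/promises';

async function* items(count: number): AsyncGenerator<number> {
  for (let i = 0; i < count; i++) yield i;
}

// Simulated slow per-item work that records how many calls overlap.
function makeTracker() {
  let inFlight = 0;
  let maxInFlight = 0;
  const processed: number[] = [];
  return {
    async work(item: number): Promise<void> {
      inFlight++;
      maxInFlight = Math.max(maxInFlight, inFlight);
      await sleep(5);
      processed.push(item);
      inFlight--;
    },
    max: () => maxInFlight,
    processed,
  };
}

// Recommended: the next item is only pulled after the loop body completes.
async function withForAwait(count: number): Promise<number> {
  const tracker = makeTracker();
  for await (const item of Readable.from(items(count))) {
    await tracker.work(item);
  }
  return tracker.max();
}

// Anti-pattern: the stream does not wait for the async handler to finish,
// so handlers pile up and the source is never slowed down.
async function withDataListener(count: number): Promise<number> {
  const tracker = makeTracker();
  const pending: Promise<void>[] = [];
  const stream = Readable.from(items(count));
  stream.on('data', (item: number) => {
    pending.push(tracker.work(item));
  });
  await new Promise<void>((resolve, reject) => {
    stream.once('end', resolve);
    stream.once('error', reject);
  });
  await Promise.all(pending);
  return tracker.max();
}
```

In the loop variant at most one handler is ever active; in the listener variant several overlap, which is the buffering and ordering hazard the rule of thumb warns about.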
Error Handling and Logging System Patterns
See Logging and Error Handling.