Configuration¶
Stargazer uses LocalStorageClient as the single storage client. It always handles caching and local metadata. When PINATA_JWT is available, a PinataClient remote is attached for authenticated operations. Public IPFS gateway access is always available — downloading a public CID works out of the box with no configuration, the same way docker run ubuntu pulls from Docker Hub by default.
Summary¶
| Setup | Download | Upload / Query / Delete | Env Requirements |
|---|---|---|---|
| Default | Cache + public IPFS gateway | Local only (TinyDB) | None |
| JWT + public | Cache + public IPFS gateway | Pinata API (public network) | PINATA_JWT, PINATA_VISIBILITY=public |
| JWT + private | Cache + signed URLs | Pinata API (private network) | PINATA_JWT |
Default (no JWT)¶
Files are stored on the local filesystem under STARGAZER_LOCAL (defaults to ~/.stargazer/local). Metadata is indexed in a TinyDB database.
Downloads check the local cache first. On a cache miss, the public IPFS gateway is used to fetch the file — no credentials needed for public CIDs.
Note — public Pinata files require JWT to query: Downloading a public CID works without credentials (any IPFS gateway can serve it), but querying Pinata's file index — even
/files/public— requires a JWT. WithoutPINATA_JWT,assemble()searches only the local TinyDB; public files pinned to your Pinata account are invisible to it.
With JWT¶
When PINATA_JWT is present, a PinataClient remote is attached. This enables upload, query, and delete via the Pinata API. PINATA_VISIBILITY controls whether files are uploaded to the public or private network:
- private (default): uploads as private, downloads use signed URLs, queries hit
/files/private - public: uploads as public, downloads use the public IPFS gateway, queries hit
/files/public
Warning — ephemeral compute: Without
PINATA_JWT, uploads and metadata are stored only on the local filesystem. In ephemeral compute environments (e.g. Union/Flyte pods, CI runners, serverless functions), local storage is lost when the container exits. SetPINATA_JWTto persist outputs beyond the lifetime of the compute instance.
Environment Variables¶
All env vars are centralized in utils/config.py. If set (even to empty string), the value is used exactly. If unset, the default applies.
| Variable | Purpose | Default | Required |
|---|---|---|---|
STARGAZER_LOCAL |
Local storage directory | ~/.stargazer/local |
No |
PINATA_JWT |
Pinata API authentication | None (unset) | Only for authenticated operations |
PINATA_GATEWAY |
Public IPFS gateway URL | https://dweb.link |
No (set to empty string to disable) |
PINATA_VISIBILITY |
public or private |
private |
No |
Resolution Logic¶
- If
PINATA_JWTis set: attachPinataClientremote - If no JWT: no remote (public gateway still available for downloads)
Always returns LocalStorageClient. The remote is optional.
Download Flow¶
flowchart TD
Start([Download requested]) --> A{path exists?}
A -->|Yes| Done([Return])
A -->|No| B{CID in local cache?}
B -->|Yes| Done
B -->|No| C{local_ CID?}
C -->|Yes| D("Look up TinyDB") --> Done
C -->|No| E{Remote + private\nvisibility?}
E -->|Yes| F("Signed URL download → cache") --> Done
E -->|No| G("Public IPFS gateway → fetch + cache") --> Done
Storage Client Protocol¶
All storage operations go through LocalStorageClient:
flowchart LR
C[("LocalStorageClient")]
C --> U("upload(component)\nremote or local — never both")
C --> D("download(component)\nbool — cache → remote (private) → public gateway")
C --> Q("query(keyvalues)\nremote or local TinyDB")
C --> X("delete(component)\nremote or local")
The two modes are explicit and separate:
- JWT set (remote mode): Pinata owns metadata and bytes. TinyDB is not involved. Upload, query, and delete go to Pinata. Downloads fetch bytes by CID via signed URL or public gateway, cached locally as bytes only.
- No JWT (local mode): TinyDB owns metadata. Local filesystem stores bytes. Downloads check TinyDB, then fall back to the public IPFS gateway for cache misses on bytes.
Pinata upload paths¶
PinataClient.upload() size-branches. Files up to 100MB use the plain
multipart POST (one round trip, CID in the response body). Larger files use
Pinata's resumable TUS endpoint — chunked PATCHes, CID read from the
Upload-Cid header on the completing request. The split is required, not an
optimization: Pinata's plain POST is hard-capped at 100MB. The TUS path is
chunked-first (no resume yet) with a 10 GiB/file ceiling — itself a Pinata
limit, not an IPFS one (IPFS chunks files into a DAG with no inherent size
cap).
query(keyvalues, network=) queries one network when network is given,
else merges both; each record carries the network it was found on.
create_signed_upload_url(filename, keyvalues, network, ...) mints a signed
URL with the filename and keyvalues fixed at mint time — used by the admin
asset page so a browser can upload bytes directly to Pinata without the
metadata ever passing through (or being chooseable at) the upload client.
_owner attribution and the size cap are baked in there too — see
App → Asset Manager.
Container Images¶
Stargazer ships four container images on ghcr.io/stargazerbio. They split along a sharp line: task images (run only by Flyte) are declared as flyte.Image / flyte.Environment in src/stargazer/config.py; human-runnable images (used via docker run and hosted via flyte.serve) are built from the project's multi-stage Dockerfile.
| Image | Source | Type | Where it runs |
|---|---|---|---|
stargazer-scrna |
config.py (scrna_env) |
flyte.TaskEnvironment |
scRNA-seq tasks (tasks/scrna/) |
stargazer-gatk |
config.py (gatk_env) |
flyte.TaskEnvironment |
GATK + alignment tasks (tasks/gatk/, tasks/general/) |
stargazer-note |
Dockerfile (--target note) |
Marimo notebook | Local docker run exploration only |
stargazer-chat |
Dockerfile (--target chat) |
Claude Code + OpenCode + MCP | Local docker run only |
Why the split: task images need nothing but Flyte's contract (an entrypoint Flyte injects, a content-hash tag Flyte pins by) — perfectly served by the SDK. Human-runnable images need a real ENTRYPOINT, baked-in source, and a stable :latest tag — none of which the Flyte Image SDK exposes. Rather than reinvent the Dockerfile via post-build wrapping, we just use a Dockerfile.
Hosted notebook pods use a separate image, notebook-app, defined programmatically in app/per_notebook.py and built/published by the admin deploy entrypoint — it is not stargazer-note. See App → Images.
Building locally¶
Contributor builds stay on the host. Nothing is pushed to a registry — no docker login needed, no write access to ghcr.io/stargazerbio required; CI publishes on merge to main. The Flyte task images build into the local docker cache (no registry= is set on the Flyte Images, so the docker builder falls through to --load instead of --push), with the builder selected in .flyte/config.yaml — local requires a working Docker daemon, remote (Union only) builds on the cluster. The human-runnable images build from the Dockerfile and are tagged with their published URLs even when local-only, so docker resolves them from the local cache by name. Commands in Contributing → Building Images.
Adding a tool¶
When a new Flyte task wraps a new CLI tool, layer it onto the image of the TaskEnvironment it is decorated against in config.py — via with_apt_packages, with_commands, or the bioconda install block. For tools that should be available in the human-runnable note/chat images, edit the bioconda block in the Dockerfile's base stage instead. See Writing a Task.
Resource Bundles¶
Bundles are curated sets of files (reference genomes, demo datasets) defined as YAML manifests in src/stargazer/bundles/. Each manifest lists CIDs and their keyvalues, with a bundle keyvalue on each file for queryability.
Hydration Flow¶
fetch_resource_bundle(bundle_name) downloads files by CID:
flowchart TD
Start([fetch_resource_bundle]) --> A("Load YAML manifest by name")
A --> B{JWT set?}
B -->|Yes| C("Files already registered in Pinata\nDownload bytes by CID\nNo TinyDB writes")
B -->|No| D("Seed TinyDB with manifest keyvalues\nDownload bytes from public IPFS gateway")
C --> Done([Assets queryable via assemble])
D --> Done
Bundle Format¶
A manifest carries a name, a description, and a files list; each file entry is a cid plus its keyvalues (asset type, sample metadata, and the bundle tag). See the manifests in src/stargazer/bundles/ for live examples.
After hydration, bundled assets are queryable via assemble() and query_files like any other asset.