Configuration¶
Stargazer uses LocalStorageClient as the single storage client. It always handles caching and local metadata. When PINATA_JWT is available, a PinataClient remote is attached for authenticated operations. Public IPFS gateway access is always available — downloading a public CID works out of the box with no configuration, the same way docker run ubuntu pulls from Docker Hub by default.
Summary¶
| Setup | Download | Upload / Query / Delete | Env Requirements |
|---|---|---|---|
| Default | Cache + public IPFS gateway | Local only (TinyDB) | None |
| JWT + public | Cache + public IPFS gateway | Pinata API (public network) | PINATA_JWT, PINATA_VISIBILITY=public |
| JWT + private | Cache + signed URLs | Pinata API (private network) | PINATA_JWT |
Default (no JWT)¶
Files are stored on the local filesystem under STARGAZER_LOCAL (defaults to ~/.stargazer/local). Metadata is indexed in a TinyDB database.
Downloads check the local cache first. On a cache miss, the public IPFS gateway is used to fetch the file — no credentials needed for public CIDs.
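As a rough sketch of the cache-miss step, a public CID can be turned into a path-style gateway URL as shown here. The helper name gateway_url and the example CID are hypothetical. The /ipfs/<cid> path is the generic IPFS path-gateway convention, not something confirmed for Stargazer's own code. The empty-string case follows the PINATA_GATEWAY row in the Environment Variables table.

```python
from typing import Optional


def gateway_url(cid: str, gateway: str = "https://dweb.link") -> Optional[str]:
    """Build a path-style gateway URL, or return None when the gateway is disabled.

    Hypothetical helper: an empty gateway string is read as "disabled".
    """
    if not gateway:
        return None
    # Strip a trailing slash so the result never contains "//ipfs".
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


# "bafyexample" is a made-up placeholder, not a real CID.
print(gateway_url("bafyexample"))
```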
With JWT¶
When PINATA_JWT is present, a PinataClient remote is attached. This enables upload, query, and delete via the Pinata API. PINATA_VISIBILITY controls whether files are uploaded to the public or private network:
- private (default): uploads as private, downloads use signed URLs, queries hit /files/private
- public: uploads as public, downloads use the public IPFS gateway, queries hit /files/public
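The two visibility settings can be summarised as a small lookup. This is only an illustrative sketch: VisibilityBehavior, BEHAVIORS and behavior_for are hypothetical names, and rejecting unknown values with ValueError is an assumption. The values themselves come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisibilityBehavior:
    """Hypothetical record of what each PINATA_VISIBILITY value implies."""
    upload_network: str
    download_via: str
    query_path: str


# Values mirror the list above; the structure itself is an assumption.
BEHAVIORS = {
    "private": VisibilityBehavior("private", "signed URL", "/files/private"),
    "public": VisibilityBehavior("public", "public IPFS gateway", "/files/public"),
}


def behavior_for(visibility: str = "private") -> VisibilityBehavior:
    """Return the behaviour for a visibility value; private is the default."""
    try:
        return BEHAVIORS[visibility]
    except KeyError:
        # Assumption: unknown values are rejected instead of silently defaulted.
        raise ValueError(f"unknown visibility: {visibility!r}") from None
```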
Warning (ephemeral compute): Without PINATA_JWT, uploads and metadata are stored only on the local filesystem. In ephemeral compute environments (e.g. Union/Flyte pods, CI runners, serverless functions), local storage is lost when the container exits. Set PINATA_JWT to persist outputs beyond the lifetime of the compute instance.
Environment Variables¶
All env vars are centralized in utils/config.py. If set (even to empty string), the value is used exactly. If unset, the default applies.
| Variable | Purpose | Default | Required |
|---|---|---|---|
| STARGAZER_LOCAL | Local storage directory | ~/.stargazer/local | No |
| PINATA_JWT | Pinata API authentication | None (unset) | Only for authenticated operations |
| PINATA_GATEWAY | Public IPFS gateway URL | https://dweb.link | No (set to empty string to disable) |
| PINATA_VISIBILITY | public or private | private | No |
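The "set, even to empty string" rule is easy to get wrong, so here is a minimal sketch of one way to express it. The names DEFAULTS and read_setting are hypothetical and this is not a copy of utils/config.py. The defaults are taken from the table above.

```python
import os

# Defaults copied from the table above; None means "no default".
DEFAULTS = {
    "STARGAZER_LOCAL": "~/.stargazer/local",
    "PINATA_JWT": None,
    "PINATA_GATEWAY": "https://dweb.link",
    "PINATA_VISIBILITY": "private",
}


def read_setting(name, environ=None):
    """Return the env value if the variable is set (even to ""), else the default."""
    env = os.environ if environ is None else environ
    if name in env:
        # A set-but-empty variable is returned as "" and does not fall back.
        return env[name]
    return DEFAULTS[name]
```

A membership test is used here instead of `env.get(name) or default`, because the latter would replace an empty string with the default and make it impossible to disable the gateway.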
Resolution Logic¶
- If PINATA_JWT is set: attach PinataClient remote
- If no JWT: no remote (public gateway still available for downloads)
Always returns LocalStorageClient. The remote is optional.
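The resolution rule can be sketched with two stand-in classes. RemoteStub and StorageStub are hypothetical placeholders for PinataClient and LocalStorageClient, and their fields are assumptions. The sketch also assumes that an empty PINATA_JWT counts as no JWT, which the text does not state.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RemoteStub:
    """Stand-in for PinataClient; the fields are assumptions."""
    jwt: str
    visibility: str


@dataclass
class StorageStub:
    """Stand-in for LocalStorageClient; remote stays None without a JWT."""
    remote: Optional[RemoteStub] = None


def resolve_client(environ: Mapping[str, str]) -> StorageStub:
    """Always return the local client, attaching a remote only when a JWT exists."""
    jwt = environ.get("PINATA_JWT")
    if not jwt:
        # Assumption: an empty PINATA_JWT is treated the same as an unset one.
        return StorageStub()
    visibility = environ.get("PINATA_VISIBILITY", "private")
    return StorageStub(remote=RemoteStub(jwt=jwt, visibility=visibility))
```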
Download Flow¶
flowchart TD
Start([Download requested]) --> A{path exists?}
A -->|Yes| Done([Return])
A -->|No| B{CID in local cache?}
B -->|Yes| Done
B -->|No| C{local_ CID?}
C -->|Yes| D("Look up TinyDB") --> Done
C -->|No| E{Remote + private\nvisibility?}
E -->|Yes| F("Signed URL download → cache") --> Done
E -->|No| G("Public IPFS gateway → fetch + cache") --> Done
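Read top to bottom, the flowchart is a chain of early returns. The function below is a hedged sketch of that ordering only: choose_download_source and its return labels are hypothetical, and reading the "local_ CID?" node as "the CID starts with local_" is an interpretation of the diagram.

```python
def choose_download_source(
    *,
    path_exists: bool,
    cid_in_cache: bool,
    cid: str,
    has_remote: bool,
    visibility: str,
) -> str:
    """Return a label for the flowchart branch that would serve the download."""
    if path_exists:
        return "existing path"
    if cid_in_cache:
        return "local cache"
    if cid.startswith("local_"):
        # Assumption: locally generated CIDs carry a "local_" prefix.
        return "tinydb lookup"
    if has_remote and visibility == "private":
        return "signed url"
    # Everything else falls through to the public IPFS gateway.
    return "public gateway"
```

Note that a remote with public visibility and no remote at all end in the same branch.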
Storage Client Protocol¶
All storage operations go through LocalStorageClient:
flowchart LR
C[("LocalStorageClient")]
C --> U("upload(component)\nremote or local — never both")
C --> D("download(component)\nbool — cache → remote (private) → public gateway")
C --> Q("query(keyvalues)\nremote or local TinyDB")
C --> X("delete(component)\nremote or local")
The two modes are explicit and separate:
- JWT set (remote mode): Pinata owns metadata and bytes. TinyDB is not involved. Upload, query, and delete go to Pinata. Downloads fetch bytes by CID via signed URL or public gateway, cached locally as bytes only.
- No JWT (local mode): TinyDB owns metadata. Local filesystem stores bytes. Downloads check TinyDB, then fall back to the public IPFS gateway for cache misses on bytes.
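To make the "remote or local, never both" rule concrete, here is a toy model in which plain dicts stand in for Pinata and for TinyDB. ModeSketch is a hypothetical class, not Stargazer's client, and the exact-match query semantics are an assumption.

```python
class ModeSketch:
    """Toy client: every metadata operation goes to exactly one backend."""

    def __init__(self, remote=None):
        self.remote = remote  # dict standing in for Pinata, or None
        self.local = {}       # dict standing in for TinyDB

    def _backend(self):
        # Remote mode when a remote is attached, local mode otherwise.
        return self.remote if self.remote is not None else self.local

    def upload(self, cid, keyvalues):
        self._backend()[cid] = dict(keyvalues)

    def query(self, keyvalues):
        # Assumption: a file matches when every requested keyvalue is equal.
        return sorted(
            cid
            for cid, kv in self._backend().items()
            if all(kv.get(k) == v for k, v in keyvalues.items())
        )

    def delete(self, cid):
        self._backend().pop(cid, None)
```

With a remote attached, the local dict stays empty after an upload, which mirrors the statement that TinyDB is not involved in remote mode.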
Resource Bundles¶
Bundles are curated sets of files (reference genomes, demo datasets) defined as YAML manifests in src/stargazer/bundles/. Each manifest lists CIDs and their keyvalues, with a bundle keyvalue on each file for queryability.
Hydration Flow¶
fetch_resource_bundle(bundle_name) downloads files by CID:
flowchart TD
Start([fetch_resource_bundle]) --> A("Load YAML manifest by name")
A --> B{JWT set?}
B -->|Yes| C("Files already registered in Pinata\nDownload bytes by CID\nNo TinyDB writes")
B -->|No| D("Seed TinyDB with manifest keyvalues\nDownload bytes from public IPFS gateway")
C --> Done([Assets queryable via assemble])
D --> Done
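The branch on the JWT can be sketched as follows, operating on a manifest that has already been parsed into a dict. The function hydrate, its parameters, and the dict used as a local index are hypothetical. The real fetch_resource_bundle also downloads bytes, which is left out here.

```python
def hydrate(manifest: dict, jwt_set: bool, local_index: dict) -> list:
    """Return the CIDs to download, seeding local_index only when no JWT is set."""
    cids = []
    for entry in manifest["files"]:
        cid = entry["cid"]
        if not jwt_set:
            # Local mode: copy the manifest keyvalues into the local index.
            local_index[cid] = dict(entry["keyvalues"])
        # In both modes the bytes are then fetched by CID (not shown).
        cids.append(cid)
    return cids


# The CID is the elided placeholder from the manifest example, kept as-is.
manifest = {
    "name": "scrna_demo",
    "files": [{"cid": "QmABC...", "keyvalues": {"bundle": "scrna_demo"}}],
}
index = {}
print(hydrate(manifest, jwt_set=False, local_index=index), index)
```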
Bundle Format¶
name: scrna_demo
description: Sample scRNA-seq mouse brain data for demo workflows
files:
  - cid: QmABC...
    keyvalues:
      asset: anndata
      bundle: scrna_demo
      sample_id: s1d1
      stage: raw
      organism: mouse
After hydration, bundled assets are queryable via assemble() and query_files like any other asset.
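Because queryability depends on every file carrying a bundle keyvalue, a manifest could be sanity-checked before use. The checker below is a hypothetical sketch that works on an already-parsed manifest dict. Which keys are mandatory is an assumption drawn from the example format, not a documented schema.

```python
def manifest_problems(manifest: dict) -> list:
    """Return human-readable problems found in a parsed bundle manifest."""
    problems = []
    for key in ("name", "files"):
        if key not in manifest:
            problems.append(f"missing top-level key: {key}")
    for i, entry in enumerate(manifest.get("files", [])):
        if "cid" not in entry:
            problems.append(f"files[{i}]: missing cid")
        if "bundle" not in entry.get("keyvalues", {}):
            # Without this keyvalue the file cannot be found by bundle name.
            problems.append(f"files[{i}]: missing bundle keyvalue")
    return problems
```

An empty list means the manifest passed these checks.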