API Reference
App
Marimo notebook Flyte App entrypoint.
Glue between the note_env AppEnvironment (declared in stargazer.config)
and the marimo notebook server. Researchers use Marimo notebooks to
explore data, run tasks, and visualize results — bridging exploratory
work and production workflows.
Local development
marimo edit src/stargazer/notebooks/byod.py
Run locally as a Docker container:
docker run -p 8080:8080 ghcr.io/stargazerbio/stargazer-note:latest
Deploy hosted to Flyte
stargazer-app
spec: docs/architecture/notebook.md
main()
Deploy the Marimo notebook app to Flyte.
Source code in src/stargazer/app.py
Build Images
Build Stargazer's Flyte task images locally.
Iterates the per-task Flyte environments declared in stargazer.config —
scrna_env and gatk_env — and calls flyte.build_images() on each.
With image.builder = local in .flyte/config.yaml and no registry=
set on the images, the docker builder runs with --load and the result
stays in the local docker cache (no push, no registry credentials needed).
CI is expected to publish to the hosted registry on merge to main.
Equivalent to running flyte build src/stargazer/config.py <env> once
per env, but without the per-invocation init overhead.
The human-runnable images (note, chat) are built from the project's
Dockerfile instead — see docs/guides/contributing.md for the
docker build --target {note,chat} commands.
spec: docs/architecture/configuration.md
main()
Build images for every Flyte task environment. Whether the images are also pushed depends on the builder and registry configuration described above.
Source code in src/stargazer/build_images.py
Config
Centralized configuration for Stargazer.
Sets environment variable defaults at import time. Consumers read os.environ directly rather than importing named values from this module.
Also the source of truth for the lean per-task Flyte environments
(scrna_env, gatk_env) and the thin AppEnvironment that hosts the
Marimo notebook UI (note_env). The human-runnable images (note, chat)
are built from the project's Dockerfile — note_env consumes the
pre-built stargazer-note image as its base.
Rules:
- PINATA_JWT: No default. Absence means no authenticated Pinata.
- PINATA_GATEWAY: Defaults to dweb.link if unset. Set to empty string to force a failure on public downloads.
- PINATA_VISIBILITY: Defaults to "private" if unset. Only evaluated by PinataClient; if JWT is unset, downloads are always public.
- STARGAZER_LOCAL: Local storage directory. Defaults to ~/.stargazer/local.
spec: docs/architecture/configuration.md
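The default rules can be illustrated with a minimal sketch. It is not the module's actual code: the `DEFAULTS` table and `apply_defaults` helper are hypothetical names, and the exact form of the gateway value (bare host versus full URL) is an assumption. The point it shows is that only unset variables receive a default, so an explicit empty string survives.

```python
import os
from pathlib import Path

# Hypothetical table of defaults; PINATA_JWT deliberately has no entry.
DEFAULTS = {
    "PINATA_GATEWAY": "dweb.link",  # assumed form; the real value may include a scheme
    "PINATA_VISIBILITY": "private",
    "STARGAZER_LOCAL": str(Path.home() / ".stargazer" / "local"),
}


def apply_defaults(environ=os.environ):
    """Fill in unset variables only; values already present (even empty) are kept."""
    for name, value in DEFAULTS.items():
        environ.setdefault(name, value)
    return environ


# Use a plain dict here so the demonstration does not touch the real environment.
env = apply_defaults({"PINATA_GATEWAY": ""})
print(repr(env["PINATA_GATEWAY"]))
print(env["PINATA_VISIBILITY"])
print("PINATA_JWT" in env)
```

Consumers would then read `os.environ["PINATA_GATEWAY"]` and similar keys directly, as described above.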
log_execution()
Start a per-execution log sink and return the execution ID.
Derives the workflow name from the calling function, fetches the current git commit hash, and creates a dedicated logfile for this execution. Warns if the git tree has uncommitted changes.
Source code in src/stargazer/config.py
Marshal
Output marshaling: typed object → dict (for MCP response serialization).
spec: docs/architecture/mcp-server.md
marshal_output(value)
Convert a typed Python object to a JSON-friendly structure for MCP transport.
Source code in src/stargazer/marshal.py
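As a rough illustration of what "typed object to JSON-friendly structure" can mean, the sketch below recursively converts dataclasses, paths, and containers. The function `marshal_sketch` and the `DemoAsset` class are hypothetical stand-ins; the real marshal_output may handle more types or use a different layout.

```python
import dataclasses
from pathlib import Path
from typing import Optional


def marshal_sketch(value):
    """Recursively convert dataclasses, paths, and containers to JSON-friendly values."""
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return {
            f.name: marshal_sketch(getattr(value, f.name))
            for f in dataclasses.fields(value)
        }
    if isinstance(value, Path):
        return str(value)
    if isinstance(value, dict):
        return {str(k): marshal_sketch(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [marshal_sketch(v) for v in value]
    return value  # scalars (str, int, float, bool, None) pass through


@dataclasses.dataclass
class DemoAsset:  # hypothetical stand-in for a typed asset
    cid: str
    path: Optional[Path] = None


marshalled = marshal_sketch([DemoAsset(cid="Qm1", path=Path("x.bam")), DemoAsset(cid="Qm2")])
print(marshalled)
```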
Registry
Task registry for auto-discovery of Flyte tasks and workflows.
Discovers all tasks from stargazer.tasks and stargazer.workflows modules, extracts parameter types, defaults, and return types for MCP catalog exposure.
spec: docs/architecture/mcp-server.md
TaskInfo
dataclass
Complete metadata about a registered task.
Source code in src/stargazer/registry.py
TaskOutput
dataclass
TaskParam
dataclass
TaskRegistry
dataclass
Discovers and provides access to all Flyte tasks and workflows.
Source code in src/stargazer/registry.py
__post_init__()
get(name)
list_tasks(category=None)
List all registered tasks, optionally filtered by category.
Source code in src/stargazer/registry.py
to_catalog(category=None)
Return a JSON-serializable catalog of all tasks.
Source code in src/stargazer/registry.py
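One plausible way to extract parameter types, defaults, and return types for a catalog entry is Python's `inspect` module. The sketch below is an assumption about the approach, not the registry's actual code; `describe_callable`, `_type_name`, and `sort_demo` are hypothetical names, and the entry layout only loosely mirrors the catalog fields listed for list_tasks.

```python
import inspect
import json


def _type_name(annotation):
    """Render an annotation as a short string; unannotated values become "Any"."""
    if annotation is inspect.Signature.empty:
        return "Any"
    return getattr(annotation, "__name__", str(annotation))


def describe_callable(fn, category="task"):
    """Build a JSON-serializable description of one function's signature."""
    sig = inspect.signature(fn)
    params = []
    for name, param in sig.parameters.items():
        required = param.default is inspect.Parameter.empty
        entry = {"name": name, "type": _type_name(param.annotation), "required": required}
        if not required:
            entry["default"] = param.default
        params.append(entry)
    return {
        "name": fn.__name__,
        "category": category,
        "description": (inspect.getdoc(fn) or "").split("\n")[0],
        "params": params,
        "outputs": _type_name(sig.return_annotation),
    }


async def sort_demo(alignment: str, sort_order: str = "coordinate") -> str:
    """Sort a SAM/BAM file."""
    return alignment


catalog_entry = describe_callable(sort_demo)
print(json.dumps(catalog_entry, indent=2))
```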
Server
Stargazer MCP Server.
Exposes storage tools and a dynamic task runner via FastMCP. Tasks and workflows are auto-discovered from the registry and executed through the Flyte local run context.
Usage
stargazer # stdio transport (default)
stargazer --http # streamable-http transport
spec: docs/architecture/mcp-server.md
delete_file(cid)
async
download_file(cid)
async
Download a file by CID to local cache. Returns the local path.
fetch_resource_bundle(bundle_name)
async
Download a predefined resource bundle into local storage.
Bundles are curated sets of files (e.g. reference genomes, demo datasets) defined in the codebase. Each file is identified by CID and downloaded via the standard storage path (signed URL with JWT, or public IPFS gateway).
When PINATA_JWT is set, remote metadata is authoritative and overwrites local records. Without a JWT, the bundle manifest provides the metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bundle_name | str | Name of the bundle (from list_bundles). | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | List of fetched files with cid, keyvalues, and local path. |
Source code in src/stargazer/server.py
list_bundles()
List available resource bundles.
Returns:
| Type | Description |
|---|---|
| list[dict] | List of bundles with name, description, and file_count. |
Source code in src/stargazer/server.py
list_tasks(category=None)
List available tasks and workflows with their parameter signatures.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | str \| None | Filter by "task" or "workflow". Omit for all. | None |
Returns:
| Type | Description |
|---|---|
| list[dict] | Catalog of tasks with name, category, description, params, and outputs. |
Source code in src/stargazer/server.py
main()
query_files(keyvalues)
async
Query files by metadata key-value pairs. Returns matching files.
run_task(task_name, filters, inputs=None)
async
Run a single task by name for ad-hoc experimentation.
Use this for testing individual tools in isolation. Asset parameters are assembled from storage using the provided filters: one call to assemble() resolves all required assets. Scalar and Path parameters are passed separately via inputs.
For reproducible pipeline runs, use run_workflow instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| task_name | str | Name of the task (from list_tasks with category="task"). | required |
| filters | dict | Keyvalue filters for assemble() to resolve asset parameters (e.g. {"build": "GRCh38", "sample_id": "NA12878"}). | required |
| inputs | dict \| None | Optional scalar/Path keyword arguments (str, int, bool, list[str]). | None |
Returns:
| Type | Description |
|---|---|
| dict | Serialized task output. Single outputs are returned directly, multi-outputs as {"o0": ..., "o1": ...}. |
Source code in src/stargazer/server.py
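The return convention for run_task (single outputs returned directly, multi-outputs keyed as "o0", "o1", and so on) can be sketched as follows. `serialize_outputs` is a hypothetical helper, and treating a tuple as the signal for "multiple outputs" is an assumption about how the server distinguishes the two cases.

```python
def serialize_outputs(result):
    """Single outputs pass through; tuples become {"o0": ..., "o1": ...}."""
    if isinstance(result, tuple):
        return {f"o{i}": item for i, item in enumerate(result)}
    return result


single = serialize_outputs({"cid": "Qm1", "path": "out.bam"})
multi = serialize_outputs(({"cid": "Qm1"}, {"cid": "Qm2"}))
print(single)
print(multi)
```

A caller can therefore check for an "o0" key to tell whether a task produced several outputs.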
run_workflow(workflow_name, inputs)
async
Run a workflow by name for reproducible pipeline execution.
Workflows accept scalar parameters (str, int, bool, list[str]) and handle their own asset assembly internally. Pass inputs exactly as the workflow signature defines them; no automatic resolution is performed.
For ad-hoc experimentation with individual tools, use run_task instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| workflow_name | str | Name of the workflow (from list_tasks with category="workflow"). | required |
| inputs | dict | Keyword arguments as a JSON dict (scalars only). | required |
Returns:
| Type | Description |
|---|---|
| dict | Serialized workflow output. Single outputs are returned directly, multi-outputs as {"o0": ..., "o1": ...}. |
Source code in src/stargazer/server.py
show_config()
async
Show current Stargazer configuration and available task counts.
Source code in src/stargazer/server.py
upload_file(path, keyvalues)
async
Upload a file with metadata key-value pairs.
keyvalues must include "asset". Valid asset keys are derived from the Asset registry (e.g. asset=reference component=fasta).
When displaying results, always show a table with the CID and all keyvalues.
Source code in src/stargazer/server.py
Tasks
apply_bqsr task for Stargazer.
Applies BQSR recalibration to BAM files using GATK ApplyBQSR.
spec: docs/architecture/tasks.md
apply_bqsr(alignment, ref, bqsr_report)
async
Apply Base Quality Score Recalibration to a BAM file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alignment | Alignment | Input BAM asset | required |
| ref | Reference | Reference FASTA asset | required |
| bqsr_report | BQSRReport | Recalibration table from base_recalibrator | required |
Returns:
| Type | Description |
|---|---|
| Alignment | Alignment asset with recalibrated BAM file |
Source code in src/stargazer/tasks/gatk/apply_bqsr.py
ApplyVQSR task for Stargazer.
Applies VQSR recalibration to a VCF using GATK ApplyVQSR.
spec: docs/architecture/tasks.md
apply_vqsr(vcf, ref, vqsr_model, truth_sensitivity_filter_level=None)
async
Apply VQSR recalibration to a VCF using GATK ApplyVQSR.
The recalibration mode (SNP or INDEL) is read from vqsr_model.keyvalues["mode"]. If truth_sensitivity_filter_level is not provided, it defaults to 99.5 for SNP and 99.0 for INDEL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vcf | Variants | Raw (or SNP-filtered) VCF Variants asset | required |
| ref | Reference | Reference FASTA asset | required |
| vqsr_model | VQSRModel | Recalibration model from variant_recalibrator | required |
| truth_sensitivity_filter_level | float \| None | Truth sensitivity level (percent) that sets the VQSLOD filter cutoff (optional) | None |
Returns:
| Type | Description |
|---|---|
| Variants | Variants asset with VQSR-filtered VCF |
Source code in src/stargazer/tasks/gatk/apply_vqsr.py
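The defaulting rule for truth_sensitivity_filter_level described above can be written as a small lookup. This is a sketch only: `DEFAULT_FILTER_LEVEL` and `resolve_filter_level` are hypothetical names, and the error raised for an unknown mode is an assumption.

```python
# Mode defaults as stated in the task description.
DEFAULT_FILTER_LEVEL = {"SNP": 99.5, "INDEL": 99.0}


def resolve_filter_level(mode, truth_sensitivity_filter_level=None):
    """Return the explicit level if given, otherwise the default for the mode."""
    if truth_sensitivity_filter_level is not None:
        return truth_sensitivity_filter_level
    try:
        return DEFAULT_FILTER_LEVEL[mode]
    except KeyError:
        raise ValueError(f"unknown VQSR mode: {mode!r}") from None


print(resolve_filter_level("SNP"))
print(resolve_filter_level("INDEL"))
print(resolve_filter_level("SNP", 99.7))
```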
base_recalibrator task for Stargazer.
Creates BQSR recalibration table using GATK BaseRecalibrator.
spec: docs/architecture/tasks.md
base_recalibrator(alignment, ref, known_sites)
async
Generate a Base Quality Score Recalibration report.
Uses GATK BaseRecalibrator to analyze patterns of covariation in the sequence dataset and produce a recalibration table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alignment | Alignment | Input BAM asset (should be sorted and have duplicates marked) | required |
| ref | Reference | Reference FASTA asset | required |
| known_sites | list[KnownSites] | List of KnownSites VCF assets (dbSNP, known indels, etc.) | required |
Returns:
| Type | Description |
|---|---|
| BQSRReport | BQSRReport asset containing the recalibration table |
Source code in src/stargazer/tasks/gatk/base_recalibrator.py
CombineGVCFs task for Stargazer.
Combines multiple per-sample GVCFs into a single multi-sample GVCF for joint genotyping using GATK CombineGVCFs.
spec: docs/architecture/tasks.md
combine_gvcfs(gvcfs, ref, cohort_id='cohort')
async
Combine multiple per-sample GVCFs into a single multi-sample GVCF.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gvcfs | list[Variants] | List of Variants assets, each containing a GVCF from a single sample | required |
| ref | Reference | Reference FASTA asset | required |
| cohort_id | str | Identifier for the combined cohort (default: "cohort") | 'cohort' |
Returns:
| Type | Description |
|---|---|
| Variants | Variants asset with combined multi-sample GVCF |
Source code in src/stargazer/tasks/gatk/combine_gvcfs.py
GATK CreateSequenceDictionary task for reference genome.
spec: docs/architecture/tasks.md
create_sequence_dictionary(ref)
async
Create a sequence dictionary (.dict file) using GATK CreateSequenceDictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref | Reference | Reference FASTA asset | required |
Returns:
| Type | Description |
|---|---|
| SequenceDict | SequenceDict asset containing the .dict file |
Source code in src/stargazer/tasks/gatk/create_sequence_dictionary.py
GenomicsDBImport task for Stargazer.
Imports GVCFs into GenomicsDB for efficient joint genotyping of large cohorts.
spec: docs/architecture/tasks.md
genomics_db_import(gvcfs, workspace_path, intervals=None)
async
Import GVCFs into a GenomicsDB workspace for scalable joint genotyping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gvcfs | list[Variants] | List of per-sample GVCF Variants assets to import | required |
| workspace_path | Path | Path where GenomicsDB workspace will be created | required |
| intervals | list[str] \| None | Genomic intervals to process (e.g., ["chr1", "chr2:100000-200000"]) | None |
Returns:
| Type | Description |
|---|---|
| Path | Path to the created GenomicsDB workspace directory |
Source code in src/stargazer/tasks/gatk/genomics_db_import.py
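The interval strings in the example above take two shapes: a bare contig name, or a contig with a start-end range. A small validator, sketched here under the assumption that only those two shapes are used, can catch malformed values before they reach GATK. `parse_interval` is a hypothetical helper, not part of the task.

```python
import re

# contig, optionally followed by ":start-end"
_INTERVAL = re.compile(r"(?P<contig>.+?)(?::(?P<start>\d+)-(?P<end>\d+))?")


def parse_interval(text):
    """Split "chr2:100000-200000" into (contig, start, end); bare contigs get None bounds."""
    match = _INTERVAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    if match["start"] is None:
        return match["contig"], None, None
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"interval start exceeds end: {text!r}")
    return match["contig"], start, end


parsed = [parse_interval(iv) for iv in ["chr1", "chr2:100000-200000"]]
print(parsed)
```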
haplotype_caller task for Stargazer.
Calls germline SNPs and indels via local re-assembly of haplotypes using GATK HaplotypeCaller in GVCF mode.
spec: docs/architecture/tasks.md
haplotype_caller(alignment, ref)
async
Call germline variants in GVCF mode using GATK HaplotypeCaller.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alignment | Alignment | Sorted, duplicate-marked BAM asset (BQSR-recalibrated recommended) | required |
| ref | Reference | Reference FASTA asset with sequence dictionary | required |
Returns:
| Type | Description |
|---|---|
| Variants | Variants asset containing the per-sample GVCF |
Source code in src/stargazer/tasks/gatk/haplotype_caller.py
joint_call_gvcfs task for Stargazer.
Consolidates per-sample GVCFs into a GenomicsDB datastore and performs joint genotyping in a single task, avoiding the need to persist the GenomicsDB workspace between tasks.
spec: docs/architecture/tasks.md
joint_call_gvcfs(gvcfs, ref, intervals, cohort_id='cohort')
async
Consolidate GVCFs into GenomicsDB and joint-genotype in a single task.
Runs GenomicsDBImport followed immediately by GenotypeGVCFs within the same execution context, so the workspace never needs to leave the pod.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gvcfs | list[Variants] | Per-sample GVCF Variants assets from HaplotypeCaller | required |
| ref | Reference | Reference FASTA asset | required |
| intervals | list[str] | Genomic intervals to process (required by GenomicsDBImport) | required |
| cohort_id | str | Sample ID label for the output VCF (default: "cohort") | 'cohort' |
Returns:
| Type | Description |
|---|---|
| Variants | Joint-genotyped Variants asset (VCF) |
Source code in src/stargazer/tasks/gatk/joint_call_gvcfs.py
mark_duplicates task for Stargazer.
Marks duplicate reads in BAM files using GATK MarkDuplicates.
spec: docs/architecture/tasks.md
mark_duplicates(alignment)
async
Mark duplicate reads in a BAM file.
Uses GATK MarkDuplicates to identify and tag duplicate reads that originated from the same DNA fragment (PCR or optical duplicates). Duplicates are marked with the 0x0400 SAM flag.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alignment | Alignment | Input BAM asset (should be coordinate sorted) | required |
Returns:
| Type | Description |
|---|---|
| Alignment | Alignment asset with duplicates marked |
Source code in src/stargazer/tasks/gatk/mark_duplicates.py
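Downstream code can detect marked duplicates by testing the 0x0400 bit of a read's SAM FLAG field. The sketch below uses plain integers rather than a BAM library; `is_duplicate` is a hypothetical helper and the flag values are illustrative.

```python
DUPLICATE_FLAG = 0x0400  # decimal 1024, the SAM "PCR or optical duplicate" bit


def is_duplicate(flag: int) -> bool:
    """True when the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


# 99 and 147 are common flags for a properly paired read and its mate;
# adding 1024 to either marks the read as a duplicate.
flags = [99, 147, 99 + 1024, 147 + 1024]
marked = [is_duplicate(f) for f in flags]
print(marked)
```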
merge_bam_alignment task for Stargazer.
Merges aligned BAM with unmapped BAM using GATK MergeBamAlignment.
spec: docs/architecture/tasks.md
merge_bam_alignment(aligned_bam, unmapped_bam, ref)
async
Merge alignment data from aligned BAM with data in unmapped BAM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| aligned_bam | Alignment | Aligned BAM asset from aligner | required |
| unmapped_bam | Alignment | Original unmapped BAM asset (must be queryname sorted) | required |
| ref | Reference | Reference FASTA asset | required |
Returns:
| Type | Description |
|---|---|
| Alignment | Alignment asset with merged BAM file |
Source code in src/stargazer/tasks/gatk/merge_bam_alignment.py
sort_sam task for Stargazer.
Sorts BAM files using GATK SortSam.
spec: docs/architecture/tasks.md
sort_sam(alignment, sort_order='coordinate')
async
Sort a SAM/BAM file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alignment | Alignment | Input BAM asset to sort | required |
| sort_order | str | Sort order: one of "coordinate", "queryname", "duplicate" | 'coordinate' |
Returns:
| Type | Description |
|---|---|
| Alignment | Alignment asset with sorted BAM file |
Source code in src/stargazer/tasks/gatk/sort_sam.py
VariantRecalibrator task for Stargazer.
Builds a recalibration model for VQSR using GATK VariantRecalibrator.
spec: docs/architecture/tasks.md
variant_recalibrator(vcf, ref, resources, mode='SNP')
async
Build a VQSR recalibration model using GATK VariantRecalibrator.
Each KnownSites in resources must carry the following keyvalues:
- resource_name: e.g. "hapmap", "omni", "1000G", "dbsnp", "mills"
- known: "true" or "false"
- training: "true" or "false"
- truth: "true" or "false"
- prior: numeric string, e.g. "15"
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vcf | Variants | Raw genotyped VCF Variants asset | required |
| ref | Reference | Reference FASTA asset | required |
| resources | list[KnownSites] | Training/truth VCF resources for the recalibrator | required |
| mode | str | Variant type to recalibrate: "SNP" or "INDEL" | 'SNP' |
Returns:
| Type | Description |
|---|---|
| VQSRModel | VQSRModel asset (recal file) with tranches_path stored in keyvalues |
Source code in src/stargazer/tasks/gatk/variant_recalibrator.py
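The required resource keyvalues map naturally onto GATK's tagged `--resource` argument. The sketch below shows one way such an argument could be assembled; `resource_argument` is a hypothetical helper, and how the task itself builds its command line is not documented here.

```python
REQUIRED_KEYS = ("resource_name", "known", "training", "truth", "prior")


def resource_argument(keyvalues, vcf_path):
    """Build the two command-line tokens for one VariantRecalibrator resource."""
    missing = [key for key in REQUIRED_KEYS if key not in keyvalues]
    if missing:
        raise ValueError(f"resource is missing keyvalues: {missing}")
    tag = (
        "--resource:{resource_name},known={known},"
        "training={training},truth={truth},prior={prior}"
    ).format(**keyvalues)
    return [tag, str(vcf_path)]


hapmap = {
    "resource_name": "hapmap",
    "known": "false",
    "training": "true",
    "truth": "true",
    "prior": "15",
}
args = resource_argument(hapmap, "hapmap.vcf.gz")  # file name is illustrative
print(args)
```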
BWA tasks for reference genome indexing and alignment.
spec: docs/architecture/tasks.md
bwa_index(ref)
async
Create BWA index files for a reference genome using bwa index.
Creates the following index files: .amb, .ann, .bwt, .pac, .sa
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref | Reference | Reference FASTA asset | required |
Returns:
| Type | Description |
|---|---|
| list[AlignerIndex] | List of AlignerIndex assets, one per index file |
Source code in src/stargazer/tasks/general/bwa.py
bwa_mem(ref, r1, r2=None, read_group=None)
async
Align FASTQ reads to reference genome using BWA-MEM.
Produces an unsorted BAM file that typically needs to be sorted before downstream processing (e.g., with sort_sam).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref | Reference | Reference FASTA asset | required |
| r1 | R1 | R1 FASTQ read asset | required |
| r2 | R2 \| None | R2 FASTQ read asset (None for single-end) | None |
| read_group | dict[str, str] \| None | Optional read group override (ID, SM, LB, PL, PU) | None |
Returns:
| Type | Description |
|---|---|
| Alignment | Alignment asset containing the unsorted BAM file |
Source code in src/stargazer/tasks/general/bwa.py
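A read_group dict such as the one accepted above is typically rendered as an `@RG` header line for `bwa mem -R`, with fields separated by a literal backslash-t sequence that bwa converts to tabs. The sketch below is an assumption about that rendering; `read_group_header` is a hypothetical helper and the sample values are illustrative.

```python
def read_group_header(read_group):
    """Render a read-group dict as the @RG string passed to `bwa mem -R`."""
    if "ID" not in read_group:
        raise ValueError("a read group needs at least an ID field")
    fields = [f"{tag}:{value}" for tag, value in read_group.items()]
    # "\\t" is a backslash followed by "t", not a tab character.
    return "\\t".join(["@RG", *fields])


header = read_group_header({"ID": "NA12878", "SM": "NA12878", "PL": "ILLUMINA"})
print(header)
```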
BWA-MEM2 tasks for reference genome indexing and alignment.
spec: docs/architecture/tasks.md
bwa_mem2_index(ref)
async
Create BWA-MEM2 index files for a reference genome.
Creates the following index files: .amb, .ann, .bwt.2bit.64, .pac, .sa
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref | Reference | Reference FASTA asset | required |
Returns:
| Type | Description |
|---|---|
| list[AlignerIndex] | List of AlignerIndex assets, one per index file |
Source code in src/stargazer/tasks/general/bwa_mem2.py
bwa_mem2_mem(ref, r1, r2=None, read_group=None)
async
Align FASTQ reads to reference genome using BWA-MEM2.
Produces an unsorted BAM file that typically needs to be sorted before downstream processing (e.g., with sort_sam).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref | Reference | Reference FASTA asset | required |
| r1 | R1 | R1 FASTQ read asset | required |
| r2 | R2 \| None | R2 FASTQ read asset (None for single-end) | None |
| read_group | dict[str, str] \| None | Optional read group override (ID, SM, LB, PL, PU) | None |
Returns:
| Type | Description |
|---|---|
| Alignment | Alignment asset containing the unsorted BAM file |
Source code in src/stargazer/tasks/general/bwa_mem2.py
Samtools tasks for reference genome indexing.
spec: docs/architecture/tasks.md
samtools_faidx(ref)
async
Create a FASTA index (.fai file) using samtools faidx.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref | Reference | Reference FASTA asset | required |
Returns:
| Type | Description |
|---|---|
| ReferenceIndex | ReferenceIndex asset containing the .fai file |
Source code in src/stargazer/tasks/general/samtools.py
Leiden community detection clustering for scRNA-seq data.
spec: docs/workflows/scrna.md
cluster(adata, resolution=0.5, key_added='leiden')
async
Assign cells to clusters using the Leiden algorithm.
Requires a precomputed neighbor graph (.uns["neighbors"]). Cluster labels are stored in .obs[key_added].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData asset with neighbor graph | required |
| resolution | float | Leiden resolution parameter (higher = more clusters) | 0.5 |
| key_added | str | .obs column name to store cluster labels | 'leiden' |
Returns:
| Type | Description |
|---|---|
| AnnData | AnnData asset with cluster labels in .obs |
Source code in src/stargazer/tasks/scrna/cluster.py
Marker gene identification via differential expression for scRNA-seq data.
spec: docs/workflows/scrna.md
find_markers(adata, groupby='leiden', method='wilcoxon')
async
Identify marker genes for each cluster using differential expression.
Uses raw count data from .layers["counts"] for statistical testing. Results are stored in .uns["rank_genes_groups"].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | Clustered AnnData asset with .layers["counts"] | required |
| groupby | str | .obs column to group cells by (cluster labels) | 'leiden' |
| method | str | Statistical test method ("wilcoxon", "t-test", etc.) | 'wilcoxon' |
Returns:
| Type | Description |
|---|---|
| AnnData | AnnData asset with ranked marker genes in .uns["rank_genes_groups"] |
Source code in src/stargazer/tasks/scrna/find_markers.py
Normalization and log transformation for scRNA-seq data.
spec: docs/workflows/scrna.md
normalize(adata)
async
Normalize counts and apply log1p transformation.
Stores raw counts in .layers["counts"] before normalization so they are available for downstream differential expression analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | QC-filtered AnnData asset | required |
Returns:
| Type | Description |
|---|---|
| AnnData | Normalized and log-transformed AnnData asset |
Source code in src/stargazer/tasks/scrna/normalize.py
Quality control and cell/gene filtering for scRNA-seq data.
spec: docs/workflows/scrna.md
qc_filter(adata, min_genes=100, min_cells=3, max_pct_mt=20.0, batch_key='')
async
Filter low-quality cells and genes from raw scRNA-seq data.
Applies standard QC filters: minimum gene/cell thresholds, mitochondrial gene percentage, and scrublet doublet detection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | Raw AnnData asset (.h5ad) | required |
| min_genes | int | Minimum number of genes expressed per cell | 100 |
| min_cells | int | Minimum number of cells a gene must be expressed in | 3 |
| max_pct_mt | float | Maximum mitochondrial gene percentage allowed per cell | 20.0 |
| batch_key | str | Column in .obs to use as batch for scrublet (empty = no batch) | '' |
Returns:
| Type | Description |
|---|---|
| AnnData | Filtered AnnData asset with QC metrics in .obs |
Source code in src/stargazer/tasks/scrna/qc_filter.py
PCA, neighbor graph, and UMAP dimensionality reduction for scRNA-seq data.
spec: docs/workflows/scrna.md
reduce_dimensions(adata, n_pcs=50, n_neighbors=15)
async
Compute PCA, k-nearest neighbor graph, and UMAP embedding.
Operates on the highly variable gene subset to reduce noise. Embeddings are stored in .obsm for downstream clustering and visualization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData asset with HVG annotations in .var | required |
| n_pcs | int | Number of principal components to compute | 50 |
| n_neighbors | int | Number of neighbors for the kNN graph | 15 |
Returns:
| Type | Description |
|---|---|
| AnnData | AnnData asset with PCA (.obsm["X_pca"]), neighbors, and UMAP (.obsm["X_umap"]) |
Source code in src/stargazer/tasks/scrna/reduce_dimensions.py
Highly variable gene selection for scRNA-seq data.
spec: docs/workflows/scrna.md
select_features(adata, n_top_genes=2000, batch_key='')
async
Select highly variable genes for dimensionality reduction.
Annotates .var with highly_variable flags. Downstream tasks use only the highly variable subset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | Normalized AnnData asset | required |
| n_top_genes | int | Number of top highly variable genes to select | 2000 |
| batch_key | str | Column in .obs to use as batch (empty = no batch correction) | '' |
Returns:
| Type | Description |
|---|---|
| AnnData | AnnData asset with HVG annotations in .var |
Source code in src/stargazer/tasks/scrna/select_features.py
Types
Alignment asset types for Stargazer.
spec: docs/architecture/types.md
Alignment
dataclass
Bases: Asset
BAM/CRAM alignment file asset.
Carries reference_cid and r1_cid for provenance (PROV entity derivation).
Source code in src/stargazer/assets/alignment.py
AlignmentIndex
dataclass
Bases: Asset
BAI/CRAI alignment index file asset.
Carries alignment_cid linking to the Alignment it indexes.
Source code in src/stargazer/assets/alignment.py
BQSRReport
dataclass
Bases: Asset
BQSR recalibration table produced by GATK BaseRecalibrator.
Carries alignment_cid linking to the Alignment it was produced from.
Source code in src/stargazer/assets/alignment.py
Asset base dataclass for Stargazer.
spec: docs/architecture/types.md
Asset
dataclass
Base class for all typed file assets in Stargazer.
Attributes:
| Name | Type | Description |
|---|---|---|
| cid | str | Content identifier (CID) for the stored file |
| path | Path \| None | Local filesystem path (set after download or upload) |
| keyvalues | dict | Arbitrary metadata dict for base Asset instances only |
Subclasses declare typed fields as normal dataclass attributes:
@dataclass
class Alignment(Asset):
    _asset_key: ClassVar[str] = "alignment"
    sample_id: str = ""
    duplicates_marked: bool = False
Fields are plain Python attributes. to_keyvalues() serializes them to
dict[str, str] at storage boundaries; from_keyvalues() reconstructs
from storage. str fields pass through directly; all other types use
json.dumps / json.loads.
Source code in src/stargazer/assets/asset.py
__init_subclass__(**kwargs)
__setattr__(name, value)
Enforce declared fields on typed subclasses; pass through on base Asset.
Source code in src/stargazer/assets/asset.py
fetch()
async
Download this asset and all its companions from storage.
Downloads the asset itself, then queries storage for any assets linked
via {_asset_key}_cid to auto-download companions (e.g. indices,
mate reads).
Source code in src/stargazer/assets/asset.py
from_dict(data)
classmethod
Reconstruct from a serialized dict.
Source code in src/stargazer/assets/asset.py
from_keyvalues(kv, cid='', path=None)
classmethod
Reconstruct from a storage keyvalues dict.
str fields are assigned directly; all other types are deserialized with json.loads. Base Asset receives keyvalues as-is.
Source code in src/stargazer/assets/asset.py
to_dict()
to_keyvalues()
Serialize to storage format.
str fields pass through as-is; all other types are serialized with json.dumps. Base Asset instances return their keyvalues dict directly.
Source code in src/stargazer/assets/asset.py
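The serialization rule (str fields pass through, everything else goes through json.dumps and json.loads) can be shown on a standalone dataclass. This is a minimal sketch, not the Asset implementation: `AlignmentSketch` is a hypothetical class that omits cid, path, and the `_asset_key` handling.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class AlignmentSketch:  # hypothetical stand-in; does not inherit from Asset
    sample_id: str = ""
    duplicates_marked: bool = False

    def to_keyvalues(self):
        """Serialize to dict[str, str]: str as-is, other types via json.dumps."""
        out = {}
        for f in fields(self):
            value = getattr(self, f.name)
            out[f.name] = value if isinstance(value, str) else json.dumps(value)
        return out

    @classmethod
    def from_keyvalues(cls, kv):
        """Rebuild an instance: str fields assigned directly, others via json.loads."""
        kwargs = {}
        for f in fields(cls):
            if f.name in kv:
                raw = kv[f.name]
                kwargs[f.name] = raw if f.type is str else json.loads(raw)
        return cls(**kwargs)


kv = AlignmentSketch(sample_id="NA12878", duplicates_marked=True).to_keyvalues()
restored = AlignmentSketch.from_keyvalues(kv)
print(kv)
print(restored)
```

Keeping every stored value a string is what lets the same dict serve as storage metadata, while the typed fields stay ordinary Python attributes in memory.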
update(path, **kwargs)
async
Upload file and set cid. Shared by all asset types.
Source code in src/stargazer/assets/asset.py
assemble(**filters)
async
Query storage by keyvalue filters and return specialized assets.
The asset filter key accepts a string or list of strings to narrow
by asset type. Other filters are passed through as keyvalue matchers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **filters | Any | Keyvalue filters. Values may be scalars or lists (cartesian product). | {} |
Returns:
| Type | Description |
|---|---|
| list[Asset] | Flat list of specialized Asset subclass instances. |
Examples:
assets = await assemble(build="GRCh38", asset="reference")
ref = next(a for a in assets if isinstance(a, Reference))
assets = await assemble(sample_id="NA12878", asset=["r1", "r2"])
r1 = next(a for a in assets if isinstance(a, R1))
Source code in src/stargazer/assets/asset.py
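The "cartesian product" note on list-valued filters can be read as: each combination of values becomes one concrete query. The sketch below illustrates that reading with `itertools.product`; `expand_filters` is a hypothetical helper, and whether assemble() expands filters exactly this way is an assumption.

```python
from itertools import product


def expand_filters(**filters):
    """Expand list-valued filters into one flat dict per combination."""
    keys = list(filters)
    choices = [v if isinstance(v, list) else [v] for v in filters.values()]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


queries = expand_filters(sample_id="NA12878", asset=["r1", "r2"])
for query in queries:
    print(query)
```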
Read file asset types for Stargazer.
spec: docs/architecture/types.md
R1
dataclass
Bases: Asset
R1 (forward) FASTQ read file asset.
Carries mate_cid pointing to the paired R2 asset's CID (None for single-end).
Source code in src/stargazer/assets/reads.py
Reference genome asset types for Stargazer.
spec: docs/architecture/types.md
AlignerIndex
dataclass
Bases: Asset
Aligner index file asset (one file per index file for multi-file indices).
Source code in src/stargazer/assets/reference.py
Reference
dataclass
Bases: Asset
Reference FASTA file asset.
Source code in src/stargazer/assets/reference.py
contigs
property
Read contig names from the companion .fai index.
Requires fetch() to have been called first so the ReferenceIndex companion is downloaded alongside this reference.
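A .fai index is a tab-separated text file whose first column is the sequence name, so contig names can be read with a few lines of standard-library code. The sketch below is independent of the Reference class; `read_contigs` is a hypothetical helper and the index rows are illustrative.

```python
import tempfile
from pathlib import Path


def read_contigs(fai_path):
    """Return the sequence names (first column) from a samtools .fai index."""
    contigs = []
    with open(fai_path) as handle:
        for line in handle:
            if line.strip():
                contigs.append(line.split("\t", 1)[0])
    return contigs


# Write a tiny illustrative index: name, length, offset, line bases, line width.
with tempfile.TemporaryDirectory() as tmp:
    fai = Path(tmp) / "genome.fa.fai"
    fai.write_text("chr1\t248956422\t6\t60\t61\nchr2\t242193529\t253105708\t60\t61\n")
    contigs = read_contigs(fai)

print(contigs)
```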
ReferenceIndex
dataclass
Bases: Asset
FASTA index (.fai) file asset.
Carries reference_cid linking back to the Reference it was built from.
Source code in src/stargazer/assets/reference.py
scRNA-seq asset types for Stargazer.
spec: docs/workflows/scrna.md
AnnData
dataclass
Bases: Asset
AnnData (.h5ad) file asset for single-cell RNA-seq data.
Tracks pipeline stage, cell/gene counts, and provenance through the scRNA-seq processing steps.
Source code in src/stargazer/assets/scrna.py
Variant call asset types for Stargazer.¶
spec: docs/architecture/types.md
KnownSites
dataclass
¶
Bases: Asset
Known variant sites VCF used for BQSR.
Standalone asset — carries build and source fields, no container needed.
Source code in src/stargazer/assets/variants.py
KnownSitesIndex
dataclass
¶
Bases: Asset
VCF index (.idx) file for a KnownSites asset.
Carries known_sites_cid linking to the KnownSites VCF it indexes. Fetched automatically alongside the VCF via Asset.fetch().
Source code in src/stargazer/assets/variants.py
VQSRModel
dataclass
¶
Bases: Asset
VQSR recalibration model (.recal file + tranches path).
Produced by VariantRecalibrator. The recal file is the primary path; the companion tranches file path is stored in keyvalues["tranches_path"].
Source code in src/stargazer/assets/variants.py
Variants
dataclass
¶
Utils¶
Local filesystem storage client for Stargazer.¶
Always the primary storage client. Stores files locally with TinyDB metadata indexing and delegates to a remote backend (PinataClient) or the public IPFS gateway for cache misses.
Also provides the module-level factory and singleton:
- get_client(): create a LocalStorageClient based on available config
- default_client: pre-built singleton used across the application
spec: docs/architecture/configuration.md
LocalStorageClient
¶
Local filesystem storage client with optional remote backend.
Always handles caching and TinyDB metadata. Downloads follow this order:
- Return if file already exists at component.path
- Check local cache by CID
- If remote backend (PinataClient) is attached, fetch via signed URL
- Fall back to public IPFS gateway
When a PinataClient remote is attached, upload/query/delete delegate to it. Without a remote, upload/query/delete operate locally only.
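The download order above can be sketched as a simple first-match decision. This is an illustrative model only: the Source enum and resolve_source function are hypothetical names, and the sketch ignores error handling (for example, what happens if the remote fetch fails).

```python
from enum import Enum


class Source(Enum):
    """Where a download is satisfied from (hypothetical labels)."""
    EXISTING_PATH = "existing path"
    LOCAL_CACHE = "local cache"
    REMOTE = "remote (signed URL)"
    PUBLIC_GATEWAY = "public gateway"


def resolve_source(path_exists: bool, in_cache: bool, has_remote: bool) -> Source:
    """Pick the first source that applies, following the documented order."""
    if path_exists:
        return Source.EXISTING_PATH
    if in_cache:
        return Source.LOCAL_CACHE
    if has_remote:
        return Source.REMOTE
    return Source.PUBLIC_GATEWAY
```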
Usage
client = LocalStorageClient()
comp = Asset(path=Path("data.bam"), keyvalues={"type": "alignment"})
await client.upload(comp)
files = await client.query({"type": "alignment"})
await client.download(comp)
Source code in src/stargazer/utils/local_storage.py
db
property
¶
Get TinyDB instance for local metadata storage (lazy initialized).
Re-opens if the DB file has been deleted or modified externally, keeping _last_id in sync when other processes write to the same file.
__init__(local_dir=None, remote=None, public_gateway=None)
¶
Initialize local storage client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_dir | Optional[Path] | Local directory for file storage (defaults to STARGAZER_LOCAL) | None |
| remote | Optional[PinataClient] | Optional PinataClient for authenticated Pinata operations | None |
| public_gateway | Optional[str] | Public IPFS gateway URL (defaults to PINATA_GATEWAY) | None |
Source code in src/stargazer/utils/local_storage.py
delete(component)
async
¶
Delete a file. Delegates to remote if attached, otherwise deletes locally.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| component | Asset | Asset with cid set | required |
Source code in src/stargazer/utils/local_storage.py
download(component, dest=None, name=None)
async
¶
Download a file by CID. Checks cache, then remote, then public gateway.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| component | Asset | Asset with cid set | required |
| dest | Optional[Path] | Optional destination path (copies file there) | None |
| name | Optional[str] | Optional filename to use instead of the CID | None |
Returns:
| Type | Description |
|---|---|
| bool | True if the file was already cached, False if freshly downloaded. |
Source code in src/stargazer/utils/local_storage.py
query(keyvalues)
async
¶
Query files by keyvalue metadata. Delegates to remote if attached.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| keyvalues | dict[str, str] | Metadata key-value pairs to filter by | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | List of raw storage records with 'cid', 'path', and 'keyvalues' keys |
Source code in src/stargazer/utils/local_storage.py
upload(component)
async
¶
Upload a file. Delegates to remote if attached, otherwise stores locally.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| component | Asset | Asset with path and keyvalues set | required |
Source code in src/stargazer/utils/local_storage.py
get_client()
¶
Create a storage client based on available credentials.
Always returns a LocalStorageClient. When PINATA_JWT is available, a PinataClient remote is attached for authenticated operations (upload, query, delete, private downloads). Public IPFS gateway access is always available for downloading public CIDs.
Resolution logic
- PINATA_JWT set -> LocalStorageClient + PinataClient remote
- No JWT -> LocalStorageClient (public gateway only)
Returns:
| Type | Description |
|---|---|
| LocalStorageClient | A LocalStorageClient, optionally with a PinataClient remote |
Source code in src/stargazer/utils/local_storage.py
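The resolution logic of get_client() can be sketched as follows. FakeRemote and FakeLocalClient are hypothetical stand-ins for PinataClient and LocalStorageClient, and get_client_sketch is not the real factory; only the PINATA_JWT branching is taken from the description above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class FakeRemote:
    """Stand-in for PinataClient (hypothetical, illustration only)."""
    jwt: str


@dataclass
class FakeLocalClient:
    """Stand-in for LocalStorageClient (hypothetical, illustration only)."""
    remote: Optional[FakeRemote] = None


def get_client_sketch(env: Optional[Mapping[str, str]] = None) -> FakeLocalClient:
    """Attach a remote only when PINATA_JWT is present and non-empty."""
    env = os.environ if env is None else env
    jwt = env.get("PINATA_JWT")
    if jwt:
        return FakeLocalClient(remote=FakeRemote(jwt=jwt))
    return FakeLocalClient()  # public gateway only
```

Passing the environment as an argument keeps the sketch easy to exercise without mutating os.environ.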
Pinata API v3 client for IPFS file storage.¶
Provides async interface for authenticated Pinata operations:
- Uploading files with keyvalue metadata
- Downloading private files via signed gateway URLs
- Querying files by keyvalue pairs
- Deleting files
Used as a remote backend by LocalStorageClient when PINATA_JWT is available.
spec: docs/architecture/configuration.md
PinataClient
¶
Async client for Pinata API v3.
Handles authenticated operations against the Pinata API: uploads, private downloads via signed URLs, metadata queries, and deletions.
This is a pure remote transport — caching is handled by LocalStorageClient.
PINATA_VISIBILITY controls upload network and query/download behavior:
- "private": uploads as private, downloads via signed URLs, queries /files/private
- "public": uploads as public, downloads via public gateway (handled by LocalStorageClient), queries /files/public
If JWT is unset, only public downloads are possible (via LocalStorageClient's public gateway fallback).
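The effect of the visibility setting can be summarized in a small sketch. The function names files_endpoint and uses_signed_url are hypothetical; the endpoint paths are the ones named above, and how the real client builds its URLs is not shown in this reference.

```python
def files_endpoint(visibility: str) -> str:
    """Return the query endpoint path for a visibility setting."""
    if visibility not in ("public", "private"):
        raise ValueError(f"unsupported visibility: {visibility!r}")
    return f"/files/{visibility}"


def uses_signed_url(visibility: str) -> bool:
    """Private files are downloaded via signed URLs; public ones are not."""
    return visibility == "private"
```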
Usage
client = PinataClient()
comp = Asset(path=Path("data.bam"), keyvalues={"type": "alignment"})
await client.upload(comp)  # sets comp.cid
files = await client.query({"type": "alignment", "sample": "NA12878"})
await client.delete(comp)
Source code in src/stargazer/utils/pinata.py
jwt
property
¶
Get JWT token, raising error if not set.
__init__(jwt=None, visibility=None)
¶
Initialize Pinata client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| jwt | Optional[str] | Pinata JWT token (defaults to PINATA_JWT from config) | None |
| visibility | Optional[str] | "public" or "private" (defaults to PINATA_VISIBILITY from config) | None |
Source code in src/stargazer/utils/pinata.py
delete(component)
async
¶
Delete a file from Pinata by querying for its internal ID first.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| component | Asset | Asset with cid set | required |
Source code in src/stargazer/utils/pinata.py
download_to(cid, dest)
async
¶
Download a file to dest. Uses a signed URL for private visibility and raises for public visibility.
Public downloads are handled by LocalStorageClient's public gateway fallback, so this method is only called for private visibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | str | Content identifier | required |
| dest | Path | Destination path to write to | required |
Source code in src/stargazer/utils/pinata.py
query(keyvalues)
async
¶
Query files by keyvalue metadata from Pinata API.
Paginates through all results automatically. Queries the private or public file endpoint based on visibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| keyvalues | dict[str, str] | Metadata key-value pairs to filter by | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | List of matching Asset objects |
Source code in src/stargazer/utils/pinata.py
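Automatic pagination of the kind PinataClient.query describes usually follows a fetch-until-exhausted loop. The sketch below assumes token-based paging; the fetch_page callable and the shape of its return value are hypothetical, since the actual Pinata response fields are not shown in this reference.

```python
from typing import Callable, Optional

# A page is (items, next_token); next_token is None on the last page.
Page = tuple[list[dict], Optional[str]]


def query_all(fetch_page: Callable[[Optional[str]], Page]) -> list[dict]:
    """Collect items from every page by following the continuation token."""
    results: list[dict] = []
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        results.extend(items)
        if not token:  # no continuation token means we are done
            return results
```

In a real client, fetch_page would wrap the HTTP request against the private or public file endpoint.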
upload(component)
async
¶
Upload a file to IPFS via Pinata. Sets component.cid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| component | Asset | Asset with path and keyvalues set | required |
Source code in src/stargazer/utils/pinata.py
Query generation utilities for Stargazer.¶
Utilities for generating metadata queries, including support for cartesian product queries across multiple dimensions.
spec: docs/architecture/types.md
generate_query_combinations(base_query, filters)
¶
Generate query combinations from filters using cartesian product.
Takes a base query dict and filters dict, where filters can contain scalar values or lists. For any list-valued filter, generates all combinations using cartesian product, while preserving scalar filters and the base query in all combinations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_query | Dict[str, Any] | Base query dict to include in all combinations | required |
| filters | Dict[str, Any] | Filter dict with scalar or list values | required |
Returns:
| Type | Description |
|---|---|
| List[Dict[str, Any]] | List of query dicts representing all combinations |
Example
base = {"type": "reference"}
filters = {"build": "GRCh38", "tool": ["fasta", "bwa"]}
generate_query_combinations(base, filters)
[
    {"type": "reference", "build": "GRCh38", "tool": "fasta"},
    {"type": "reference", "build": "GRCh38", "tool": "bwa"}
]

base = {"type": "reference"}
filters = {"build": ["GRCh38", "GRCh37"], "tool": ["fasta", "bwa"]}
generate_query_combinations(base, filters)
[
    {"type": "reference", "build": "GRCh38", "tool": "fasta"},
    {"type": "reference", "build": "GRCh38", "tool": "bwa"},
    {"type": "reference", "build": "GRCh37", "tool": "fasta"},
    {"type": "reference", "build": "GRCh37", "tool": "bwa"}
]
Source code in src/stargazer/utils/query.py
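One way the cartesian expansion described for generate_query_combinations could be written is with itertools.product. This is a sketch under that assumption, not the module's actual source; the name query_combinations is hypothetical so it is not mistaken for the real function.

```python
from itertools import product
from typing import Any, Dict, List


def query_combinations(
    base_query: Dict[str, Any], filters: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Expand list-valued filters into one query dict per combination."""
    # Scalar filters are copied unchanged into every combination.
    scalars = {k: v for k, v in filters.items() if not isinstance(v, list)}
    list_keys = [k for k, v in filters.items() if isinstance(v, list)]
    list_values = [filters[k] for k in list_keys]

    combos = []
    # product() with no arguments yields one empty tuple, so a filter set
    # without lists still produces exactly one query.
    for values in product(*list_values):
        combos.append({**base_query, **scalars, **dict(zip(list_keys, values))})
    return combos
```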
Workflows¶
GATK Best Practices: Data Pre-processing for Variant Discovery¶
Implements:
1. Reference preparation: FASTA index, sequence dictionary, BWA index
2. Sample preprocessing: align, sort, mark duplicates, BQSR
spec: docs/architecture/workflows.md
prepare_reference(build)
async
¶
Prepare reference genome for alignment and variant calling.
Assembles the reference FASTA from storage and creates necessary indices:
1. FASTA index (samtools faidx)
2. BWA index (bwa index)
All indices are uploaded to storage as side-effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| build | str | Reference genome build identifier (e.g. "GRCh38") | required |
Returns:
| Type | Description |
|---|---|
| Reference | Reference asset (FASTA file) |
Source code in src/stargazer/workflows/gatk_data_preprocessing.py
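The two indexing steps named for prepare_reference correspond to standard command-line invocations. The sketch below only builds the argument lists; the helper name index_commands is hypothetical, and how Stargazer's tasks actually invoke these tools is not shown in this reference.

```python
from pathlib import Path


def index_commands(fasta: Path) -> list[list[str]]:
    """Build the indexing commands for a reference FASTA (illustrative)."""
    return [
        ["samtools", "faidx", str(fasta)],  # writes <fasta>.fai next to the FASTA
        ["bwa", "index", str(fasta)],       # writes .amb/.ann/.bwt/.pac/.sa files
    ]
```

Each list could be passed to subprocess.run(cmd, check=True) in an environment where samtools and bwa are installed.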
preprocess_sample(build, sample_id)
async
¶
Pre-process a single sample's reads for variant calling.
Assembles reference and reads from storage, then runs:
1. BWA-MEM alignment
2. Coordinate sort (GATK SortSam)
3. Mark duplicates (GATK MarkDuplicates)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| build | str | Reference genome build identifier | required |
| sample_id | str | Sample identifier used to query reads | required |
Returns:
| Type | Description |
|---|---|
| Alignment | Alignment asset with the preprocessed BAM file |
Source code in src/stargazer/workflows/gatk_data_preprocessing.py
GATK Best Practices: Germline Short Variant Discovery (SNPs + Indels)¶
End-to-end GATK pipeline from raw reads to joint-genotyped variants:
- prepare_reference — FASTA index, sequence dictionary, BWA index
- preprocess_sample — align, sort, mark duplicates (per sample, parallel)
- haplotype_caller — per-sample GVCF (parallel)
- joint_call_gvcfs — GenomicsDBImport + GenotypeGVCFs
spec: docs/architecture/workflows.md
germline_short_variant_discovery(build, sample_ids, cohort_id='cohort')
async
¶
End-to-end germline short variant discovery from raw reads.
Runs the full GATK best-practices pipeline:
1. Reference preparation (indexing)
2. Per-sample preprocessing (align, sort, mark duplicates) in parallel
3. HaplotypeCaller per sample in parallel
4. Joint genotyping (GenomicsDBImport + GenotypeGVCFs)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| build | str | Reference genome build identifier (e.g. "GRCh38") | required |
| sample_ids | list[str] | List of sample identifiers to process | required |
| cohort_id | str | Identifier for the cohort output (default: "cohort") | 'cohort' |
Returns:
| Type | Description |
|---|---|
| Variants | Joint-genotyped Variants asset |
Source code in src/stargazer/workflows/germline_short_variant_discovery.py
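The per-sample fan-out and final join in germline_short_variant_discovery can be modeled with asyncio.gather. The coroutines below are hypothetical stubs that return file names instead of running tools; they only illustrate the shape of the parallelism, not the workflow's real task calls.

```python
import asyncio


async def preprocess(sample_id: str) -> str:
    """Stub for per-sample preprocessing (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real task would while waiting
    return f"{sample_id}.bam"


async def call_variants(bam: str) -> str:
    """Stub for per-sample HaplotypeCaller (hypothetical)."""
    await asyncio.sleep(0)
    return bam.replace(".bam", ".g.vcf")


async def pipeline(sample_ids: list[str], cohort_id: str = "cohort") -> dict:
    """Run each per-sample stage concurrently, then join the results."""
    bams = await asyncio.gather(*(preprocess(s) for s in sample_ids))
    gvcfs = await asyncio.gather(*(call_variants(b) for b in bams))
    # Joint genotyping would consume all GVCFs here; the stub just collects them.
    return {"cohort_id": cohort_id, "gvcfs": list(gvcfs)}
```

asyncio.gather returns results in the order the awaitables were passed, so outputs stay aligned with sample_ids.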
scRNA-seq clustering pipeline: QC → normalization → clustering → marker genes.¶
Implements the scanpy clustering tutorial workflow as Flyte v2 tasks. Assembles a raw AnnData from storage by sample_id, then runs the full preprocessing and clustering stack.
Prerequisites
A raw .h5ad file must be uploaded to storage with asset="anndata" and stage="raw".
spec: docs/workflows/scrna.md
scrna_clustering_pipeline(sample_id, organism='human', n_top_genes=2000, resolution=0.5, max_pct_mt=20.0)
async
¶
End-to-end scRNA-seq clustering pipeline.
Runs QC filtering, normalization, feature selection, dimensionality reduction, Leiden clustering, and marker gene identification in sequence.
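To make the max_pct_mt parameter concrete, here is a plain-Python sketch of a mitochondrial-percentage filter for a single cell. It rests on assumptions that are not confirmed by this reference: that organism selects the gene-name prefix ("MT-" for human, "mt-" otherwise, following common naming conventions) and that the threshold is inclusive. The pipeline itself operates on AnnData objects rather than dicts.

```python
def pct_mt(counts: dict[str, int], organism: str = "human") -> float:
    """Percentage of a cell's counts that come from mitochondrial genes."""
    # Assumption: prefix chosen by organism; real pipeline may differ.
    prefix = "MT-" if organism == "human" else "mt-"
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mt = sum(n for gene, n in counts.items() if gene.startswith(prefix))
    return 100.0 * mt / total


def keep_cell(
    counts: dict[str, int], organism: str = "human", max_pct_mt: float = 20.0
) -> bool:
    """Keep the cell when its mitochondrial percentage is within the limit."""
    return pct_mt(counts, organism) <= max_pct_mt
```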
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_id | str | Sample identifier used to look up the raw AnnData in storage | required |
| organism | str | Organism name (e.g. "human", "mouse") | 'human' |
| n_top_genes | int | Number of highly variable genes to select | 2000 |
| resolution | float | Leiden clustering resolution (higher = more clusters) | 0.5 |
| max_pct_mt | float | Maximum mitochondrial gene percentage per cell | 20.0 |
Returns:
| Type | Description |
|---|---|
| AnnData | Annotated AnnData asset with cluster labels and ranked marker genes |