H5AD/AnnData to Loom file path mapping documentation
This document provides a comprehensive mapping between the canonical H5AD/AnnData file format used by CELLxGENE Discover and the Loom file format. The mappings follow the CELLxGENE Schema v7.1.0 specification.
In ASAP Loom files:
- Cell metadata is stored in /col_attrs/ (cells are columns in ASAP's genes x cells matrix; AnnData uses /obs/).
- Gene metadata is stored in /row_attrs/ (AnnData uses /var/).
- Dataset-level metadata is stored in /attrs/ (AnnData uses /uns/).
- /matrix can contain either raw count data or already processed/normalized data, depending on what the user uploads. Additional processed matrices are stored in /layers/. This differs from CELLxGENE's expectation, where /X is normalized and /raw/X contains raw counts. See the Matrix Layers section for details.
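Because AnnData stores /X as (n_cells, n_genes) while ASAP's /matrix is genes x cells, converting between the two formats means transposing the matrix. The sketch below shows only the orientation change on a small dense list of lists; `h5ad_x_to_loom_matrix` is a hypothetical helper, and a real converter would work on (often sparse) HDF5 datasets rather than Python lists.

```python
from typing import List

Matrix = List[List[float]]


def h5ad_x_to_loom_matrix(x: Matrix) -> Matrix:
    """Transpose an (n_cells, n_genes) matrix into (n_genes, n_cells).

    Hypothetical helper: assumes a small, dense, rectangular input.
    """
    if not x:
        return []
    n_genes = len(x[0])
    if any(len(row) != n_genes for row in x):
        raise ValueError("input matrix is not rectangular")
    # zip(*x) yields one tuple per gene (a column of x), i.e. one Loom row.
    return [list(gene_row) for gene_row in zip(*x)]


# Two cells by three genes, in H5AD orientation.
x = [[0.0, 1.0, 2.0],
     [3.0, 4.0, 5.0]]
matrix = h5ad_x_to_loom_matrix(x)
print(len(matrix), len(matrix[0]))  # prints: 3 2
```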
| H5AD Path | Purpose | Loom Path | Notes |
|---|---|---|---|
| /X | Primary matrix | /matrix | H5AD shape: (n_cells, n_genes); stored transposed in Loom as (n_genes, n_cells). In ASAP, this can be either raw counts or normalized data depending on user input. |
| /raw/X or /layers/{name} | Additional matrices | /layers/{name} | In ASAP, additional processed/normalized matrices are stored in /layers/. Same shape as /matrix. |
| /obs/{field} | Cell metadata columns | /col_attrs/{field} | 1-D array, length n_cells. In ASAP: cells are columns. |
| /var/{field} | Gene metadata columns | /row_attrs/{field} | 1-D array, length n_genes. In ASAP: genes are rows. |
| /raw/var/{field} | Raw gene metadata | /row_attrs/{field} | Duplicate or use layer attrs |
| /obsm/{key} | Embeddings (UMAP, tSNE, etc.) | /col_attrs/{key} | 2-D arrays (n_cells, m) |
| /obsp/{key} | Pairwise cell matrices | /pairwise/{key} | No direct support |
| /varm/{key} | Multi-dim gene annotations | /row_attrs/{key} | 2-D arrays (n_genes, m) |
| /varp/{key} | Pairwise gene matrices | /pairwise/{key} | No direct support |
| /uns/{key} | Dataset-level metadata | /attrs/{key} | Flatten nested dicts |
| /obs/index | Cell identifiers | /col_attrs/CellID | Unique identifiers. In ASAP: cells are columns. |
| /var/index | Gene identifiers | /row_attrs/_StableID | ENSEMBL IDs (no version). In ASAP: genes are rows. |
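The mapping table above can be read as a small set of exact and prefix rewrite rules. The following is a minimal sketch under that reading; `h5ad_to_loom_path` and the rule dictionaries are hypothetical names, the gene-column renames are taken from the gene-metadata table later in this document, and the sketch is not ASAP's actual converter.

```python
from typing import Optional

# Exact-path rules from the tables in this document (checked first).
EXACT_RULES = {
    "/X": "/matrix",
    "/obs/index": "/col_attrs/CellID",
    "/var/index": "/row_attrs/_StableID",
    "/var/feature_name": "/row_attrs/Gene",
    "/var/feature_biotype": "/row_attrs/_Biotypes",
    "/var/feature_length": "/row_attrs/_SumExonLength",
}

# Paths with no fixed ASAP Loom counterpart ("-" or "N/A" in the tables).
NOT_STORED = {
    "/raw/X",
    "/var/feature_type",
    "/var/feature_is_filtered",
    "/var/feature_reference",
}

# Prefix rules: H5AD group -> ASAP Loom group. The trailing slash keeps
# "/obs/" from matching "/obsm/..." and "/var/" from matching "/varm/...".
PREFIX_RULES = {
    "/obs/": "/col_attrs/",
    "/obsm/": "/col_attrs/",
    "/obsp/": "/pairwise/",
    "/var/": "/row_attrs/",
    "/raw/var/": "/row_attrs/",
    "/varm/": "/row_attrs/",
    "/varp/": "/pairwise/",
    "/uns/": "/attrs/",
    "/layers/": "/layers/",
}


def h5ad_to_loom_path(h5ad_path: str) -> Optional[str]:
    """Return the ASAP Loom path for an H5AD path, or None if it is not stored."""
    if h5ad_path in NOT_STORED:
        return None
    if h5ad_path in EXACT_RULES:
        return EXACT_RULES[h5ad_path]
    for prefix, loom_prefix in PREFIX_RULES.items():
        if h5ad_path.startswith(prefix):
            # Keep the remainder, so nested /uns keys come out slash-flattened.
            return loom_prefix + h5ad_path[len(prefix):]
    return None


print(h5ad_to_loom_path("/obs/donor_id"))  # prints: /col_attrs/donor_id
print(h5ad_to_loom_path("/varm/PCs"))      # prints: /row_attrs/PCs
```

Checking exact rules before prefix rules lets the renamed gene columns and the identifier columns override the generic group mapping.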
All cell-level metadata from /obs/ maps to /col_attrs/ in ASAP Loom files (because ASAP uses cells as columns).
Some fields accept multiple ontology terms separated by " || " (space-pipe-pipe-space). These are marked with MULTI-VALUE.
For example: "MONDO:0005015 || MONDO:0004975" for multiple diseases.
Fields marked with MULTI-VALUE * are ASAP extensions to fields that are
single-value in the standard CELLxGENE schema. See the ASAP Schema Extensions
section for details.
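Reading and writing MULTI-VALUE fields comes down to splitting on and joining with the " || " separator while keeping term order. The helpers below are a hypothetical sketch; the tolerant split (accepting irregular spacing around "||") is a design choice for robustness, not documented ASAP parsing behavior.

```python
from typing import List

SEPARATOR = " || "  # space-pipe-pipe-space


def split_multi(value: str) -> List[str]:
    """Split a MULTI-VALUE string into individual terms, in order."""
    # Splitting on "||" and stripping each piece tolerates irregular spacing.
    return [term.strip() for term in value.split("||") if term.strip()]


def join_multi(terms: List[str]) -> str:
    """Join terms into one MULTI-VALUE string, preserving their order."""
    return SEPARATOR.join(terms)


print(split_multi("MONDO:0005015 || MONDO:0004975"))
# prints: ['MONDO:0005015', 'MONDO:0004975']
```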
| H5AD Path | Loom Path | Type | Description |
|---|---|---|---|
| /obs/index | /col_attrs/CellID | str | REQUIRED. Unique cell identifiers |
| /obs/assay_ontology_term_id | /col_attrs/assay_ontology_term_id | categorical str | REQUIRED. EFO assay term |
| /obs/cell_type_ontology_term_id | /col_attrs/cell_type_ontology_term_id | categorical str | REQUIRED (if not pre-analysis). CL term or "unknown". MULTI-VALUE * |
| /obs/development_stage_ontology_term_id | /col_attrs/development_stage_ontology_term_id | categorical str | REQUIRED. Development stage term |
| /obs/disease_ontology_term_id | /col_attrs/disease_ontology_term_id | categorical str | REQUIRED. MONDO term(s) or "PATO:0000461". MULTI-VALUE: multiple terms separated by " \|\| " (e.g., "MONDO:0005015 \|\| MONDO:0004975") |
| /obs/donor_id | /col_attrs/donor_id | categorical str | REQUIRED. Unique donor identifier |
| /obs/is_primary_data | /col_attrs/is_primary_data | bool | REQUIRED. True if canonical instance |
| /obs/self_reported_ethnicity_ontology_term_id | /col_attrs/self_reported_ethnicity_ontology_term_id | categorical str | REQUIRED. HANCESTRO term(s). MULTI-VALUE: multiple terms separated by " \|\| " (e.g., "HANCESTRO:0005 \|\| HANCESTRO:0014") |
| /obs/sex_ontology_term_id | /col_attrs/sex_ontology_term_id | categorical str | REQUIRED. PATO term |
| /obs/suspension_type | /col_attrs/suspension_type | categorical str | REQUIRED. "cell", "nucleus", or "na" |
| /obs/tissue_ontology_term_id | /col_attrs/tissue_ontology_term_id | categorical str | REQUIRED. UBERON or Cellosaurus term. MULTI-VALUE * |
| /obs/tissue_type | /col_attrs/tissue_type | categorical str | REQUIRED. "tissue", "organoid", "cell line", or "primary cell culture" |
* ASAP extension: these fields are single-value in the standard CELLxGENE schema but support multiple ontology terms in ASAP. See ASAP Schema Extensions for details.
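A quick way to use the required-fields table above is to check which required /col_attrs keys a Loom file lacks. This is a hypothetical sketch: the key list is copied from the table, `missing_required` is an illustrative name, and only the "if not pre-analysis" exemption for cell_type_ontology_term_id is modeled (value-level validation is out of scope).

```python
from typing import Iterable, List

# Required /col_attrs keys, copied from the table above.
REQUIRED_COL_ATTRS = [
    "CellID",
    "assay_ontology_term_id",
    "cell_type_ontology_term_id",
    "development_stage_ontology_term_id",
    "disease_ontology_term_id",
    "donor_id",
    "is_primary_data",
    "self_reported_ethnicity_ontology_term_id",
    "sex_ontology_term_id",
    "suspension_type",
    "tissue_ontology_term_id",
    "tissue_type",
]


def missing_required(col_attr_keys: Iterable[str], is_pre_analysis: bool = False) -> List[str]:
    """Return required /col_attrs keys that are absent, in table order."""
    present = set(col_attr_keys)
    missing: List[str] = []
    for key in REQUIRED_COL_ATTRS:
        # cell_type_ontology_term_id is only required outside pre-analysis collections.
        if key == "cell_type_ontology_term_id" and is_pre_analysis:
            continue
        if key not in present:
            missing.append(key)
    return missing


print(missing_required(REQUIRED_COL_ATTRS))  # prints: []
```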
The following fields contain the human-readable ontology term names for the corresponding *_ontology_term_id fields. For example,
assay contains the name for the term ID in assay_ontology_term_id,
cell_type contains the name for cell_type_ontology_term_id, etc.
| H5AD Path | Loom Path | Source Field | Description |
|---|---|---|---|
| /obs/assay | /col_attrs/assay | assay_ontology_term_id | Ontology term name for the assay |
| /obs/cell_type | /col_attrs/cell_type | cell_type_ontology_term_id | Ontology term name(s) for the cell type. MULTI-VALUE * |
| /obs/development_stage | /col_attrs/development_stage | development_stage_ontology_term_id | Ontology term name for the development stage |
| /obs/disease | /col_attrs/disease | disease_ontology_term_id | Ontology term name(s) for the disease. MULTI-VALUE |
| /obs/self_reported_ethnicity | /col_attrs/self_reported_ethnicity | self_reported_ethnicity_ontology_term_id | Ontology term name(s) for ethnicity. MULTI-VALUE |
| /obs/sex | /col_attrs/sex | sex_ontology_term_id | Ontology term name for sex |
| /obs/tissue | /col_attrs/tissue | tissue_ontology_term_id | Ontology term name(s) for the tissue. MULTI-VALUE * |
| /obs/observation_joinid | /col_attrs/observation_joinid | - | Unique observation identifier (not from ontology) |
* ASAP extension: these fields are single-value in the standard CELLxGENE schema but support multiple ontology term names in ASAP. See ASAP Schema Extensions for details.
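Every row in the name-field table except observation_joinid follows one naming rule: the name field is the term ID field with the "_ontology_term_id" suffix removed. The hypothetical helpers below capture that rule and the resulting /col_attrs path; they encode the convention only, not any lookup of actual term names.

```python
SUFFIX = "_ontology_term_id"


def name_field_for(term_id_field: str) -> str:
    """'cell_type_ontology_term_id' -> 'cell_type'."""
    if not term_id_field.endswith(SUFFIX):
        raise ValueError(f"not an ontology term ID field: {term_id_field!r}")
    return term_id_field[: -len(SUFFIX)]


def term_id_field_for(name_field: str) -> str:
    """'tissue' -> 'tissue_ontology_term_id'."""
    return name_field + SUFFIX


def loom_name_path(term_id_field: str) -> str:
    """Loom path of the name column paired with a term ID column."""
    return "/col_attrs/" + name_field_for(term_id_field)


print(loom_name_path("disease_ontology_term_id"))  # prints: /col_attrs/disease
```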
| H5AD Path | Loom Path | Condition |
|---|---|---|
| /obs/array_col | /col_attrs/array_col | Visium with /uns/spatial/is_single=True |
| /obs/array_row | /col_attrs/array_row | Visium with /uns/spatial/is_single=True |
| /obs/in_tissue | /col_attrs/in_tissue | Visium with /uns/spatial/is_single=True |
| H5AD Path | Loom Path | Condition |
|---|---|---|
| /obs/genetic_perturbation_id | /col_attrs/genetic_perturbation_id | If /uns/genetic_perturbations present |
| /obs/genetic_perturbation_strategy | /col_attrs/genetic_perturbation_strategy | If /obs/genetic_perturbation_id present |
| /obs/experimental_condition_ontology_term_id | /col_attrs/experimental_condition_ontology_term_id | Optional for perturbation experiments |
| /obs/experimental_condition | /col_attrs/experimental_condition | Auto-generated if ontology term present |
| /obs/perturbation_types | /col_attrs/perturbation_types | Auto-generated from perturbation data |
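The spatial and perturbation tables above list fields whose presence depends on other metadata. The sketch below treats each "Condition" as a requirement and reports fields whose trigger is present but which are missing. This reading, the name `missing_conditional`, and the caller-supplied `is_visium` flag (for example derived from assay_ontology_term_id) are all assumptions for illustration. Perturbation metadata is accepted either as a single JSON attribute or as flattened keys, matching the two storage options described later in this document.

```python
from typing import Dict, Iterable, List

SPOT_FIELDS = ["array_col", "array_row", "in_tissue"]


def missing_conditional(
    col_attr_keys: Iterable[str],
    attrs: Dict[str, object],
    is_visium: bool,
) -> List[str]:
    """List conditional /col_attrs fields whose trigger is present but which are missing.

    attrs holds /attrs entries keyed by their slash-flattened path
    (e.g. "spatial/is_single").
    """
    cols = set(col_attr_keys)
    missing: List[str] = []
    # Spot coordinates: Visium datasets with /attrs/spatial/is_single set.
    # Assumes a boolean-like value is stored there.
    if is_visium and bool(attrs.get("spatial/is_single", False)):
        missing.extend(f for f in SPOT_FIELDS if f not in cols)
    # Perturbation metadata may be one JSON attribute or flattened keys.
    has_perturbations = any(
        key == "genetic_perturbations" or key.startswith("genetic_perturbations/")
        for key in attrs
    )
    if has_perturbations and "genetic_perturbation_id" not in cols:
        missing.append("genetic_perturbation_id")
    if "genetic_perturbation_id" in cols and "genetic_perturbation_strategy" not in cols:
        missing.append("genetic_perturbation_strategy")
    return missing
```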
All gene-level metadata from /var/ maps to /row_attrs/ in ASAP Loom files.
In ASAP Loom files, gene metadata is stored in /row_attrs/ (not /col_attrs/) and cell metadata in /col_attrs/ (not /row_attrs/).
| H5AD Path | ASAP Loom Path | Status | Description |
|---|---|---|---|
| /var/index | /row_attrs/_StableID | REQUIRED | ENSEMBL gene IDs (no version suffix) |
| /var/feature_name | /row_attrs/Gene | Available | Gene symbol/name |
| /var/feature_biotype | /row_attrs/_Biotypes | Available | Gene biotype (e.g., "protein_coding", "lncRNA") |
| /var/feature_length | /row_attrs/_SumExonLength | Available | Sum of exon lengths (bps) |
| /var/feature_type | - | CELLxGENE auto | Gene type from GENCODE/ENSEMBL. Added by CELLxGENE on upload. |
| /var/feature_is_filtered | - | REQUIRED* | REQUIRED for CELLxGENE. Must be added manually before submission. |
| /var/feature_reference | - | CELLxGENE auto | NCBITaxon term for reference organism. ASAP does not have multi-species projects. |
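Going from /var to /row_attrs involves two steps from the table above: dropping the ENSEMBL version suffix from the gene index and renaming the columns that have an ASAP counterpart. The sketch below is hypothetical (`strip_version`, `var_to_row_attrs`). It assumes other /var columns keep their names, and the regular expression covers ENSEMBL gene IDs of the form ENS…G followed by digits.

```python
import re
from typing import Dict, Sequence

# An ENSEMBL gene ID (e.g. ENSG..., ENSMUSG...) with a ".N" version suffix.
_VERSIONED_ID = re.compile(r"(ENS[A-Z]*G\d+)\.\d+")

# /var column -> /row_attrs key, from the gene-metadata table above.
RENAMED = {
    "feature_name": "Gene",
    "feature_biotype": "_Biotypes",
    "feature_length": "_SumExonLength",
}

# Columns with no ASAP Loom path ("-" in the table above).
NOT_STORED = {"feature_type", "feature_is_filtered", "feature_reference"}


def strip_version(gene_id: str) -> str:
    """'ENSG00000141510.17' -> 'ENSG00000141510'; other IDs are returned unchanged."""
    match = _VERSIONED_ID.fullmatch(gene_id)
    return match.group(1) if match else gene_id


def var_to_row_attrs(var_index: Sequence[str], var_columns: Dict[str, list]) -> Dict[str, list]:
    """Build /row_attrs arrays from an H5AD /var index and columns."""
    row_attrs: Dict[str, list] = {"_StableID": [strip_version(g) for g in var_index]}
    for column, values in var_columns.items():
        if column in NOT_STORED:
            continue
        # Known columns are renamed; any other column keeps its name.
        row_attrs[RENAMED.get(column, column)] = list(values)
    return row_attrs


print(strip_version("ENSG00000141510.17"))  # prints: ENSG00000141510
```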
These fields are generated by ASAP during parsing. Accession and _Sum have no equivalent in H5AD files; the others correspond to the /var/ fields in the table above:
| Loom Path | Description |
|---|---|
| /row_attrs/_StableID | ENSEMBL gene identifiers (stable IDs, no version suffix). Used as the gene index in ASAP. |
| /row_attrs/Accession | Alternative gene accession identifiers |
| /row_attrs/Gene | Gene symbols/names |
| /row_attrs/_Biotypes | Gene biotype (e.g., "protein_coding", "lncRNA", "miRNA") |
| /row_attrs/_SumExonLength | Sum of exon lengths in base pairs |
| /row_attrs/_Sum | Total expression per gene (QC metric) |
Gene fields with no ASAP Loom equivalent:
- feature_is_filtered: Boolean indicating whether the gene is filtered out of the normalized matrix. Must be added manually.
- feature_reference: NCBITaxon term for the reference organism. Auto-generated by CELLxGENE on upload (ASAP does not support multi-species projects).
- feature_type: Gene type from GENCODE/ENSEMBL. Auto-generated by CELLxGENE on upload.

Embeddings from /obsm/ are stored as 2-D arrays in /col_attrs/ in ASAP (since cells are columns).
| H5AD Path | Loom Path | Shape | Notes |
|---|---|---|---|
| /obsm/spatial | /col_attrs/spatial | (n_cells, 2+) | Required for Visium/Slide-seq when /uns/spatial/is_single=True |
| /obsm/X_{suffix} | /col_attrs/X_{suffix} | (n_cells, 2+) | UMAP, tSNE, PCA embeddings. {suffix} cannot be "spatial" |
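The embedding table implies three checks before an /obsm array is written to /col_attrs: the key is either "spatial" or "X_{suffix}" with a suffix other than "spatial", there is one row per cell, and each row has at least two coordinates. `check_embedding` below is a hypothetical sketch of those checks; rejecting an empty suffix ("X_") is an added assumption.

```python
from typing import List, Sequence


def check_embedding(key: str, values: Sequence[Sequence[float]], n_cells: int) -> List[str]:
    """Return problems with an /obsm embedding destined for /col_attrs/{key}."""
    problems: List[str] = []
    if key != "spatial":
        if not key.startswith("X_"):
            problems.append(f"{key!r}: key must be 'spatial' or start with 'X_'")
        elif key[len("X_"):] in ("", "spatial"):
            problems.append(f"{key!r}: suffix must be non-empty and not 'spatial'")
    if len(values) != n_cells:
        problems.append(f"{key!r}: expected {n_cells} rows, got {len(values)}")
    if any(len(row) < 2 for row in values):
        problems.append(f"{key!r}: each row needs at least 2 coordinates")
    return problems


print(check_embedding("X_umap", [[0.0, 1.0], [2.0, 3.0]], n_cells=2))  # prints: []
```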
Dataset-level metadata from /uns/ maps to /attrs/ in ASAP Loom files.
| H5AD Path | Loom Path | Annotator | Description |
|---|---|---|---|
| /uns/title | /attrs/title | Curator | Dataset title |
| /uns/organism_ontology_term_id | /attrs/organism_ontology_term_id | Curator | NCBITaxon term for organism |
| /uns/schema_version | /attrs/schema_version | CELLxGENE | Must be "7.1.0" |
| /uns/schema_reference | /attrs/schema_reference | CELLxGENE | URL to schema document |
| /uns/organism | /attrs/organism | CELLxGENE | Human-readable organism name |
| /uns/citation | /attrs/citation | CELLxGENE | Citation string |
| /uns/is_pre_analysis | /attrs/is_pre_analysis | CELLxGENE | True for pre-analysis collections |
| H5AD Path | Loom Path | Description |
|---|---|---|
| /uns/batch_condition | /attrs/batch_condition | JSON array of batch keys |
| /uns/default_embedding | /attrs/default_embedding | Key of default embedding to display |
| /uns/X_approximate_distribution | /attrs/X_approximate_distribution | "count" or "normal" |
| /uns/{column}_colors | /attrs/{column}_colors | Color palette array (hex or named colors) |
Nested spatial metadata must be flattened using slash notation:
| H5AD Path | Loom Path |
|---|---|
| /uns/spatial/is_single | /attrs/spatial/is_single |
| /uns/spatial/{library_id}/images/hires | /attrs/spatial/{library_id}/images/hires (as base64 or separate file) |
| /uns/spatial/{library_id}/scalefactors/spot_diameter_fullres | /attrs/spatial/{library_id}/scalefactors/spot_diameter_fullres |
| /uns/spatial/{library_id}/scalefactors/tissue_hires_scalef | /attrs/spatial/{library_id}/scalefactors/tissue_hires_scalef |
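The slash-notation flattening in the spatial table is a recursive walk over nested dictionaries, and the inverse rebuilds the nesting by splitting on "/". The functions below are a hypothetical sketch of both directions. Two caveats apply: keys that themselves contain "/" do not round-trip, and empty nested dictionaries disappear when flattened.

```python
from typing import Any, Dict


def flatten_uns(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into slash-separated keys (e.g. 'spatial/is_single')."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_uns(value, path))
        else:
            flat[path] = value
    return flat


def unflatten_attrs(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild nested dicts from slash-separated keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split("/")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


uns = {"spatial": {"is_single": True, "lib1": {"scalefactors": {"tissue_hires_scalef": 0.5}}}}
print(flatten_uns(uns))
# prints: {'spatial/is_single': True, 'spatial/lib1/scalefactors/tissue_hires_scalef': 0.5}
```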
Complex perturbation metadata should be stored as JSON:
| H5AD Path | Loom Path | Format |
|---|---|---|
| /uns/genetic_perturbations | /attrs/genetic_perturbations | JSON string |
| /uns/genetic_perturbations/{id}/role | /attrs/genetic_perturbations/{id}/role | Or flatten keys |
| /uns/genetic_perturbations/{id}/protospacer_sequence | /attrs/genetic_perturbations/{id}/protospacer_sequence | DNA sequence (A/C/G/T) |
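Storing /uns/genetic_perturbations as a single JSON string (the first option in the table above) can be paired with a check that protospacer sequences contain only A/C/G/T. The sketch below uses hypothetical names and accepts upper-case bases only. Sorting keys is a design choice that makes the serialized attribute deterministic.

```python
import json
import re
from typing import Any, Dict

_DNA = re.compile(r"[ACGT]+")


def is_valid_protospacer(seq: str) -> bool:
    """True if seq is a non-empty string of upper-case A/C/G/T."""
    return _DNA.fullmatch(seq) is not None


def encode_perturbations(perturbations: Dict[str, Dict[str, Any]]) -> str:
    """Serialize /uns/genetic_perturbations into one JSON string for /attrs."""
    for pid, entry in perturbations.items():
        seq = entry.get("protospacer_sequence")
        if seq is not None and not is_valid_protospacer(seq):
            raise ValueError(f"{pid}: protospacer_sequence must contain only A/C/G/T")
    # sort_keys makes the output deterministic, which keeps diffs stable.
    return json.dumps(perturbations, sort_keys=True)


def decode_perturbations(attr_value: str) -> Dict[str, Dict[str, Any]]:
    """Inverse of encode_perturbations."""
    return json.loads(attr_value)
```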
/matrix contains the primary data matrix
as provided by the user. This can be either raw counts or already normalized/processed data,
depending on what the user uploads. Additional processed matrices (e.g., normalized, scaled, log-transformed)
are stored in /layers/.
| H5AD Path | CELLxGENE Loom | ASAP Loom | Purpose |
|---|---|---|---|
| /X | /matrix | /matrix | Primary matrix. In ASAP: contains raw counts OR normalized data depending on user input. |
| /raw/X | /layers/raw | N/A - see note | CELLxGENE expects raw counts here. ASAP does not create /layers/raw. |
| /layers/{name} | /layers/{name} | /layers/{name} | Additional processed matrices (normalized, scaled, log-transformed, etc.) |
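When preparing a CELLxGENE submission from an ASAP Loom file, what ends up in /X and /raw/X depends on whether /matrix holds raw counts. The hypothetical helper below returns which Loom dataset should feed each H5AD matrix. It assumes the caller already knows whether /matrix is raw, since this mapping does not record that, and that raw counts for a normalized /matrix have been placed in some layer.

```python
from typing import Dict, Optional


def cellxgene_matrix_plan(matrix_is_raw: bool, raw_layer: Optional[str] = None) -> Dict[str, str]:
    """Map H5AD matrix paths to the ASAP Loom datasets that should fill them.

    matrix_is_raw: whether /matrix holds raw counts (known to the caller).
    raw_layer: Loom path of a layer holding raw counts when /matrix is normalized.
    """
    if matrix_is_raw:
        # Raw counts only: they become /X and no separate raw matrix is needed.
        return {"/X": "/matrix"}
    if raw_layer is None:
        raise ValueError("normalized /matrix needs raw counts in a layer before submission")
    return {"/X": "/matrix", "/raw/X": raw_layer}
```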
- CELLxGENE Loom: /matrix = normalized data, /layers/raw = raw counts.
- ASAP Loom: /matrix contains what the user uploads (typically raw counts). /layers/raw does NOT exist in ASAP.
- … /matrix.
- … /matrix (with no separate raw layer).
- For CELLxGENE submission: if /matrix already contains raw counts, no additional layer is needed. If /matrix contains normalized data, raw counts should be added to /layers/raw before submission.

| Feature | Challenge | Workaround |
|---|---|---|
| /obsp/ and /varp/ (pairwise matrices) | No native Loom support | Store in /pairwise/{key} or as external .npz file referenced by global attribute |
| Hierarchical /uns/ dictionaries | Loom attributes are flat | Flatten with slash keys (e.g., /attrs/spatial/is_single) or store as JSON string |
| pandas CategoricalDtype | Loom stores raw arrays only | Store {field}_categories and optionally {field}_category_codes arrays |
| Mixed dtypes (strings + NaN) | Loom requires homogeneous dtypes | Cast to string, use sentinel values ("na", "unknown") |
| /raw/var/ distinct from /var/ | Loom has no second gene table | Duplicate columns in /row_attrs/ and document in global attribute |
| Neighbor graphs (/uns/neighbors) | No canonical storage | Store as /col_attrs/neighbors_indices + /col_attrs/neighbors_distances |
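The CategoricalDtype workaround in the table above amounts to splitting a column into its distinct categories and one integer code per cell. Below is a hypothetical sketch that assigns codes in first-appearance order (pandas sorts categories by default and uses -1 for missing values, neither of which is modeled here). Note that the categories array has one entry per category rather than per cell. It therefore cannot be placed in /col_attrs, where every array must have length n_cells, so this sketch assumes it would go to /attrs.

```python
from typing import Dict, List, Sequence


def encode_categorical(field: str, values: Sequence[str]) -> Dict[str, List]:
    """Split a categorical column into {field}_categories and {field}_category_codes."""
    categories: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            # New category: its code is its position in the categories list.
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return {f"{field}_categories": categories, f"{field}_category_codes": codes}


def decode_categorical(categories: Sequence[str], codes: Sequence[int]) -> List[str]:
    """Rebuild the per-cell values from categories and codes."""
    return [categories[code] for code in codes]


print(encode_categorical("tissue_type", ["tissue", "organoid", "tissue"]))
# prints: {'tissue_type_categories': ['tissue', 'organoid'], 'tissue_type_category_codes': [0, 1, 0]}
```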
ASAP extends the standard CELLxGENE schema in specific ways to better support annotation workflows. These extensions are backward-compatible: fields that follow the standard schema remain valid. The extensions are documented here so that downstream tools processing ASAP Loom files are aware of them.
In the standard CELLxGENE schema, the cell_type_ontology_term_id and tissue_ontology_term_id
fields accept a single ontology term per cell. ASAP extends these fields to support multiple ontology terms
for cases where a cell is associated with more than one cell type or tissue annotation.
| Field Pair | Loom Paths | Standard Schema | ASAP Extension |
|---|---|---|---|
| Cell Type | /col_attrs/cell_type_ontology_term_id, /col_attrs/cell_type | Single CL term (e.g., CL:0000540) | Multiple CL terms separated by " \|\| " (e.g., CL:0000540 \|\| CL:0000127) |
| Tissue | /col_attrs/tissue_ontology_term_id, /col_attrs/tissue | Single UBERON term (e.g., UBERON:0002371) | Multiple UBERON terms separated by " \|\| " (e.g., UBERON:0002371 \|\| UBERON:0001264) |
Multiple terms use " || " (space-pipe-pipe-space) as the
separator, consistent with the standard CELLxGENE convention used for fields like
disease_ontology_term_id and self_reported_ethnicity_ontology_term_id.
The corresponding name fields (cell_type, tissue) also contain multiple names
separated by " || ", in the same order as their ontology term IDs.
The CELLxGENE schema defines paired cell metadata fields: an ontology term identifier field
(e.g., cell_type_ontology_term_id) and its corresponding ontology term name field
(e.g., cell_type). In ASAP, users can populate either side of the pair from existing metadata,
and the other side is generated automatically:
- If the identifier field is populated (e.g., CL:0000540), ASAP resolves each identifier against its ontology database and automatically creates the corresponding name field (e.g., neuron).
- If the name field is populated (e.g., neuron), ASAP looks up the matching ontology term and automatically creates the corresponding identifier field (e.g., CL:0000540).

Values that cannot be resolved against the ontology (mismatches, typos, or terms not present in the ontology version used by ASAP) are flagged as unresolved and can be corrected manually through the compliance fix interface.
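The two-way resolution described above can be sketched as a lookup applied term by term, with order preserved for MULTI-VALUE cells and unresolved terms collected for manual correction. In this hypothetical sketch, a two-entry dictionary stands in for ASAP's ontology database, and a cell with any unresolved term gets None rather than a partial translation. CL:9999999 in the usage line is a made-up ID.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for ASAP's ontology database (hypothetical, two CL terms only).
TERM_TO_NAME: Dict[str, str] = {"CL:0000540": "neuron", "CL:0000127": "astrocyte"}
NAME_TO_TERM: Dict[str, str] = {name: term for term, name in TERM_TO_NAME.items()}

SEPARATOR = " || "


def resolve(values: List[str], lookup: Dict[str, str]) -> Tuple[List[Optional[str]], List[str]]:
    """Translate each (possibly MULTI-VALUE) cell through lookup.

    Returns the generated paired column and the distinct unresolved values.
    """
    paired: List[Optional[str]] = []
    unresolved: List[str] = []
    for value in values:
        parts = [p.strip() for p in value.split("||")]
        missing = [p for p in parts if p not in lookup]
        if missing:
            paired.append(None)  # leave the paired field empty until fixed
            for term in missing:
                if term not in unresolved:
                    unresolved.append(term)
        else:
            # Order is preserved, so names line up with their term IDs.
            paired.append(SEPARATOR.join(lookup[p] for p in parts))
    return paired, unresolved


names, flagged = resolve(["CL:0000540 || CL:0000127", "CL:9999999"], TERM_TO_NAME)
print(names)    # prints: ['neuron || astrocyte', None]
print(flagged)  # prints: ['CL:9999999']
```

The same function runs in the other direction by passing NAME_TO_TERM, which mirrors populating the name field and generating identifiers.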
CELLxGENE 7.1.0 Pinned Versions (for reference only, not enforced by ASAP):
Note: See your ASAP version's documentation for the actual ontology versions in use.