nf-core/yascp: Output
Introduction
This documentation provides a comprehensive overview of the outputs generated by the pipeline.
The pipeline will create the following files in your working directory:
```
work            # Nextflow working directory
results         # Final results (configurable)
.nextflow.log   # Log file from Nextflow
# Other Nextflow hidden files, e.g. history of pipeline runs and old logs.
```
The results folder contains one subfolder per pipeline stage, plus run metadata:
- preprocessing: Contains data split by modality and other preprocessing outputs.
- doublet_detection: Contains results identifying doublets (droplets that captured more than one cell).
- deconvolution: Contains outputs from assigning pooled cells back to their donors of origin.
- celltype_assignemt: Contains the results of classifying cells into specific cell types.
- clustering_and_integration: Contains integrated datasets and clusters of cells grouped by similarity.
- citeseq: Includes CITE-seq data, linking transcriptomic and proteomic profiles.
- handover: Contains the final processed results.
- pipeline_info: Stores metadata and logs detailing the pipeline execution process.
- yascp_inputs:
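To check which of these folders a given run actually produced, you can compare the top level of the results directory against the list above. The following is a minimal Python sketch, not part of yascp; the folder list is copied from this page, and a missing folder is not necessarily an error (citeseq, for instance, depends on the input data).

```python
from pathlib import Path

# Top-level folders listed on this page. Not every run produces all of them
# (e.g. citeseq depends on the input data), so treat this as a checklist.
EXPECTED_FOLDERS = [
    "preprocessing",
    "doublet_detection",
    "deconvolution",
    "celltype_assignemt",  # spelled this way in the output tree
    "clustering_and_integration",
    "citeseq",
    "handover",
    "pipeline_info",
    "yascp_inputs",
]


def missing_folders(present, expected=EXPECTED_FOLDERS):
    """Return the expected folder names that are not in `present`, in list order."""
    return [name for name in expected if name not in present]


def check_results_dir(results_dir):
    """Collect the top-level subdirectory names of a results folder and report gaps."""
    present = {p.name for p in Path(results_dir).iterdir() if p.is_dir()}
    return missing_folders(present)
```

For example, check_results_dir("results") returns the names of the listed folders that are absent from that directory.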
Detailed explanations of each output folder and their corresponding steps are provided below:
preprocessing
The preprocessing folder contains the following subdirectories:
- data_modalities_split: Raw and filtered quantification data, separated by modality (for example gene expression, CITE-seq or hashtag).
- resources (named recourses in the output tree): Genotype assembly and input files for the test dataset.
- subset_genotypes: Created only if VCF IDs are specified in the input TSV file. Contains the genotypes subset to the donors expected in each pool.
```
preprocessing/
├── data_modalities_split
│ ├── filterd
│ │ └── Pool1
│ │ ├── Gene_Expression-Pool1.h5ad
│ │ └── Pool1__Gene_Expression
│ └── raw
│ └── Pool1
│ ├── Gene_Expression-Pool1.h5ad
│ └── Pool1__Gene_Expression
├── recourses
│ ├── Done.tmp
│ ├── full_test_dataset
│ ├── input_test_data_file.tsv
│ └── input_test_vcf_file.tsv
└── subset_genotypes
├── Genotype___AllExpectedGT_Pool1
└── Genotypes_all_pools.tsv
```
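To locate the per-pool matrices programmatically, you can walk data_modalities_split and group the h5ad files by matrix type and pool. The sketch below assumes the layout shown in the tree above (filterd or raw, then the pool, then a modality-pool h5ad file); the helper names are hypothetical and not part of yascp.

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath


def group_modality_files(rel_paths):
    """Group h5ad files by (matrix_type, pool).

    `rel_paths` are paths relative to data_modalities_split, e.g.
    "filterd/Pool1/Gene_Expression-Pool1.h5ad". Anything deeper or shallower
    (such as files inside the Pool1__Gene_Expression folders) is skipped.
    """
    groups = defaultdict(list)
    for rel in rel_paths:
        parts = PurePosixPath(rel).parts
        if len(parts) != 3 or not parts[2].endswith(".h5ad"):
            continue
        matrix_type, pool, filename = parts
        groups[(matrix_type, pool)].append(filename)
    return {key: sorted(files) for key, files in groups.items()}


def scan_modalities(results_dir="results"):
    """Find the h5ad files under preprocessing/data_modalities_split."""
    base = Path(results_dir) / "preprocessing" / "data_modalities_split"
    return group_modality_files(p.relative_to(base).as_posix() for p in base.rglob("*.h5ad"))
```

Each file found this way can then be opened with anndata.read_h5ad.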
doublet_detection
The doublet_detection folder contains the following subdirectories:
- DoubletFinder, DoubletDecon, scDblFinder, SCDS: Each folder contains one TSV file per pool with cell barcodes and labels indicating whether each cell is a singlet or a doublet.
- scrublet: One TSV file per pool with cell barcodes and labels indicating whether each cell is a multiplet, plus a subdirectory with plots.
- doublet_results_combined: One TSV file per pool with cell barcodes and the combined labels from Scrublet, SCDS, scDblFinder, DoubletDecon and DoubletFinder.
- droplet_type_distribution: PNG files with graphs visualizing the distribution of droplet types.
```
doublet_detection
├── DoubletDecon
│ └── Pool1__DoubletDecon_doublets_singlets.tsv
├── DoubletFinder
│ └── Pool1__DoubletFinder_doublets_singlets.tsv
├── doublet_results_combined
│ └── Pool1__doublet_results_combined.tsv
├── droplet_type_distribution
│ └── Pool1__droplet_type_distribution.png
├── scDblFinder
│ └── Pool1__scDblFinder_doublets_singlets.tsv
├── SCDS
│ └── Pool1__scds_doublets_singlets.tsv
└── scrublet
├── plots
│ ├── Pool1boxplot_total_umi_counts.png
│ ├── Pool1histogram_multiplet_scores_log.png
│ ├── Pool1histogram_multiplet_scores.png
│ └── Pool1histogram_multiplet_zscores.png
└── Pool1scrublet.tsv
```
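If you want a single call per droplet from the per-tool labels, one option is a simple majority vote. The sketch below is illustrative only and is not necessarily how yascp builds doublet_results_combined; the column names passed to summarise_combined are placeholders to replace with the actual header of your file. Both "doublet" and Scrublet's "multiplet" count as doublet calls.

```python
import csv
from collections import Counter

DOUBLET_LABELS = {"doublet", "multiplet"}  # Scrublet reports "multiplet"


def majority_call(labels):
    """Return "doublet" if more than half of the non-empty calls are doublet/multiplet."""
    calls = [str(lab).strip().lower() for lab in labels if lab is not None and str(lab).strip()]
    if not calls:
        return "unknown"
    n_doublet = sum(1 for call in calls if call in DOUBLET_LABELS)
    # A tie counts as singlet: a doublet call needs a strict majority.
    return "doublet" if 2 * n_doublet > len(calls) else "singlet"


def summarise_combined(path, barcode_col, tool_cols):
    """Majority-vote consensus per barcode from a combined doublet TSV.

    `barcode_col` and `tool_cols` are placeholders: read the header of your
    <pool>__doublet_results_combined.tsv and pass the real column names.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    consensus = {
        row[barcode_col]: majority_call(row.get(col) for col in tool_cols)
        for row in rows
    }
    return consensus, Counter(consensus.values())
```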
deconvolution
The deconvolution folder contains the following subdirectories:
- vireo_raw
This folder contains the raw Vireo output for each pool, including genotypes when the pipeline is run in genotype-aware mode.
```
vireo_raw
├── correlations.png
├── Pool1
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1
├── donor_corelations_matrix.tsv
└── matched_donors.txt
```
- vireo_processed
The vireo_processed folder contains genotypes renamed to imitate the genotype-absent mode, which keeps downstream pipeline tasks consistent.
```
vireo_processed
├── assignments_all_pools.tsv
└── Pool1
├── GT_replace_donor_ids_false.tsv
├── GT_replace_GT_donors.vireo_false.vcf.gz
├── GT_replace_Pool1_assignments_false.tsv
├── GT_replace_Pool1__exp.sample_summary_false.txt
└── GT_replace_Pool1.sample_summary_false.txt
```
- vireo_sub
The vireo_sub folder contains Vireo permutations (repeated runs, ten per pool in the example below) used to ensure that cell assignments are stable.
```
vireo_sub
└── Pool1
├── vireo_____1
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___1
├── vireo_____10
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___10
├── vireo_____2
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___2
├── vireo_____3
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___3
├── vireo_____4
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___4
├── vireo_____5
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___5
├── vireo_____6
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___6
├── vireo_____7
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___7
├── vireo_____8
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
│ ├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
│ ├── sub_Pool1_Expected.vcf.gz
│ └── vireo_Pool1___8
└── vireo_____9
├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz
├── dubs_removed__Study_Merge_AllExpectedGT_QW4IKXM1N_out.vcf.gz.csi
├── sub_Pool1_Expected.vcf.gz
└── vireo_Pool1___9
```
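To look at how consistent the permutation runs are, you can compare each cell's donor assignment across runs. This sketch assumes that each vireo_Pool1___N folder contains Vireo's standard donor_ids.tsv output with cell and donor_id columns; that file is not shown in the tree above, so check that it exists in your run. The stability score (the share of runs agreeing with a cell's most frequent assignment) is an illustrative choice, not necessarily the metric yascp uses.

```python
import csv
from collections import Counter


def read_donor_ids(path):
    """Map cell barcode -> donor_id from a Vireo donor_ids.tsv file (assumed present)."""
    with open(path, newline="") as handle:
        return {row["cell"]: row["donor_id"] for row in csv.DictReader(handle, delimiter="\t")}


def assignment_stability(runs):
    """Per-cell fraction of runs that agree with that cell's most common assignment.

    `runs` is a list of {barcode: donor_id} dicts, one per permutation. Only cells
    present in every run are scored. Donor labels must be comparable across runs
    (e.g. already matched to reference genotypes); otherwise "donor0" in one run
    may correspond to "donor2" in another.
    """
    if not runs:
        return {}
    shared = set(runs[0]).intersection(*runs[1:])
    scores = {}
    for cell in shared:
        counts = Counter(run[cell] for run in runs)
        scores[cell] = counts.most_common(1)[0][1] / len(runs)
    return scores
```

Cells with scores well below 1.0 are the ones whose assignment changes between permutations.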
- infered_genotypes
The infered_genotypes folder contains genotypes called from the single-cell data for each donor in a pool.
```
infered_genotypes
└── Pool1
├── Pool1_headfix_vireo.vcf.gz
└── Pool1_headfix_vireo.vcf.gz.tbi
```
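The inferred VCF stores one sample column per donor, so listing its samples shows which donors were called in a pool. Because bgzip output is also valid gzip, Python's gzip module can read these files directly. A minimal sketch with hypothetical helper names (bcftools query -l gives the same list from the command line):

```python
import gzip


def vcf_samples(lines):
    """Return sample names from the #CHROM header line of VCF text lines."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM..FORMAT); the rest are samples.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data lines without finding a header
    return []


def vcf_samples_from_file(path):
    """Read sample names from a (b)gzipped VCF such as Pool1_headfix_vireo.vcf.gz."""
    with gzip.open(path, "rt") as handle:
        return vcf_samples(handle)
```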
- split_donor_h5ad
The split_donor_h5ad folder contains per-donor quantification matrices and additional metadata.
```
split_donor_h5ad
└── Pool1
├── cell_belongings.tsv
├── donor_level_anndata
│ ├── donor0.Pool1.barcodes.tsv
│ ├── donor0.Pool1.h5ad
│ ├── donor1.Pool1.barcodes.tsv
│ ├── donor1.Pool1.h5ad
│ ├── donor2.Pool1.barcodes.tsv
│ ├── donor2.Pool1.h5ad
│ ├── doublet.Pool1.barcodes.tsv
│ ├── doublet.Pool1.h5ad
│ ├── unassigned.Pool1.barcodes.tsv
│ └── unassigned.Pool1.h5ad
├── Pool1.donors.h5ad.assigned.tsv
├── Pool1__donors.h5ad.assigned.tsv
├── Pool1.donors.h5ad.tsv
├── Pool1__donors.h5ad.tsv
├── Pool1_exp__donor_n_cells.tsv
├── Pool1.h5ad.tsv
├── vireo_annot.Pool1.h5ad
└── Vireo_plots.pdf
```
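Per-donor cell numbers can be recovered from the donor_level_anndata barcode files. This sketch assumes the donor.pool.barcodes.tsv naming seen in the tree above, one barcode per line with no header, and donor names without dots; all three are assumptions to verify against your output. The helper names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def parse_barcode_filename(name):
    """Split '<donor>.<pool>.barcodes.tsv' into (donor, pool); None if it does not match."""
    suffix = ".barcodes.tsv"
    if not name.endswith(suffix):
        return None
    donor, sep, pool = name[: -len(suffix)].partition(".")
    return (donor, pool) if sep and donor and pool else None


def cells_per_donor(donor_dir):
    """Count non-empty lines in each barcodes file under donor_level_anndata/."""
    counts = Counter()
    for path in sorted(Path(donor_dir).glob("*.barcodes.tsv")):
        parsed = parse_barcode_filename(path.name)
        if parsed is None:
            continue
        with open(path) as handle:
            counts[parsed[0]] += sum(1 for line in handle if line.strip())
    return dict(counts)
```

For example, cells_per_donor("results/deconvolution/split_donor_h5ad/Pool1/donor_level_anndata") maps each donor name, including the doublet and unassigned groups, to its cell count.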
- cellsnp
The cellsnp folder contains allele counts and genotype information called from the single-cell data for each droplet/cell.
```
cellsnp
└── cellsnp_Pool1
├── cellSNP.base.vcf.gz
├── cellSNP.cells.vcf.gz
├── cellSNP.samples.tsv
├── cellSNP.tag.AD.mtx
├── cellSNP.tag.DP.mtx
└── cellSNP.tag.OTH.mtx
```
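To our understanding of cellsnp-lite, the .mtx files are sparse Matrix Market matrices with variants as rows (in the order of cellSNP.base.vcf.gz) and cells as columns (in the order of cellSNP.samples.tsv), where AD holds ALT-allele read counts and DP the total read depth; verify this against your cellsnp-lite version. The sketch below parses the coordinate format with the standard library only and computes a per-cell ALT fraction; the helper names are hypothetical.

```python
def read_mtx_coordinate(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, {(row, col): value}).

    Indices in the file are 1-based; the returned keys are 0-based.
    """
    it = iter(lines)
    if not next(it, "").startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    for line in it:
        if line.strip() and not line.startswith("%"):
            n_rows, n_cols, _nnz = (int(x) for x in line.split())
            break
    else:
        raise ValueError("missing size line")
    entries = {}
    for line in it:
        if line.strip():
            row, col, value = line.split()
            entries[(int(row) - 1, int(col) - 1)] = int(value)
    return n_rows, n_cols, entries


def alt_fraction_per_cell(ad_entries, dp_entries, n_cells):
    """Summed ALT depth (AD) divided by summed total depth (DP) for each cell column."""
    ad_sum = [0] * n_cells
    dp_sum = [0] * n_cells
    for (_, col), value in ad_entries.items():
        ad_sum[col] += value
    for (_, col), value in dp_entries.items():
        dp_sum[col] += value
    return [a / d if d else float("nan") for a, d in zip(ad_sum, dp_sum)]
```

In practice you would pass an open file, e.g. read_mtx_coordinate(open("cellSNP.tag.AD.mtx")); scipy.io.mmread is an alternative if SciPy is available.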
- concordances
The concordances folder contains statistics describing how well the cell-inferred genotypes agree with the reference genotypes.
```
concordances
├── all_variants_description.tsv
├── becoming_different_donor.png
├── becoming_doublet_donor.png
├── becoming_unassigned_donor.png
├── Pool1
│ ├── 1090095_1090095-donor3--each_cells_comparison_with_other_donor.tsv
│ ├── 1709635_1709635-donor5--each_cells_comparison_with_other_donor.tsv
│ ├── 2288590_2288590-donor6--each_cells_comparison_with_other_donor.tsv
│ ├── 2743244_2743244-donor7--each_cells_comparison_with_other_donor.tsv
│ ├── 2768849_2768849-donor4--each_cells_comparison_with_other_donor.tsv
│ ├── 2998395_2998395-donor2--each_cells_comparison_with_other_donor.tsv
│ ├── 3183427_3183427-donor0--each_cells_comparison_with_other_donor.tsv
│ ├── 3699286_3699286-donor1--each_cells_comparison_with_other_donor.tsv
│ ├── 4853673_4853673-donor9--each_cells_comparison_with_other_donor.tsv
│ ├── 5154993_5154993-donor8--each_cells_comparison_with_other_donor.tsv
│ ├── becoming_different_donor.png
│ ├── becoming_doublet_donor.png
│ ├── becoming_unassigned_donor.png
│ ├── cell_belongings.tsv
│ ├── cellSNP.cells.vcf.gz
│ ├── Pool1__joined_df_for_plots.tsv
│ ├── Pool1_subsampling_donor_swap_quantification.tsv
│ ├── Discordant_reads_becoming_different_donor_no0.png
│ ├── Discordant_reads_becoming_different_donor.png
│ ├── Discordant_reads_by_n_sites_becoming_different_donor_no0.png
│ ├── Discordant_reads_by_n_sites_becoming_different_donor.png
│ ├── discordant_sites_in_other_donors_noA2G.tsv
│ ├── Nr_discordant_uninformative_becoming_different_donor.png
│ ├── sites_becoming_different_donor_no0.png
│ ├── sites_becoming_different_donor.png
│ ├── sites_becoming_different_donor_probs.png
│ ├── sites_becoming_doublet_donor.png
│ ├── sites_becoming_unassigned_donor.png
│ ├── sites_vs_concordance.png
│ ├── stats_Pool1_gt_donor_assignments.csv
│ ├── sub_Pool1_Expected.vcf.gz
│ ├── sub_Pool1_GT_Matched.vcf.gz
│ ├── subplot_sites_vs_concordance.png
│ └── Total_reads_becoming_different_donor.png
├── Discordant_reads_becoming_different_donor_no0.png
├── Discordant_reads_becoming_different_donor.png
├── Discordant_reads_by_n_sites_becoming_different_donor_no0.png
├── Discordant_reads_by_n_sites_becoming_different_donor.png
├── joined_df_for_plots.tsv
├── Nr_discordant_uninformative_becoming_different_donor.png
├── sites_becoming_different_donor_no0.png
├── sites_becoming_different_donor.png
├── sites_becoming_different_donor_probs.png
├── sites_becoming_doublet_donor.png
├── sites_becoming_unassigned_donor.png
├── sites_vs_concordance.png
├── subplot_sites_vs_concordance.png
└── Total_reads_becoming_different_donor.png
```
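At its simplest, genotype concordance is the share of shared variant sites at which the genotype inferred from cells matches the reference genotype. The sketch below illustrates only that basic idea with hypothetical helpers; the files above suggest yascp also inspects discordant reads, site counts and subsampling, which this sketch does not reproduce. It assumes biallelic sites and encodes genotypes as ALT-allele dosages.

```python
def gt_to_dosage(gt):
    """Convert a VCF GT string ('0/1', '1|1', './.') to an ALT dosage, or None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def genotype_concordance(inferred, reference):
    """Fraction of shared sites where two {site: dosage} maps agree.

    Sites missing from either map are ignored. Returns (concordance, n_shared_sites).
    """
    shared = inferred.keys() & reference.keys()
    if not shared:
        return float("nan"), 0
    matches = sum(1 for site in shared if inferred[site] == reference[site])
    return matches / len(shared), len(shared)
```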
- gtmatch
If reference genotypes are provided, this folder contains the donor assignments made with gtcheck.
```
gtmatch/
├── assignments_all_pools.tsv
└── Pool1
├── Done.tmp
├── Expected_Withing_expected_Pool1.genome
├── GT_replace_PiHAT_Stats_File_Pool1.csv
├── InferedExpected_Expected_Infered_Pool1.genome
├── InferedGTMatched_Expected_Infered_Pool1.genome
├── InferedOnly_Withing_pool_Pool1.genome
├── PiHAT_Stats_File_Pool1.csv
├── Pool1_gt_donor_assignments.csv
├── pool_Pool1_panel_Pool1_Onek1K_gtcheck_donor_assignments.csv
├── pool_Pool1_panel_Pool1_Onek1K_gtcheck_score_table.csv
└── stats_Pool1_gt_donor_assignments.csv
```
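bcftools gtcheck compares genotypes between samples and reports a discordance score, where lower means more similar. One way to turn such scores into assignments is to pick the reference donor with the lowest score and keep the match only when it clearly beats the runner-up. The sketch below illustrates that rule; the input dictionary and the max_ratio threshold are hypothetical, and this is not necessarily the logic yascp uses to build its gtcheck donor assignment files.

```python
def assign_donors(scores, max_ratio=0.5):
    """Assign each inferred donor to the reference donor with the lowest discordance.

    `scores` maps inferred_donor -> {reference_donor: discordance}. A match is
    kept only if best <= max_ratio * second_best; otherwise it is left as None.
    """
    assignments = {}
    for donor, ref_scores in scores.items():
        ranked = sorted(ref_scores.items(), key=lambda item: item[1])
        if not ranked:
            assignments[donor] = None
            continue
        best_ref, best = ranked[0]
        if len(ranked) > 1 and best > max_ratio * ranked[1][1]:
            assignments[donor] = None  # ambiguous: runner-up is too close
        else:
            assignments[donor] = best_ref
    return assignments
```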
celltype_assignemt
The celltype_assignemt folder contains the following subdirectories and files:
- All_Celltype_Assignments.tsv: Combined results per barcode (droplet/cell).
- donor_celltype_report.tsv: Summarized cell counts per donor.
- tranche_celltype_report.tsv: Summarized cell counts per tranche.
- scpred: Results of scPred.
- azimuth: Results of Azimuth.
- celltypist: Results of CellTypist.
```
celltype_assignemt/
├── All_Celltype_Assignments.tsv
├── azimuth
│ └── PBMC
│ ├── AZ_1.pre_QC_adata_Pool1_Pool1_celltype.l1.mapping_score_umap.pdf
│ ├── AZ_1.pre_QC_adata_Pool1_Pool1_celltype.l1.mapping_score_vln.pdf
│ ├── AZ_1.pre_QC_adata_Pool1_Pool1_celltype.l1.ncells_by_type_barplot.pdf
│ ├── AZ_1.pre_QC_adata_Pool1_Pool1_celltype.l1.prediction_score_umap.pdf
│ ├── AZ_1.pre_QC_adata_Pool1_Pool1_celltype.l1.prediction_score_vln.pdf
│ ├── AZ_1.pre_QC_adata_Pool1_Pool1_celltype.l1.query_umap.pdf
│ └── AZ_1.pre_QC_adata_Pool1_Pool1_predicted_celltype_l1.tsv
├── celltypist
│ ├── COVID19_Immune_Landscape
│ │ └── Pool1
│ │ ├── Pool1___COVID19_Immune_Landscape___decision_matrix.csv
│ │ ├── Pool1___COVID19_Immune_Landscape___predicted_labels.csv
│ │ ├── Pool1___COVID19_Immune_Landscape___probability_matrix.csv
│ │ ├── Pool1_majority_voting.pdf
│ │ ├── Pool1_over_clustering.pdf
│ │ └── Pool1_predicted_labels.pdf
│ ├── Immune_All_High
│ │ └── Pool1
│ │ ├── Pool1___Immune_All_High___decision_matrix.csv
│ │ ├── Pool1___Immune_All_High___predicted_labels.csv
│ │ ├── Pool1___Immune_All_High___probability_matrix.csv
│ │ ├── Pool1_majority_voting.pdf
│ │ ├── Pool1_over_clustering.pdf
│ │ └── Pool1_predicted_labels.pdf
│ └── Immune_All_Low
│ └── Pool1
│ ├── Pool1___Immune_All_Low___decision_matrix.csv
│ ├── Pool1___Immune_All_Low___predicted_labels.csv
│ ├── Pool1___Immune_All_Low___probability_matrix.csv
│ ├── Pool1_majority_voting.pdf
│ ├── Pool1_over_clustering.pdf
│ └── Pool1_predicted_labels.pdf
├── donor_celltype_report.tsv
├── scpred
│ ├── AZ_1.pre_QC_adata_Pool1_AZ_1.pre_QC_adata_Pool1__scpred_prediction.tsv
│ └── AZ_1.pre_QC_adata_Pool1_hier_scpred.RDS
└── tranche_celltype_report.tsv
```
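donor_celltype_report.tsv already provides per-donor counts. If you want the same kind of table for a particular classifier, you can cross-tabulate All_Celltype_Assignments.tsv yourself. A minimal sketch; the donor and label column names are placeholders, so check the file's header for the actual names.

```python
import csv
from collections import Counter, defaultdict


def crosstab(rows, donor_col, label_col):
    """Count cells per (donor, label) from an iterable of dict rows."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[donor_col]][row[label_col]] += 1
    return {donor: dict(counts) for donor, counts in table.items()}


def crosstab_tsv(path, donor_col, label_col):
    """Read a TSV with a header row and cross-tabulate two of its columns."""
    with open(path, newline="") as handle:
        return crosstab(csv.DictReader(handle, delimiter="\t"), donor_col, label_col)
```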
clustering_and_integration
The clustering_and_integration folder contains integrated and clustered data, along with statistics and plots that describe the performance of the integration and clustering processes.
```
clustering_and_integration/
├── normalize=total_count.vars_to_regress=none
│ ├── adatametadata.tsv.gz
│ ├── adatanormalized.h5ad
│ ├── adatanormalized_pcacounts.h5ad
│ ├── adatanormalized_pca.h5ad
│ ├── adatanormalized_pcaknee.tsv
│ ├── adatapcs.tsv.gz
│ ├── donor_level_anndata_QCfiltered
│ │ └── Pool1___sample_QCd_adata.h5ad
│ ├── plots
│ ├── reduced_dims-null-bbknn.batch=experiment_id.n_pcs=20
│ │ ├── cluster.number_neighbors=-1.method=leiden.resolution=0.1
│ │ │ ├── clustering_0.1clustered.h5ad
│ │ │ ├── clustering_0.1clustered.tsv.gz
│ │ │ ├── dotplot_sampleclustering_0.1clustered_ncells0.pdf
│ │ │ ├── dotplot_sampleclustering_0.1clustered_ncellsless5.pdf
│ │ │ ├── dotplot_sampleclustering_0.1clustered.pdf
│ │ │ ├── plots
│ │ │ ├── sccaf
│ │ │ └── validate_resolution
│ │ ├── cluster.number_neighbors=-1.method=leiden.resolution=0.5
│ │ ├── cluster.number_neighbors=-1.method=leiden.resolution=1.0
│ │ ├── cluster.number_neighbors=-1.method=leiden.resolution=5.0
│ │ ├── outfile_adatabbknn.h5ad
│ │ ├── plots
│ │ ├── reduced_dims.tsv.gz
│ │ ├── resolution_tuningmerged_model_report.tsv.gz
│ │ ├── resolution_tuningmerged_test_result.tsv.gz
│ │ └── umap_gather_out.h5ad
│ ├── reduced_dims-null-harmony.n_pcs=20.variables=experiment_id.thetas=1.0
│ │ ├── cluster.number_neighbors=15.method=leiden.resolution=0.1
│ │ ├── cluster.number_neighbors=15.method=leiden.resolution=0.5
│ │ ├── cluster.number_neighbors=15.method=leiden.resolution=1.0
│ │ ├── cluster.number_neighbors=15.method=leiden.resolution=5.0
│ │ ├── plots
│ │ ├── reduced_dims.tsv.gz
│ │ ├── resolution_tuningmerged_model_report.tsv.gz
│ │ ├── resolution_tuningmerged_test_result.tsv.gz
│ │ └── umap_gather_out.h5ad
│ └── reduced_dims-null-pca.n_pcs=20
│ ├── clustering_and_integration
│ │ └── plots
│ └── reduced_dims.tsv.gz
└── plots
```
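The directory names above encode the parameters of each run as key=value pairs joined by dots (for example resolution=0.1). The following sketch parses such names so that runs can be compared programmatically. It splits on a dot only when the dot is followed by a key and an equals sign, so decimal values stay intact; the naming rule is inferred from the tree above rather than from yascp's code.

```python
import re

# Split on a dot only when it is followed by "<name>=", so values such as
# "0.1" or "1.0" stay intact.
_SPLIT = re.compile(r"\.(?=[A-Za-z_]+=)")


def parse_run_dir(name):
    """Parse names like 'cluster.number_neighbors=15.method=leiden.resolution=0.5'.

    Returns (prefix, params): tokens without '=' are joined into `prefix`,
    key=value tokens go into `params` as strings.
    """
    prefix_parts, params = [], {}
    for token in _SPLIT.split(name):
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
        else:
            prefix_parts.append(token)
    return ".".join(prefix_parts), params
```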
citeseq
The citeseq folder contains the following subdirectories:
- DSB: If CITE-seq data are present in the quantification matrix, this folder contains one subfolder per pool with the DSB background and the background-removed protein counts.
```
citeseq/
└── DSB
└── Pool1
└── CITE__Pool1
```
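DSB corrects protein counts by scaling them against empty droplets. Its first step takes the natural log of each count plus a pseudocount and standardises it using the mean and standard deviation of the same protein in empty (background) droplets. Below is a minimal sketch of that step for one protein. The pseudocount of 10 is what we understand the dsb R package to use by default, so treat it as an assumption. The second DSB step, which removes per-cell technical variation (optionally using isotype controls), is omitted. This is an illustration, not the code yascp runs.

```python
import math
import statistics


def dsb_ambient_normalize(cell_counts, background_counts, pseudocount=10.0):
    """Background-scale one protein: (log(x + p) - mean_bg) / sd_bg.

    `cell_counts` are raw counts for this protein in cell-containing droplets,
    `background_counts` the counts in empty droplets.
    """
    bg = [math.log(x + pseudocount) for x in background_counts]
    mu = statistics.mean(bg)
    sd = statistics.stdev(bg)  # sample SD; needs at least two background droplets
    return [(math.log(x + pseudocount) - mu) / sd for x in cell_counts]
```

Values near 0 are indistinguishable from ambient background, while a value of 3 means the count sits three background standard deviations above it on the log scale.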
handover
The handover folder contains the following subdirectories:
- Summary_plots: Various plots from across the pipeline.
- Donor_Quantification: h5ad and TSV files for each donor.
- Donor_Quantification_summary: TSV files summarising information and statistics about donors and pools.
- merged_h5ad: Merged h5ad files from various stages of the pipeline.
```
handover/
├── Donor_Quantification
├── Donor_Quantification_summary
├── merged_h5ad
└── Summary_plots
```