Hello, I am new to bioinformatics and I am trying to analyze RNA sequencing data with zUMIs.

I wrote the .yaml file according to the zUMIs instructions:

###########################################
#Welcome to zUMIs
#below, please fill the mandatory inputs
#We expect full paths for all files.
###########################################

#define a project name that will be used to name output files
project: SmartSeq3

#Sequencing File Inputs:
#For each input file, make one list object & define path and barcode ranges 
#base definition vocabulary: BC(n) UMI(n) cDNA(n).
#Barcode range definition needs to account for all ranges.You can  give several comma-separated ranges for BC & UMI sequences, eg.BC(1-6,20-26)
#you can specify between 1 and 4 input files

sequence_files:
  file1:
    name: /home/isidora/IRCCS_Candiolo/Fastq/*_R1_001_val_1.fq ##/home/isidora/IRCCS_Candiolo/Fastq/zUMIsFiles1/ #path to first file
    base_definition:
      - cDNA(1-150)
      - BC(1-8)
      - UMI(12-19)
  file2:
    name: /home/isidora/IRCCS_Candiolo/Fastq/*_R2_001_val_2.fq ## /home/isidora/IRCCS_Candiolo/Fastq/zUMIsFiles2/ #path to second file
    base_definition:
      - cDNA(1-150)
      - BC(1-8)
      - UMI(12-19)

#reference genome setup
reference:
  STAR_index: /home/isidora/IRCCS_Candiolo/Fastq/ #path to STAR genome index
  GTF_file: /home/isidora/IRCCS_Candiolo/Fastq/hg38.refGene.gtf #path to gene annotation file in GTF format
  exon_extension: no #extend exons by a certain width?
  extension_length: 0 #number of bp to extend exons by
  scaffold_length_min: 0 #minimal scaffold/chromosome length to consider (0 = all)
  additional_files: /home/isidora/IRCCS_Candiolo/Fastq/hg38.fa #Optional parameter. It is possible to give additonal reference sequences here, eg ERCC.fa
  additional_STAR_params: pGe.sjdbOverhang > 0 #Optional parameter. you may add custom mapping parameters to STAR here
  #output directory
out_dir: /home/isidora/IRCCS_Candiolo/Fastq/zUMIs/ #specify the full path to the output directory

###########################################
#below, you may optionally change default parameters
###########################################

#number of processors to use
num_threads: 20
mem_limit: null #Memory limit in Gigabytes, null meaning unlimited RAM usage
#barcode & UMI filtering options
#number of bases under the base quality cutoff that should be filtered out.
#Phred score base-cutoff for quality control.
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 2
    phred: 20

#Options for Barcode handling
#You can give either number of top barcodes to use or give an annotation of cell barcodes
#If you leave both barcode_num and barcode_file empty, zUMIs will perform automatic cell barcode selection for you!
barcodes:
  barcode_num: null
  barcode_file: /home/isidora/IRCCS_Candiolo/Fastq/SampleSheet.txt
  barcode_sharing: null #Optional for combining several barcode sequences per cell (see github wiki)  
  automatic: yes #Give yes/no to this option. If the cell barcodes should be detected automatically. If the barcode file is given in combination with automatic barcode detection, the list of giiven barcodes will be used as whitelist.
  BarcodeBinning: 1 #Hamming distance binning of close cell barcode sequences.  
  nReadsperCell: 100 #Keep only the cell barcodes with atleast n number of reads.  
  demultiplex: yes #produce per-cell demultiplexed bam files.

#Options related to counting of reads towards expression profiles
counting_opts:
  introns: yes #can be set to no for exon-only counting.
  intronProb: no #perform an estimation of how likely intronic reads are to be derived from mRNA by comparing to intergenic counts.
  downsampling: '0' #Number of reads to downsample to. This value can be a fixed number of reads (e.g. 10000) or a desired range (e.g  10000-20000) Barcodes with less than <d> will not be reported. 0 means adaptive downsampling. Default: 0
  strand: 0 #Is the library stranded? 0 = unstranded, 1 = positively stranded, 2 = negatively stranded 
  Ham_Dist: 1 #Hamming distance collapsing of UMI sequences.
  velocyto: no #Would you like velocyto to do counting of intron-exon spanning reads  
  primaryHit: yes #Do you want to count the primary Hits of multimapping reads towards gene expression levels?
  multi_overlap: no #Do you want to assign reads overlapping to multiple features?  
  fraction_overlap: 0 #minimum required fraction of the read overlapping with the gene for read assignment to genes  
  twoPass: yes #perform basic STAR twoPass mapping

#produce stats files and plots?
make_stats: yes


#Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting,Summarising). Default: Filtering.
which_Stage: Filtering

#define dependencies program paths
samtools_exec: samtools #samtools executable
Rscript_exec: Rscript #Rscript executable
STAR_exec: STAR #STAR executable
pigz_exec: pigz #pigz executable

#below, fqfilter will add a read_layout flag defining SE or PE

When I start the program with the command `zUMIs/zUMIs.sh -c -y ~/IRCCS_Candiolo/Fastq/zUMIs/zUMIs.yaml`, I get these errors:

Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!


 You provided these parameters:
 YAML file:     /home/isidora/IRCCS_Candiolo/Fastq/zUMIs/zUMIs.yaml
 zUMIs directory:               /home/isidora/IRCCS_Candiolo/Fastq/zUMIs
 STAR executable                STAR
 samtools executable            samtools
 pigz executable                pigz
 Rscript executable             Rscript
 RAM limit:   null
 zUMIs version 2.9.7e


Wed Aug 23 12:06:25 CEST 2023
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
sh: 1: Syntax error: Unterminated quoted string
sh: 1: Syntax error: Unterminated quoted string
Wed Aug 23 12:06:26 CEST 2023
Error in eval(bysub, parent.frame(), parent.frame()) :
  object 'XC' not found
Calls: cellBC -> [ -> [.data.table -> eval -> eval
In addition: Warning message:
In data.table::fread(bccount_file, header = FALSE, col.names = c("XC",  :
  File '/home/isidora/IRCCS_Candiolo/Fastq/zUMIs//SmartSeq3.BCstats.txt' has size 0. Returning a NULL data.table.
Execution halted
Mapping...
[1] "2023-08-23 12:06:28 CEST"
Warning message:
In data.table::fread(cmd = paste(samtools, "view", filtered_bams[1],  :
  File '/tmp/RtmpKSSqU2/file57f5393a8fde' has size 0. Returning a NULL data.table.

EXITING because of fatal PARAMETERS error: pGe.sjdbOverhang <=0 while junctions are inserted on the fly with --sjdbFileChrStartEnd or/and --sjdbGTFfile
SOLUTION: specify pGe.sjdbOverhang>0, ideally readmateLength-1
Aug 23 12:07:18 ...... FATAL ERROR, exiting
Wed Aug 23 12:07:18 CEST 2023
Counting...
[1] "2023-08-23 12:07:30 CEST"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) :
  File '/home/isidora/IRCCS_Candiolo/Fastq/zUMIs//zUMIs_output/SmartSeq3kept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/home/isidora/IRCCS_Candiolo/Fastq/zUMIs'
Execution halted
Wed Aug 23 12:07:30 CEST 2023
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/isidora/IRCCS_Candiolo/Fastq/zUMIs//zUMIs_output/expression/SmartSeq3.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Wed Aug 23 12:07:32 CEST 2023
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2023-08-23 12:07:33 CEST"
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project,  :
  File '/home/isidora/IRCCS_Candiolo/Fastq/zUMIs//zUMIs_output/SmartSeq3kept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/isidora/IRCCS_Candiolo/Fastq/zUMIs'
Execution halted

I am new to the bioinformatics field and I do not understand what these errors mean or what I should correct. Can anyone help me, please?
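For the STAR error in the log, the message itself points at the fix: set sjdbOverhang to the mate length minus one. Below is a minimal sketch of that arithmetic. It assumes 150-base reads (taken from the `cDNA(1-150)` definition in the YAML; if the reads were trimmed, the longest read length would be the relevant one) and assumes that `additional_STAR_params` is handed to STAR as command-line text, as the YAML comment suggests. The helper name `star_overhang_param` is hypothetical.

```python
def star_overhang_param(read_length: int) -> str:
    """Return a STAR option string with sjdbOverhang = read length - 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2 bases")
    # STAR's own hint in the log: "ideally readmateLength-1"
    return f"--sjdbOverhang {read_length - 1}"


# 150 is an assumption based on cDNA(1-150) in the YAML above
print(star_overhang_param(150))  # prints: --sjdbOverhang 149
```

The resulting string would take the place of `pGe.sjdbOverhang > 0` on the `additional_STAR_params` line. That value appears to be copied from the wording of the error message and is not a STAR command-line option.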

terdon
I don't see any obvious missing `'` or `"` (that's what "Unterminated quoted string" means) in the [zUMIs.sh](https://github.com/sdparekh/zUMIs/blob/main/zUMIs.sh) version on GitHub, but maybe yours is different? Can you try changing `downsampling: '0'` to `downsampling: 0` in your yaml file and relaunching? I doubt this is the issue, but it's the only quote I see there and maybe the script isn't picking it up correctly (you normally don't need to quote integers, and this is the only quoted one).
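To illustrate what the shell is complaining about: a command string with an unclosed quote character cannot be parsed. Here is a small sketch that uses Python's standard `shlex` module to test a string before it reaches `sh`. The function name `quotes_balanced` and the sample strings are made up for illustration and are not taken from zUMIs.

```python
import shlex


def quotes_balanced(command: str) -> bool:
    """Return True if the command string has no unterminated quote."""
    try:
        shlex.split(command)
    except ValueError:
        # shlex raises ValueError when a quote is opened but never closed
        return False
    return True


print(quotes_balanced("echo 'hello world'"))  # True
print(quotes_balanced("echo 'hello world"))   # False
```

A check like this only shows whether a given string is well-formed. It does not tell you which YAML value ends up in the command that zUMIs builds.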
terdon
Also, should `STAR_index:` be pointing to a file instead of a directory? And did you see the warning? `WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.` Have you tried that?
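As a rough way to spot the version mismatch mentioned in that warning, the two version strings can be pulled out of the log line and compared. This is only a sketch: the regular expression assumes versions look like `2.7.3a`, and the function name `star_versions` is hypothetical.

```python
import re

# First sentence of the warning, copied from the log in the question
WARNING = (
    "WARNING: The STAR version used for mapping is 2.7.3a and the STAR index "
    "was created using the version 2.7.4a."
)


def star_versions(warning):
    """Return (mapping_version, index_version) found in the warning text."""
    found = re.findall(r"\d+\.\d+\.\d+[a-z]?", warning)
    if len(found) < 2:
        raise ValueError("expected two STAR version strings in the warning")
    return found[0], found[1]


mapping_version, index_version = star_versions(WARNING)
if mapping_version != index_version:
    print(f"mismatch: STAR {mapping_version} vs index built with {index_version}")
```

If the versions differ, the warning's own advice applies: rebuild the index with the STAR version that is used for mapping.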
Ask Ubuntu Mod Note: It has been noted in custom flags that this is a better fit for bioinformatics.stackexchange.com and they have no issue with this being migrated over there. Therefore, I am migrating this to that site at their request.