Import Data Into Projects

Import Files into a Project

The file uploader imports the following file types to any project you have write access to: FASTQ (.fastq.gz), analysis (VCF and gVCF), manifest (.txt), or other file types. Use the file uploader when you want to analyze files generated outside of BaseSpace Sequence Hub, or to attach other information related to the project.

  1. Open the project.

  2. From the project, select File, Upload, and then select Files.

  3. Select the type of files to upload.

  4. If you are uploading a FASTQ file, do as follows.

    1. To upload FASTQs to a sample:

      1. Set the "Save Upload To" toggle to "Sample".

      2. Select Finish Upload.

    2. To upload FASTQs to a biosample:

      1. Select "Select Biosample", then select an existing biosample or create a new one to associate with the FASTQ dataset.

      2. Enter a library name.

      3. Select a prep kit.

      4. Select Finish Upload.

  5. If you are uploading a VCF file, do as follows.

    1. [Optional] Select "Select Biosample", then select or create a biosample to associate with the VCF.

    2. Select Finish Upload.

  6. If you are uploading a manifest file, do as follows.

    1. [Optional] Select "Select Biosample", then select or create a biosample to associate with the manifest.

    2. Select Finish Upload.

  7. If you are uploading other file types, do as follows.

    1. [Optional] Select "Select Biosample", then select or create a biosample to associate with the files.

    2. Select Finish Upload.

When you upload multiple FASTQ, VCF, or manifest files in a single session, all files must be of the same type.

The FASTQ importer only works for complete samples; you cannot upload Read 2 of a FASTQ on its own.
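Before starting an upload, it can help to confirm locally that no Read 2 file is missing its Read 1 mate. The following is a minimal sketch, not part of BaseSpace Sequence Hub. It assumes file names end in `_R1_001.fastq.gz` or `_R2_001.fastq.gz`, as in the naming convention for Illumina FASTQ files, and `unpaired_read2` is a hypothetical helper name.

```python
import re

# Assumed pattern: the read tag and a three-digit suffix close the file name,
# for example "_R2_001.fastq.gz".
READ_TAG = re.compile(r"_R([12])_(\d{3})\.fastq\.gz$")


def unpaired_read2(filenames):
    """Return Read 2 files whose matching Read 1 file is not in the list."""
    names = set(filenames)
    missing = []
    for name in sorted(names):
        match = READ_TAG.search(name)
        if match and match.group(1) == "2":
            # The mate differs only in the read tag (R2 -> R1).
            mate = name[:match.start()] + "_R1_" + match.group(2) + ".fastq.gz"
            if mate not in names:
                missing.append(name)
    return missing


if __name__ == "__main__":
    files = ["SampleName_S1_L001_R2_001.fastq.gz"]
    for name in unpaired_read2(files):
        print(f"Read 1 file is missing for {name}")
```

An empty result means every Read 2 file in the list has a Read 1 counterpart. It does not guarantee that the uploader accepts the files.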

FASTQ files need to adhere to Illumina standards, as specified below:

  • Data for a single sample can span multiple files. Each sample is limited to 16 files with a combined size of 25 GB.

  • The uploader supports only gzipped FASTQ files generated on Illumina instruments.

  • The names of the FASTQ files must conform to the following convention: SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (for example, SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)

  • The read descriptor in the FASTQ files must conform to the following convention: @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber:

  • A Read 1 descriptor looks like this: @M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13

  • A Read 2 descriptor has a 2 in the ReadNum field, like this: @M00900:62:000000000-A2CYG:1:1101:18016:2491 2:N:0:13
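The naming, descriptor, and size rules above can be checked locally before upload. The following is a minimal sketch under stated assumptions, not the uploader's own validation logic. The regular expressions are one reading of the conventions listed above, "25 GB" is assumed to mean decimal gigabytes, and all function and constant names are hypothetical.

```python
import re
from collections import defaultdict

MAX_FILES_PER_SAMPLE = 16
MAX_BYTES_PER_SAMPLE = 25 * 1000**3  # assumption: 25 GB in decimal units

# Assumed reading of SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz
FILENAME_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<index>\d{3})\.fastq\.gz$"
)

# Assumed reading of
# @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
DESCRIPTOR_RE = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r" (?P<read>[12]):(?P<filter>[YN]):0:(?P<sample>\S+)$"
)


def parse_filename(name):
    """Return the file name fields as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


def parse_descriptor(line):
    """Return the read descriptor fields as a dict, or None if it does not match."""
    match = DESCRIPTOR_RE.match(line.strip())
    return match.groupdict() if match else None


def check_sample_limits(files):
    """Check (file name, size in bytes) pairs against the per-sample limits."""
    totals = defaultdict(lambda: [0, 0])  # sample -> [file count, total bytes]
    problems = []
    for name, size in files:
        parsed = parse_filename(name)
        if parsed is None:
            problems.append(f"{name}: does not match the naming convention")
            continue
        totals[parsed["sample"]][0] += 1
        totals[parsed["sample"]][1] += size
    for sample, (count, total) in totals.items():
        if count > MAX_FILES_PER_SAMPLE:
            problems.append(f"{sample}: {count} files exceeds {MAX_FILES_PER_SAMPLE}")
        if total > MAX_BYTES_PER_SAMPLE:
            problems.append(f"{sample}: combined size exceeds 25 GB")
    return problems
```

For example, `parse_filename("SampleName_S1_L001_R1_001.fastq.gz")` returns a dict whose `read` field is `"1"`, and `parse_descriptor` applied to the Read 2 descriptor shown above returns a dict whose `read` field is `"2"`.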

Quality considerations

  • The number of base calls for each read must equal the number of quality scores

  • The number of entries for Read 1 must equal the number of entries for Read 2

  • The uploader treats files as paired-end when their file names match except for the ReadNum

  • For paired-end reads, the descriptors of Read 1 and Read 2 must match for every entry

  • Each read must have passed filter
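The considerations above can be approximated with a local pre-upload check. The following is a minimal sketch, not the uploader's actual validation. It assumes four-line FASTQ records, assumes that "descriptors must match" refers to the part of the descriptor before the space (everything except ReadNum and the fields after it), and treats a FilterFlag of Y as a read that did not pass filter. `read_records` and `check_pair` are hypothetical names.

```python
import gzip
from itertools import zip_longest


def read_records(path):
    """Yield (descriptor, bases, qualities) from a gzipped four-line FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                return
            bases = handle.readline().rstrip("\r\n")
            handle.readline()  # "+" separator line
            quals = handle.readline().rstrip("\r\n")
            yield header, bases, quals


def check_pair(r1_path, r2_path):
    """Return a list of problems found in a Read 1 / Read 2 file pair."""
    problems = []
    pairs = zip_longest(read_records(r1_path), read_records(r2_path))
    for i, (rec1, rec2) in enumerate(pairs, start=1):
        if rec1 is None or rec2 is None:
            problems.append("Read 1 and Read 2 have different numbers of entries")
            break
        for label, (header, bases, quals) in (("R1", rec1), ("R2", rec2)):
            if len(bases) != len(quals):
                problems.append(f"entry {i} {label}: base calls and quality scores differ in length")
            # Fields after the space: ReadNum:FilterFlag:0:SampleNumber
            _, _, comment = header.partition(" ")
            fields = comment.split(":")
            if len(fields) >= 2 and fields[1] == "Y":
                problems.append(f"entry {i} {label}: read did not pass filter")
        # Assumption: compare only the part of the descriptor before the space.
        if rec1[0].split(" ", 1)[0] != rec2[0].split(" ", 1)[0]:
            problems.append(f"entry {i}: Read 1 and Read 2 descriptors do not match")
    return problems
```

An empty list from `check_pair("SampleName_S1_L001_R1_001.fastq.gz", "SampleName_S1_L001_R2_001.fastq.gz")` means none of these checks failed.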
