Data Model

Each sequencing run produces log files, instrument health data, run metrics, base call information (*.bcl files), and other data. BaseSpace Sequence Hub demultiplexes base call information to create the FASTQ files used in secondary analysis.

Common Terms

  • Biosamples represent the source biological sample being sequenced. They are associated with data aggregated from multiple sequencing runs according to the sample name provided in the samplesheet of each run.

  • Samples represent a set of FASTQ files from a single sequencing run, according to the sample name provided in the samplesheet.

  • Libraries are produced when a biosample is prepped with a library prep kit.

  • Pools are an aliquot of one or more libraries, pooled together in order to be placed in a flowcell lane.

  • Datasets are sets of files produced by a BaseSpace application. Some views refer to them as "Other Datasets" to distinguish them from Datasets containing FASTQ files. These were formerly referred to as "App Results."

  • FASTQ Datasets are sets of FASTQ files produced by the FASTQ-Generation or BclConvert apps. Given their prominent place in the BaseSpace data model, they are often treated as a distinct type from "Other Datasets," which aids data management tasks such as filtering and sorting.

  • Projects are the containers for datasets and dataset files, which can include FASTQ, BAM, and VCF files. Projects can be associated with runs, analyses, and other entities in BaseSpace Sequence Hub. If a given project contains FASTQ files, it will also be associated with one or more Samples & Biosamples.
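The distinction between FASTQ Datasets and "Other Datasets" inside a project can be illustrated with a small sketch. This is not BaseSpace code or its API: the class names, the `kind` field, and the file names below are hypothetical, chosen only to show how treating FASTQ Datasets as their own type makes filtering straightforward.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """A set of files produced by an app (hypothetical model, not the real API)."""
    name: str
    kind: str  # assumption: "FASTQ" marks a FASTQ Dataset, anything else is "Other"
    files: List[str] = field(default_factory=list)


@dataclass
class Project:
    """A container for datasets and their files."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def fastq_datasets(self) -> List[Dataset]:
        # FASTQ Datasets are treated as a distinct type.
        return [d for d in self.datasets if d.kind == "FASTQ"]

    def other_datasets(self) -> List[Dataset]:
        # Everything that is not a FASTQ Dataset (formerly "App Results").
        return [d for d in self.datasets if d.kind != "FASTQ"]


# Made-up example content for illustration only.
project = Project(
    "ExampleProject",
    [
        Dataset("SampleA_fastq", "FASTQ", ["SampleA_R1.fastq.gz", "SampleA_R2.fastq.gz"]),
        Dataset("SampleA_alignment", "Alignment", ["SampleA.bam"]),
        Dataset("SampleA_variants", "Variants", ["SampleA.vcf"]),
    ],
)

fastq_names = [d.name for d in project.fastq_datasets()]
other_names = sorted(d.name for d in project.other_datasets())
```

Here `fastq_names` holds only the FASTQ Dataset, while `other_names` holds the BAM and VCF datasets, mirroring the split between the two dataset types described above.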

Analyzing FASTQ Data

  • BaseSpace apps that analyze FASTQ files can accept either Biosamples or Samples in the input form, and the system uses the proper set of FASTQ files in each case.

  • BaseSpace users can select their preferred input at the top of the form, and BaseSpace will load the correct controls into the form.
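The difference between the two input types can be sketched as follows. This is an illustration of the idea only, under the assumption that a Biosample input gathers FASTQ files from every run that used the same sample name, while a Sample input is limited to a single run. The `FastqDataset` type and the `resolve_fastq_files` function are hypothetical and do not correspond to any BaseSpace API.

```python
from typing import List, NamedTuple, Optional, Tuple


class FastqDataset(NamedTuple):
    """FASTQ files for one sample name in one run (hypothetical model)."""
    run_id: str
    sample_name: str  # name taken from that run's samplesheet
    files: Tuple[str, ...]


def resolve_fastq_files(
    datasets: List[FastqDataset],
    input_kind: str,
    name: str,
    run_id: Optional[str] = None,
) -> List[str]:
    """Return the FASTQ files an app would receive for the chosen input."""
    if input_kind == "biosample":
        # Biosample: aggregate across all runs that share the sample name.
        chosen = [d for d in datasets if d.sample_name == name]
    elif input_kind == "sample":
        # Sample: FASTQ files from one specific run only.
        if run_id is None:
            raise ValueError("a Sample input needs a run_id")
        chosen = [d for d in datasets if d.sample_name == name and d.run_id == run_id]
    else:
        raise ValueError(f"unknown input kind: {input_kind!r}")
    return [f for d in chosen for f in d.files]


# Made-up data: the same sample name sequenced in two runs.
datasets = [
    FastqDataset("run1", "SampleA", ("run1_SampleA_R1.fastq.gz",)),
    FastqDataset("run2", "SampleA", ("run2_SampleA_R1.fastq.gz",)),
    FastqDataset("run2", "SampleB", ("run2_SampleB_R1.fastq.gz",)),
]

as_biosample = resolve_fastq_files(datasets, "biosample", "SampleA")
as_sample = resolve_fastq_files(datasets, "sample", "SampleA", run_id="run2")
```

With this made-up data, the Biosample input yields the files from both runs, whereas the Sample input yields only the files from `run2`.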

More on Biosamples

Runs and Projects are compatible with both Biosamples and Samples to offer maximum flexibility to all types of users.

  • A run's Biosamples tab lists all of the Biosamples that the run contributed yield to.

  • A project's Biosamples tab lists all of the Biosamples associated with FASTQ Datasets that live in the project.

Because BaseSpace Sequence Hub tracks data for the biosample, you can easily aggregate data from a biosample that has been sequenced as part of multiple libraries or pools.
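As a rough sketch of what aggregating by biosample means, the example below totals yield per biosample across several runs and libraries. The record layout, the use of gigabases as the yield unit, and all values are assumptions made for illustration; they are not taken from BaseSpace Sequence Hub.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical records: (run_id, library, biosample_name, yield_in_gigabases).
# All identifiers and numbers are made up for this example.
records: List[Tuple[str, str, str, float]] = [
    ("run1", "libA1", "SampleA", 1.5),
    ("run2", "libA2", "SampleA", 2.5),
    ("run2", "libB1", "SampleB", 3.0),
]


def aggregate_by_biosample(
    rows: List[Tuple[str, str, str, float]]
) -> Dict[str, Tuple[float, List[str]]]:
    """Map each biosample name to (total yield, sorted list of contributing runs)."""
    totals: Dict[str, float] = defaultdict(float)
    runs: Dict[str, set] = defaultdict(set)
    for run_id, _library, biosample, gigabases in rows:
        totals[biosample] += gigabases   # sum yield across runs and libraries
        runs[biosample].add(run_id)      # remember which runs contributed
    return {name: (totals[name], sorted(runs[name])) for name in totals}


summary = aggregate_by_biosample(records)
```

In this sketch, `summary["SampleA"]` combines the yield from both runs, which is the kind of per-biosample view that tracking data at the biosample level makes possible.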

What Happened to the Classic Data Model?

BaseSpace still fully supports classic data types such as Samples (see the term definitions above). You can continue to use Samples and the associated features if that model is a good fit for your lab's needs, for example when launching an app with input FASTQ files from a single run.

  • A run's Samples tab will continue to list the samples containing FASTQ files produced from that run's sequencing data.

  • A project's Samples tab will continue to list the samples associated with FASTQ files that live in that project.
