Biosamples are the new focus of BaseSpace Sequence Hub. Physically, they are the original DNA samples that need to be prepared, sequenced, and analyzed to produce the desired results for a bioinformatician. In BaseSpace Sequence Hub, they are the central link for related physical entities and digital data such as libraries, pools, runs, lanes, analyses, and data sets.
Add biosamples in a biosample workflow file. You can download a template and upload completed files from the Biosamples page, available from the My Data tab. In the biosample workflow *.csv file, add information about the new biosamples, the projects you want to store data in, the library preps you want to use, the yields required to launch an app, and the analysis workflows you want to schedule. BaseSpace Sequence Hub validates the inputs and adds the biosamples to the system. For more information, see Add Biosamples.
Samples and their FASTQ files have been converted into biosamples and one or more FASTQ data sets. This change supports automation of data aggregation for auto-app launch involving biosamples. Many samples have retained their original name; you can find them on the biosample list or by using the search function. For more information, see Using Your Data in the New Data Model.
Libraries and pools are not uploaded manually but are generated automatically using the information in a sample sheet with an instrument run upload. From the sample sheet, Sample ID is used for the biosample name, and Sample Name is used for the library name.
When a biosample name is recognized due to a previous biosample workflow upload or instrument run containing the biosample, BaseSpace Sequence Hub checks the sample sheet for a match to a library. If we find an exact match, we add sequencing data to the existing biosample and library. If we don't find exact matches, we create a new biosample, a new library, or both.
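The matching rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how BaseSpace Sequence Hub is implemented: the function name `resolve_sample_sheet_row` and the in-memory `existing` dictionary are hypothetical, and "exact match" is assumed to mean a case-sensitive string comparison.

```python
def resolve_sample_sheet_row(existing, sample_id, sample_name):
    """Decide what would need to be created for one sample sheet row.

    existing    -- dict mapping biosample name -> set of known library names
                   (hypothetical stand-in for what the system already holds)
    sample_id   -- Sample ID from the sample sheet, used as the biosample name
    sample_name -- Sample Name from the sample sheet, used as the library name

    Returns a list containing "biosample" and/or "library" for each entity
    that had to be created; an empty list means both matched exactly.
    """
    created = []
    if sample_id not in existing:
        existing[sample_id] = set()
        created.append("biosample")
    if sample_name not in existing[sample_id]:
        existing[sample_id].add(sample_name)
        created.append("library")
    return created
```

With `existing = {"NA12878": {"LibA"}}`, a row for `NA12878` and `LibA` creates nothing, a row for `NA12878` and `LibB` creates only a library, and a row for an unknown biosample creates both.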
When more than one library is given for a single lane number in the sample sheet, we interpret this as a pooled sample merged together using the libraries. We automatically assign a name to the pool and link it to the biosample and libraries. If the same combination of libraries exists within the same instrument run, the generated data are linked to the same pool. Library combinations cannot be reliably matched to runs across different instruments; in those cases, new pools are created.
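As a rough illustration of the pooling rule, the sketch below groups the rows of one run's sample sheet by lane and treats any lane with more than one library as a pool. The function name `pools_for_run` and the `(lane, library)` tuple layout are assumptions made for the example; the naming of pools and the real matching logic are internal to BaseSpace Sequence Hub.

```python
from collections import defaultdict


def pools_for_run(rows):
    """Find pooled lanes in a single instrument run.

    rows -- iterable of (lane_number, library_name) tuples taken from
            one run's sample sheet (hypothetical layout)

    Returns a dict mapping lane number -> frozenset of library names for
    every lane that carries more than one library. Lanes holding the same
    combination of libraries get equal frozensets, so they can be linked
    to the same pool.
    """
    by_lane = defaultdict(set)
    for lane, library in rows:
        by_lane[lane].add(library)
    return {
        lane: frozenset(libraries)
        for lane, libraries in by_lane.items()
        if len(libraries) > 1
    }
```

Because the result is keyed per run, the sketch also reflects the limitation described above: combinations are compared only within one run, not across instruments.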
The libraries and pools automatically generated from a sample sheet can be found in the Libraries tab of the biosample details page.
Library prep kits are the names of the sample preparation kits used to turn biosamples into sample libraries. They are defined in the Prep Request column in the biosample workflow upload. BaseSpace Sequence Hub uses this information to separate data during data aggregation when two or more library prep kits are used for the same biosample. For example, suppose you use a TruSeq PCR-Free library prep kit to prepare your libraries but receive poor results due to a low starting concentration of DNA, then make a second attempt with a TruSeq Nano library prep kit to amplify the DNA. You can use BaseSpace Sequence Hub to separate the data according to the kit used to prepare each library.
Yield for a biosample is separated per unique library prep kit. View the list of prep kits for a biosample on the Summary tab of the biosample details page.
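To make the per-kit separation concrete, here is a minimal sketch that totals yield by library prep kit. The function `yield_by_prep_kit` and the `(kit, yield)` input tuples are hypothetical; the numbers in the usage note are invented for illustration only.

```python
from collections import defaultdict


def yield_by_prep_kit(datasets):
    """Sum yield (in base pairs) separately for each library prep kit.

    datasets -- iterable of (prep_kit_name, yield_bp) tuples, one per
                FASTQ data set of a single biosample (hypothetical layout)
    """
    totals = defaultdict(int)
    for prep_kit, yield_bp in datasets:
        totals[prep_kit] += yield_bp
    return dict(totals)
```

For a biosample with one TruSeq PCR-Free data set of 4 Gbp and two TruSeq Nano data sets of 30 Gbp and 25 Gbp, the function reports 4 Gbp for TruSeq PCR-Free and 55 Gbp for TruSeq Nano, rather than a combined 59 Gbp.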
To launch an app using only the data from specific kits, select the Select Biosample button when selecting the app inputs. This option enables a biosample chooser where you can select data by library prep kit.
Biosample metadata are key-value pairs used to save custom information to biosamples. The metadata can be viewed from the biosample summary page. Biosample metadata can only be entered when first creating the biosamples through the biosample workflow spreadsheet upload. When you add custom columns to the spreadsheet and define the values for the biosamples, the biosamples are imported with the metadata.
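The sketch below shows one way to think about custom columns becoming metadata: every column that is not part of the standard template is read as a key-value pair for the biosample. The column headers in `STANDARD_COLUMNS` and in `sample_csv` are assumptions for illustration, apart from Prep Request, which is named in this guide; use the template downloaded from the Biosamples page for the real headers.

```python
import csv
import io

# Hypothetical standard headers; the downloaded template defines the real ones.
STANDARD_COLUMNS = {"BioSample Name", "Default Project", "Prep Request"}


def split_metadata(csv_text):
    """Return {biosample name: {custom column: value}} for each CSV row.

    Any column not listed in STANDARD_COLUMNS is treated as custom
    metadata. Empty cells are skipped.
    """
    metadata = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["BioSample Name"]
        metadata[name] = {
            key: value
            for key, value in row.items()
            if key not in STANDARD_COLUMNS and value
        }
    return metadata


# Example file content with two custom columns (illustrative values only).
sample_csv = (
    "BioSample Name,Default Project,Prep Request,Tissue,Collection Site\n"
    "NA12878,Project A,TruSeq Nano,Blood,Site 1\n"
)
```

Calling `split_metadata(sample_csv)` yields the two custom columns, Tissue and Collection Site, as the metadata for biosample NA12878.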
Biosamples do not belong in a project. Instead, biosamples are related to a project by producing sequencing data in the form of data sets, which do belong in projects. Biosamples are required to have a default project, which is where data is written by default when it is produced through Generate FASTQ and other BaseSpace Sequence Hub apps.
Biosamples can be related to many projects by creating data sets in each of them. For example, a biosample may be assigned a default project named Project A, where its FASTQ data sets are saved. You can select the biosample as an input to manually launch an app and specify a different project, Project B, as the output project. The app then creates general data sets in Project B. The biosample is now linked to both projects, but does not belong to either of them.
When you upload FASTQ files, you create a new FASTQ data set, which must be linked to a new biosample and library. Our new data model uses automatic aggregation of data to exclude any failures or low-quality data among the biosamples, libraries, pools, lanes, and data sets. To allow auto-app launch to work, manual uploading of FASTQ files must conform to this model. The modified file import page will allow the creation of new biosamples and libraries to support adding FASTQ files to BaseSpace Sequence Hub.
Support for deleting biosamples is pending.
Canceling a biosample affects any further work on its data. Analyses that have not already been completed or stopped are canceled, and their delivery status is changed to Do Not Deliver. Canceled biosamples no longer appear in the list of biosamples available for app launch. Lab requeues can no longer be created for these biosamples, and new biosamples cannot be created with the same name.
Yield is a measure of how much sequencing data has been produced, in units of base pairs. Yield is the most commonly used app dependency to automatically launch an analysis for a biosample. BaseSpace Sequence Hub determines how much yield was produced from each flow cell lane the biosample was sequenced on, even if the biosample was merged into a pool with other biosamples.
Lane QC thresholds are a user setting that applies to the metrics of lanes from all runs the user owns, once the run is complete. You can set the thresholds using the API. For more information, see the developer documentation at developer.basespace.illumina.com.
The Generate FASTQ app runs immediately after a run completes to convert .bcl files to .fastq files and demultiplex any indexing that occurred. If the app fails to finish, the status changes to Aborted, which causes the sequencing run status to change to Failed.
You can use the Fix Sample Sheet and Requeue option in the Run Details page to restart the Generate FASTQ analysis. This initiates a new Generate FASTQ analysis and resets the sequencing run status to Analyzing.
Analysis workflows are templates that contain pre-defined settings and QC thresholds for a specific app. These workflows can be scheduled in advance to automatically launch when minimum requirements, called dependencies, are met.
Use the biosample workflow upload to schedule analyses for either new or existing biosamples. The analyses remain in Pending status until they can be launched. For more information, see Analysis Workflows.
You can schedule apps to automatically launch or you can manually launch them.
To automatically launch an app, schedule an analysis workflow in a biosample workflow file. When enough yield or other dependencies are met for the analysis workflow, the analysis uses the biosample as an input to launch automatically.
Manually launch apps through the app details page. Apps that formerly required samples now require biosamples. Select inputs from a list of biosamples that contain FASTQ data sets.
The new data model supports data aggregation for the same biosample placed in multiple flow cell lanes in multiple runs. BaseSpace Sequence Hub now automatically locates and merges different samples for you before launching an app. When a biosample is linked with multiple libraries of a similar type, placed on different lanes, and placed on different flow cell runs, we can collect all of the FASTQ files produced exclusively for the original sample and input them into the app.
BaseSpace Sequence Hub excludes data that do not meet quality thresholds, which improves the chances of success in running apps. Immediately before an app is launched with biosamples as the input, BaseSpace Sequence Hub checks the statuses of all resources that produced the FASTQ data sets, including libraries, pools, data sets, runs, and lanes. For example, if a sequencing lane had failed due to quality, the app will not include any FASTQ data sets produced from that specific lane.
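The pre-launch check can be pictured as a filter followed by a yield comparison. The following is a minimal sketch under stated assumptions: the `FastqDataset` class, the `statuses` dictionary, and the status string "Passed" are hypothetical; only the Failed status is taken from this guide. It is not the actual aggregation code.

```python
from dataclasses import dataclass, field


@dataclass
class FastqDataset:
    name: str
    yield_bp: int
    # QC status of each upstream resource, for example
    # {"lane": "Passed", "pool": "Failed"} (hypothetical representation).
    statuses: dict = field(default_factory=dict)


def usable_datasets(datasets):
    """Keep only data sets with no Failed upstream resource."""
    return [d for d in datasets if "Failed" not in d.statuses.values()]


def can_launch(datasets, required_yield_bp):
    """True when the yield of the usable data sets meets the requirement."""
    usable_yield = sum(d.yield_bp for d in usable_datasets(datasets))
    return usable_yield >= required_yield_bp
```

In this sketch a data set from a failed lane contributes nothing to the yield total, so a biosample can fall short of its required yield even when the raw total across all lanes would have been enough.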
You can manually override QC statuses. For example, you can set a pool to Failed, which automatically excludes all FASTQ data sets produced by the pool.
The biosample workflow upload allows you to specify an existing biosample in the analysis workflow column of the spreadsheet. As long as the biosample name given is an exact match with a biosample already owned by the uploading user, the analysis workflow is added to the existing biosample.
The delivery status of an analysis is a manually updated, independent status used for tracking the progress of sending data to another user. You can use this to mark the data to be delivered and track the status of review and delivery. For more information, see Share Analyses.
Lab requeues are a way to request more yield when a biosample falls short of what is required to run an app successfully. When a biosample has not produced the required yield in the specified time, it is marked as Missing Yield. You can initiate a lab requeue to request the sequencing lab to produce more data to make up for the missing amount.
When you initiate a lab requeue, you specify the checkpoint in the sample prep steps the lab should begin from. You can initiate more than one requeue at the same time, but Illumina recommends that only one lab requeue be fulfilled at a time.
When yield arrives in the form of another sequencing run, the lab requeue status transitions to Sequencing. If enough yield arrives to meet the requested amount, the lab requeue status is updated to Fulfilled. For more information, see Request a Lab Requeue.
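The status transitions described above can be summarized in a small sketch. The function `requeue_status` is hypothetical, and "Requested" is a placeholder name for the state before any new yield arrives, since this guide names only the Sequencing and Fulfilled statuses.

```python
def requeue_status(requested_bp, received_bp):
    """Return the lab requeue status for a given amount of new yield.

    requested_bp -- yield requested from the lab, in base pairs
    received_bp  -- yield received so far from new sequencing runs
    """
    if received_bp <= 0:
        return "Requested"  # placeholder name, not taken from the guide
    if received_bp >= requested_bp:
        return "Fulfilled"
    return "Sequencing"
```

For a request of 50 Gbp, receiving 20 Gbp leaves the requeue in Sequencing, and receiving 50 Gbp or more moves it to Fulfilled.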
Data sets are bundles of one or more files output by BaseSpace Sequence Hub apps. They can be used as input to other BaseSpace Sequence Hub apps when chaining apps together. Data sets belong in projects and are included if the project they are in is shared or transferred.
Data sets can be found in the biosample details page under the output files tab.