Introduction to BaseMount

The main way to interact with your BaseSpace Sequence Hub (BSSH) data is through the website. For some use cases, however, it is useful to work with the same data from the Linux command line interface (CLI). This gives direct programmatic access, so that users can write ad-hoc scripts and use tools such as find, xargs and command-line loops to work with their data in bulk.
This is the concept behind our BaseSpace Sequence Hub command line tool, BaseMount, a FUSE driver that allows command line access to your BaseSpace Sequence Hub data.
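
As an illustration of this kind of bulk operation, the sketch below counts the FASTQ files of each sample under a mounted account using find and awk. It assumes the Projects/{project-name}/Samples/{sample-name}/Files layout described later in this document; the function name count_fastq_per_sample is purely illustrative and is not part of BaseMount.

```shell
# Count FASTQ files per sample below <mount point>/Projects.
# Assumes the layout Projects/<project>/Samples/<sample>/Files/...
# Usage: count_fastq_per_sample <mount point>
count_fastq_per_sample() {
    find "$1/Projects" -type f -name '*.fastq.gz' -path '*/Samples/*/Files/*' |
        awk -F/ '{
            # The sample name is the path component right after "Samples".
            for (i = 1; i < NF; i++)
                if ($i == "Samples") { count[$(i + 1)]++; break }
        }
        END { for (s in count) printf "%s\t%d\n", s, count[s] }' |
        sort
}
```

For example, count_fastq_per_sample ~/BaseSpace would print one tab-separated line per sample, assuming the account is mounted on ~/BaseSpace.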

What is BaseMount?

BaseMount is a tool to mount your BaseSpace Sequence Hub data as a Linux file system. You can navigate through projects, samples, runs and app results and interact directly with the associated files exactly as you would with any other local file system. BaseMount is based on FUSE and uses the BaseSpace Sequence Hub API to populate the contents of each directory.

Tutorial Videos

This video series starts by preparing you for your first mount and ends with filtering samples from the command line based on their metadata.
The script of each video is included here for quick reference.

Step 1 - Install

sudo bash -c "$(curl -L"
# As your intended user: check that fusermount is executable
fusermount --version
# Optional: Refresh bash auto-completion
exec bash

Step 2 - Import Data to Account

In a browser, starting from
- Log in
- Add public dataset (2 runs + 1 project):
"HiSeq X Ten TruSeq PCR Free (16 NA12878 1 plex)"
(Use the following import links to avoid going through the web GUI)
- Run 1 :
- Run 2 :
- Project:

Step 3 - First Time Launch

# Mount your BaseSpace account
mkdir BaseSpace
basemount BaseSpace/
<copy authentication URL to browser>
<login in browser>
<accept authentication>
# See the top level of your newly mounted environment!
ls BaseSpace

Step 4 - Projects, Runs, Samples

# Mount your BaseSpace account
mkdir BaseSpace
basemount BaseSpace/
# List your projects and runs
cd BaseSpace
ls Projects
ls -lh Runs
# List files from one particular sample
cd "Projects/HiSeq X Ten: TruSeq PCR-Free (16 replicates of NA12878)"
cd Samples
cd NA12878_L1_S1
ls Files/
# Extract first 2 lines of compressed fastq without having to download whole file
zcat Files/NA12878-L1-S1_S1_L001_R1_001.fastq.gz | head -2

Step 5 - AppSessions and AppResults

# Mount your BaseSpace account
mkdir BaseSpace
basemount BaseSpace/
# List appsessions and appresults
cd "BaseSpace/Projects/HiSeq X Ten: TruSeq PCR-Free (16 replicates of NA12878)"
ls AppSessions
ls AppResults
# List files from one particular appresult
cd "AppSessions/BWA Whole Genome Sequencing v1.0 - NA12878_L1_S1"
cd AppResults.20208193.NA12878_L1_S1
cd Files
ls -lh

Step 6 - Interactive Access (samtools, IGV)

# Mount your BaseSpace account
mkdir BaseSpace
basemount BaseSpace/
# Go to the Files directory of an appresult
cd "BaseSpace/Projects/HiSeq X Ten: TruSeq PCR-Free (16 replicates of NA12878)"
cd AppResults/NA12878_L1_S1/Files
ls -lh
# Run interactive samtools queries without having to download the full file
samtools view -H NA12878-L1-S1_S1.bam
samtools view NA12878-L1-S1_S1.bam chr3:456789-456789 | head -2
samtools view NA12878-L1-S1_S1.bam chr3:456789-456789 | wc -l
# Use IGV without having to download the full BAM file
<open NA12878-L1-S1_S1.bam in IGV>
<enter chr3:456789 in IGV>
<admire the view>

Step 7 - Metadata

# Mount your BaseSpace account
mkdir BaseSpace
basemount BaseSpace/
# Enter a run and look at its metadata
cd BaseSpace
cd "Runs/HiSeq X Ten TruSeq PCR Free (16 NA12878 1 plex) FC_A"
ls -al
cat .type
cat .id
cat .json
# Extract specific metadata fields (you may need `apt-get/yum install jq`)
cat .json | jq .Response.SequencingStats
cd ..
# Get list of runs with yields above a specific threshold
cat .json | jq '.Response.Items[] | select( .SequencingStats.YieldTotal > 100 )
| { Name: .ExperimentName, Yield: .SequencingStats.YieldTotal }'
cat .json | jq -r '.Response.Items[] | select(.SequencingStats.YieldTotal>100)
| "\(.ExperimentName)\t\(.SequencingStats.YieldTotal)"'

Step 8 - File Upload

# Mount your BaseSpace account
mkdir BaseSpace
basemount BaseSpace
# Create new project
cd BaseSpace/Projects
mkdir myNewProject
# Create new AppResult (the only entity able to store files other than fastq)
cd myNewProject/AppResults
mkdir myNewAppResult
cd myNewAppResult
# Create a file there (or copy one with `cp`)
echo "Hello BaseSpace" > Files/hello.txt
# Mark the container as "Complete"
basemount-cmd mark-as-complete
# (You can now navigate to your BaseSpace account in a browser
# and check that the file is present in the myNewProject project)

Step 9 - Move to trash and restore from trash

# Mount your BaseSpace account
mkdir BaseSpace
basemount BaseSpace
# Create new project
cd BaseSpace/Projects
mkdir myNewProject
# Move project to the trash
basemount-cmd --path myNewProject move-to-trash
# View project in the trash
ls -ld ../.Trash/myNewProject
# Restore project from the trash
basemount-cmd --path ../.Trash/myNewProject restore-from-trash
ls -ld myNewProject

BaseMount Installation

Installation requires root access. Non-root CentOS users need to be added to the 'fuse' user group by an administrator before being able to run BaseMount.

Quick install

Run the following command:
sudo bash -c "$(curl -L"
This script works on both Ubuntu and CentOS. It adds the BaseSpace Sequence Hub package repositories, imports the public keys to your system and installs BaseMount and its prerequisites.

Manual install

Ubuntu:
# Add BaseMount repository (FILE must be set beforehand to the path of the apt sources list file)
echo "# Packages by Illumina BaseSpace" | sudo tee ${FILE}
echo "deb saucy main" | sudo tee -a ${FILE}
# Import BaseSpace Sequence Hub GPG key
curl | sudo apt-key add -
# Refresh db and install BaseMount
sudo apt-get update
sudo apt-get install basemount
# Optional: Refresh bash auto-completion
exec bash
CentOS:
# Add BaseMount repository (FILE must be set beforehand to the path of the yum repo file)
echo "# Packages by Illumina BaseSpace" | sudo tee ${FILE}
echo "[bintraybintray-basespace-BaseMount-RPM]" | sudo tee -a ${FILE}
echo "name=bintray-basespace-BaseMount-RPM" | sudo tee -a ${FILE}
echo "baseurl=" | sudo tee -a ${FILE}
echo "gpgcheck=1" | sudo tee -a ${FILE}
echo "enabled=1" | sudo tee -a ${FILE}
# Import BaseSpace Sequence Hub GPG key
sudo rpm --import
# Install BaseMount
sudo yum install basemount
# Only if the command above fails because of missing bash-completion package, run:
# sudo yum install epel-release
# Check that your intended users have permissions to run fusermount (see FUSE doc)
fusermount --version
# Optional: Refresh bash auto-completion
exec bash


Uninstall

BaseMount can be uninstalled with the following command:
sudo bash -c "$(curl -L"

Minimum Hardware Requirements and System-Level Settings

We have tested the current version of BaseMount Alpha under the following conditions:
Supported Operating Systems:
  • Ubuntu 12, 14 and 15
  • CentOS 6.5 and 7
Minimum Hardware requirements:
  • RAM: 4GB (check your ulimit -v and -m)
  • Disk: 5GB /tmp
Recommended ulimit thresholds:
  • Open files: tested with ulimit -n 1024, should be greater than or equal to this value
  • Stack size: tested with ulimit -s 8192, should be less than or equal to this value
  • Max user processes: tested with ulimit -u 16384, should be greater than or equal to this value
  • Virtual memory size: tested with ulimit -v unlimited, should be greater than or equal to 4GB
  • Max memory size: tested with ulimit -m unlimited, should be greater than or equal to 4GB
  • Max locked memory: tested with ulimit -l 64, should be greater than or equal to this value
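
The following sketch shows one way to compare a few of the current shell limits with the tested values listed above. The helper names (meets_min, meets_max, check_basemount_ulimits) are hypothetical, only three of the limits are checked, and the script assumes that ulimit -v reports kilobytes, as bash does.

```shell
# Succeeds when a ulimit value ("unlimited" or a number) is >= a minimum.
meets_min() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Succeeds when a ulimit value is <= a maximum ("unlimited" never is).
meets_max() {
    [ "$1" != "unlimited" ] && [ "$1" -le "$2" ]
}

# Compare some current soft limits with the tested values.
check_basemount_ulimits() {
    status=0
    meets_min "$(ulimit -n)" 1024 || { echo "open files (ulimit -n) below 1024"; status=1; }
    meets_max "$(ulimit -s)" 8192 || { echo "stack size (ulimit -s) above 8192"; status=1; }
    # Assumption: ulimit -v reports kilobytes, so 4GB is 4194304.
    meets_min "$(ulimit -v)" 4194304 || { echo "virtual memory (ulimit -v) below 4GB"; status=1; }
    return "$status"
}
```

Running check_basemount_ulimits as the intended BaseMount user prints one line per limit that falls outside the tested range and returns non-zero if any does.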

Mounting Your BaseSpace Sequence Hub Account

After a successful installation, you can mount your BaseSpace Sequence Hub account with BaseMount. Basic usage is as follows:
basemount [--config <config name>] <mount-point folder>
The --config parameter is optional; it is useful when a user has multiple BaseSpace Sequence Hub accounts to mount. The config name is used to create a <config name>.cfg file in the ~/.basespace/ directory. The mount point directory becomes the top-level folder of your mounted file tree. For example:
mkdir ~/BaseSpace_Mount
basemount --config user1 ~/BaseSpace_Mount

Special configuration needs

If you are behind a proxy, note that BaseMount sends all its HTTPS requests through libcurl, so you may need to configure the proxy using the https_proxy environment variable (if this doesn't work, please refer to libcurl's manual).
To run BaseMount inside a Docker container, the container must run in privileged mode.
Note: If you need a more secure alternative to docker run --privileged, you may want to look into docker run --cap-add SYS_ADMIN --device /dev/fuse --security-opt apparmor:unconfined (but please note that we cannot offer support for this setup).


Authentication

The first time you run BaseMount, you will be directed to a web URL and asked to enter your BaseSpace Sequence Hub user credentials. BaseMount will use these credentials to authenticate your interactions with BaseSpace Sequence Hub. By default, the credentials are cached in your home directory and they can be password-encrypted for security, just like an ssh key.

Authenticating for Workgroups

Workgroups are considered to be distinct accounts inside BaseSpace Sequence Hub. Therefore, if your browser is set to use a specific workgroup when you open the CLI authentication link, the generated access token will give you access to this workgroup data in the CLI.
You cannot "switch workgroups" inside a BaseMount mount, but you can create multiple "BaseMount configs" that will point to different workgroups or to your "Personal" space (see basemount --config <config name>), and you can mount all of them simultaneously on different mount points.
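
A minimal sketch of mounting several configs side by side is shown below. It relies only on the basemount --config <config name> <mount-point folder> usage described earlier; the function name mount_configs, the BASEMOUNT override variable and the config names in the usage note are assumptions made for illustration.

```shell
# Mount one BaseMount config per sub-directory of a common base directory.
# Usage: mount_configs <base dir> <config name>...
# BASEMOUNT can be overridden (for a dry run, say); it defaults to basemount.
mount_configs() {
    base_dir=$1
    shift
    for cfg in "$@"; do
        mkdir -p "$base_dir/$cfg" || return 1
        "${BASEMOUNT:-basemount}" --config "$cfg" "$base_dir/$cfg" || return 1
    done
}
```

For instance, mount_configs ~/BaseSpace personal workgroupA would mount two hypothetical configs on ~/BaseSpace/personal and ~/BaseSpace/workgroupA. The first launch of each config still goes through the interactive browser authentication, with the browser set to the matching workgroup.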

Authenticating for Enterprise domains, other servers

basemount [--config <config name>] --api-server <API URL or alias | help> <mount-point>
As with Workgroups, accessing Enterprise domains or other BaseSpace servers usually means dealing with multiple accounts, so you will want to use BaseMount's --config feature.
You may also need to use an additional command-line parameter: --api-server <API URL or alias>.
  • --api-server= US server (Default)
  • --api-server= EU server
Please contact Illumina support if you need assistance in identifying your API server URL. You can also run basemount --api-server help to get a list of known servers and their aliases.
Note that for Enterprise domains, the API URL doesn't actually include the subdomain name. However, it is important that your browser is logged in to your subdomain before authenticating BaseMount. For example, if you are a European customer with an Enterprise subdomain, your API URL will likely be the EU server URL (no subdomain specified), but your browser must be showing your subdomain's URL. If you type this URL while logged in under the base BaseSpace public domain, your browser will mistakenly redirect you outside your Enterprise domain. You need to explicitly log out and re-enter the correct subdomain URL before logging in.
Once authenticated, BaseMount will store the API server URL for all subsequent mounts using this config.


Unmounting

Stopping BaseMount can be done in one of the following ways:
  • basemount --unmount <mount point> (this actually just calls the fusermount line below)
  • basemount <mount point or subdirectory> (same as above, after detecting that it refers to a mounted directory)
  • fusermount -u <mount point>
These unmount commands only succeed when all processes have stopped using the mounted filesystem. You also need to cd out of the filesystem's subdirectories in all your terminals.
In some circumstances, you may want to force the unmount:
  • echo unmount > <mount point>/.commands (brutal method, which kills BaseMount from within)
  • fusermount -uz <mount point> (lazy unmount, which leaves the basemount process running but frees the mount point)
Note: You can list the currently mounted directories using mount | grep basemount.
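
Since a regular unmount fails while a terminal is still inside the mounted tree, a small wrapper can check the current shell first. The sketch below is one possible approach; is_inside_mount and unmount_basemount are hypothetical helper names, and the check only covers the shell that runs it, not other terminals or processes.

```shell
# Succeeds when directory $1 is the mount point $2 or lies below it.
is_inside_mount() {
    case "$1/" in
        "$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Refuse to unmount while the current shell sits inside the mount point,
# then fall through to the regular fusermount -u call.
unmount_basemount() {
    mp=$(cd "$1" && pwd -P) || return 1
    if is_inside_mount "$(pwd -P)" "$mp"; then
        echo "cd out of $mp before unmounting" >&2
        return 1
    fi
    fusermount -u "$mp"
}
```

Typical use would be unmount_basemount ~/BaseSpace from a directory outside the mount point.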

Directory Structure

BaseMount uses the natural structure of the BaseSpace Sequence Hub API to create an appropriate directory hierarchy, summarized below.


BaseMount mimics the structure of the major resources represented in the BaseSpace Sequence Hub API. The root of this structure starts with the Href link that you would find in the API call


A mount point's root directory contains the following entries:
  • Runs : sub-directory containing the runs to which the user has access
  • Projects : sub-directory containing the projects
  • BioSamples: (v2 only)
  • .Trash : sub-directory containing the user's trash
  • .basemount: special directory, which shows and controls the configuration of the current BaseMount instance. This directory is accessible from anywhere in the BaseMount filesystem (even though we make it visible by ls only at the top level).
  • .commands : special file used by basemount-cmd
  • .ResourceById: (Experimental) special directory to access resources by id without the need to find their full directory hierarchy

Entity directories

These directories are:
  • {mount-point}/Projects
  • {mount-point}/Projects/{project-name}/AppResults
  • {mount-point}/Projects/{project-name}/AppSessions
  • {mount-point}/Projects/{project-name}/Samples
  • {mount-point}/Projects/{project-name}/Datasets (v2 only)
  • {mount-point}/Runs
  • {mount-point}/BioSamples (v2 only)
  • {mount-point}/BioSamples/{biosample-name}/Libraries (v2 only)
  • {mount-point}/BioSamples/{biosample-name}/Libraries/{library-name}/LibraryPrep (v2 only)
  • {mount-point}/BioSamples/{biosample-name}/PrepRequests (v2 only)
  • {mount-point}/BioSamples/{biosample-name}/RunLaneSummaries (v2 only)
  • {mount-point}/BioSamples/{biosample-name}/LabRequeues (v2 only)
  • {mount-point}/BioSamples/{biosample-name}/Datasets (v2 only)
Inside these directories, each BaseSpace Sequence Hub entity is represented as a sub-directory with the corresponding name.
Entities can be accessed by name or by ID via the symbolic links .id.{entity-id} -> {entity-name}.
With the appropriate access token and permissions, you should be able to create most entities with the mkdir command and delete them with rmdir.
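
As a small illustration of the .id.{entity-id} links, the sketch below resolves an entity ID to its directory name with readlink. It assumes the link target is the entity name, as described above; entity_name_by_id is a hypothetical helper name.

```shell
# Print the name of the entity with the given ID inside an entity directory,
# by reading the hidden .id.<entity-id> symbolic link.
# Usage: entity_name_by_id <entity directory> <entity id>
entity_name_by_id() {
    link="$1/.id.$2"
    [ -L "$link" ] || { echo "no entity with id $2 in $1" >&2; return 1; }
    readlink "$link"
}
```

For example, entity_name_by_id ~/BaseSpace/Projects <project id> would print the name of the matching project directory.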
Entities of type Project
Projects are the top-level container for most data, and they are used for sharing/transferring data.
Entities of type Project contain:
  • AppSessions, a.k.a. "Analyses" on the BaseSpace Sequence Hub website: list of all appsessions started in or written to this project.
  • AppResults: app outputs (with output files) associated with the project. This is a flattened list, not broken down by analysis.
  • Samples: flattened list of Samples that are in the project and will have a Files directory containing your fastq.gz files.
  • Datasets: (v2 only) When using the V2 API, AppResults and Samples are converted to Datasets.
Entities of type AppSession
AppSessions, also known as Analyses, are created when users launch Apps. They record the metadata of which App was launched, who launched it, the date/time of the launch, the input settings to the App, and the output AppResults (when using v1 API) or Datasets (when using v2 API).
Entities of type AppSession contain:
  • one sub-directory per appresult or sample generated during the appsession
  • one sub-directory per child appsession for multi-node appsessions
  • when generated by the app: a Logs subdirectory containing log files
Experimental: Two hidden files can be used to relaunch the app:
  • LaunchPayload: The payload that was used to launch the current appsession
  • Application/.AppLauncher: A symlink to the Application that was used to launch the app, and the .AppLauncher special file which, when written to, initiates a new app launch. Furthermore, a subsequent read of this file (cat Application/.AppLauncher) will show you the launch API response.
These two files can be made visible with basemount-cmd show-launch-files. A shortcut to relaunch the app is basemount-cmd relaunch.
Entities of type AppResult
These contain:
  • Files: app output files
  • Files.metadata (described below)
These directories are described in more detail in the "Download" and "Upload" sections below.
If the entity was created with mkdir, it first appears in write mode, making it possible to copy files to the Files directory. Once this is done, you can mark the entity as Complete with the command basemount-cmd mark-as-complete.
Entities of type Sample
These contain:
  • Files: that's where you'll find your FASTQ files
  • Files.metadata (described below)
  • SampleProperties: Information extracted from .json
These directories are described in more detail in the "Download" section. (BaseMount doesn't currently support sample creation, as samples need to go through a validation stage.)
Entities of type Dataset (v2 only)
Datasets are the containers for Files in BSSH. All files are stored in Datasets except Run files and Analysis log files. Each Dataset is stored in one Project.
These contain:
  • Files: as the name suggests...
  • Files.metadata (described below)
  • DatasetTypeId: Identifier
  • Attributes: Read-only key-value pairs specific to each dataset type
Entities of type BioSample (v2 only)
Biosamples are a central reference entity in BSSH API v2. They are typically the starting point in an overall workflow of sequencing and analysis. Biosamples are biological samples that usually contain purified or un-purified nucleic acid. They are usually a sample from an individual, a microbial strain, or a mixture of individuals such as a soil sample. Other examples include a blood sample, saliva, a tumor biopsy, and purified DNA or RNA.
Biosamples are the source material for creating Libraries, and a Biosample may have one or more Libraries. Each Library may have one or more FASTQ datasets, each of which is a product of sequencing a Library from the Biosample. Biosamples may have Prep Requests, which are requests to the lab for a specific yield of sequencing data. Datasets may be associated with a Biosample. These associations make it possible to view all data associated with a Biosample, such as on the Datasets tab of the Biosample Details page.
Biosamples in BSSH do not contain files and are not stored in a Project (i.e. Biosamples are 'metadata-only'). However, they have a 'default Project' where new datasets are stored by default when using the Biosample manifest.
These contain:
  • Datasets
  • AppSessions
  • Libraries
  • LabRequeues
  • PrepRequests
  • RunLaneSummaries
  • Status (file)
  • DefaultProject
Entities of type Library (v2 only)
Libraries correspond to the physical libraries created in a lab. Libraries are prepared from a Biosample using a Library Prep Kit. Each Library is associated with one Biosample.
When a Library is sequenced on a Lane of a Run, the result is a FASTQ dataset. A Library may have multiple FASTQ datasets, which is the result of sequencing the Library on multiple Lanes and/or Runs.
Multiple Libraries that have indexes can be combined in a Pool. A Library may be present in multiple Pools.
These contain references to:
  • BioSample
  • LibraryPrep
  • Project
Entities of type PrepRequest (v2 only)
The PrepRequest entity represents a request to create libraries of a given type (represented by a LibraryPrep entity which denotes the prep kit or protocol used to create libraries) for a given BioSample. A single BioSample may have multiple active PrepRequest entities (to support cases like Moleculo), but most BioSamples will only have a single active PrepRequest which is considered the default library type for the BioSample in auto-app launch scenarios.
PrepRequest entities are created and associated with BioSamples during sample manifest upload (if the sample library prep is specified in the manifest), via the API/UI, or by interactions with LIMS.
The PrepRequest entity contains the following fields:
  • Id
  • State - Active or Cancelled
  • BioSample (via SampleSourceId) - Links to the owning BioSample
  • LibraryPrep - The type of library prep kit (or protocol) that is being requested
Entities of type LabRequeue (v2 only)
Lab Requeues are requests to the lab for additional sequencing data. They are made when the initial sequencing of a Biosample has not generated sufficient yield for a good Analysis. A typical example is when a Prep Request enters 'Missing Yield' status, so a Lab Requeue is requested.
Entities of type RunLaneSummary (v2 only)
No description yet. You can enjoy parsing the .json file of your entities.

BaseMount metadata files

In each directory, BaseMount provides a number of hidden files with extra BaseSpace Sequence Hub metadata. These hidden files follow the Unix convention of starting with a "." and can be seen using the command ls -a.
These metadata files are:
  • .href : the API entry point used to query this directory's contents
  • .curl : the API request, which could be used in standalone scripts
  • .json : the result of the BaseSpace Sequence Hub API query. This includes metadata associated with the relevant BaseSpace Sequence Hub entity. For example a sample's metadata includes the number and length of reads, the number of reads passing filter, etc.
  • .id : the id of the basespace entity, if it exists, extracted from .href
  • .type : the entity (or group thereof)'s type, extracted from .href
  • .name : the Name field of the current entity (when available from the json response)
  • .status: the Status field of the current entity (when available from the json response). Note that this file is never updated unless you run basemount-cmd refresh. Some entities also have a "Status" file that is more dynamically updated and is sometimes writeable.
Several entities also store metadata as properties. These are available in the Properties sub-directory.
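
Because these metadata files are plain text, they are easy to combine in scripts. The sketch below prints the ID and name of every entity in an entity directory by reading each sub-directory's .id and .name files. The function name list_entities is illustrative, and sub-directories that lack either file are simply skipped.

```shell
# Print "<id><TAB><name>" for each entity sub-directory of an entity
# directory such as <mount point>/Projects, using the hidden metadata files.
# Usage: list_entities <entity directory>
list_entities() {
    for dir in "$1"/*/; do
        # Skip anything without both metadata files (also the unmatched glob).
        [ -f "$dir.id" ] && [ -f "$dir.name" ] || continue
        printf '%s\t%s\n' "$(cat "$dir.id")" "$(cat "$dir.name")"
    done
}
```

For example, list_entities ~/BaseSpace/Projects would list the projects of an account mounted on ~/BaseSpace.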

Downloading Data

In appresult and sample directories, Files sub-directories expose data files, which can be copied or used interactively.
Just use cp, rsync, or any command line tool to copy the files from your BaseMount space to your chosen destination.
File accesses use direct HTTPS connections to Amazon S3 storage (i.e. they don't go through the BaseSpace Sequence Hub API) to expose files at the block level so users can access any part of a file without needing to download all the preceding blocks, such as when using a tabix or bam index. Caching makes repeated accesses to the same file more efficient.

Interactive access

BaseMount does facilitate file download, but because it also provides convenient, fast, cached access to your BaseSpace Sequence Hub metadata and files, you may find that many operations can be carried out without downloading entire files locally.
During our testing, we have used BaseMount to grep through FASTQ files, extract blocks of reads from BAM files and even run IGV on BAM files directly, all without downloading files locally. This can be more convenient than including a download step and saves on the overheads of local storage.

Multi-threaded download

BaseMount automatically uses multiple threads even for single file accesses. Copying multiple files in parallel may be slower or faster than one-at-a-time file copy: it will vary based on your system and on the way files were stored on Amazon servers.
Multi-threading can be configured by launching BaseMount with the command line options --threads=<n> and --cache-opts=<interactive block size>:<large block size>:<total cache size>
The default values are:
  • n=8: 8 concurrent threads, shared between all concurrently downloaded files
  • interactive block size=2: accesses to non-contiguous blocks in a file use 2MB-size blocks
  • large block size=16: contiguous accesses use 16MB-size blocks
  • total cache size=512: max 512MB of RAM is used to cache downloaded blocks, which is 32 large blocks.
Important: the number of large blocks able to be stored in the total cache size should be kept greater than the number of threads (even greater than 2x the number of threads to be conservative). Failing that, fast threads may get blocked by slow threads when the cache becomes full, reducing the overall download speed. In the worst case, some threads may even overwrite useful cached blocks before they have time to be used, forcing them to be downloaded again later.
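
The conservative rule above (more than twice as many large blocks in the cache as there are threads) can be checked before launching BaseMount. The following is a sketch only; cache_opts_ok is a hypothetical helper and all sizes are in MB, as in the --cache-opts option.

```shell
# Succeeds when the cache holds more than 2x as many large blocks as there
# are threads (the conservative rule of thumb stated above).
# Usage: cache_opts_ok <threads> <large block size MB> <total cache size MB>
cache_opts_ok() {
    blocks=$(( $3 / $2 ))
    [ "$blocks" -gt $(( 2 * $1 )) ]
}
```

With the defaults, cache_opts_ok 8 16 512 succeeds (32 blocks for 8 threads). Doubling the threads to 16 would fail the check unless the cache is also enlarged, for example with --threads=16 --cache-opts=2:16:1024.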
Two timeouts are in place:
  • A speed-based timeout, triggered when fewer than 1000 bytes are received in any 30-second window, to catch rare occurrences of broken connections to the S3 object store. On those occasions, the download of the affected blocks is restarted.
  • A fixed timeout of 300 seconds: when a block takes longer than this to download, BaseMount stops and restarts its download. This is intended to catch unexpected bandwidth throttling problems where one download thread becomes much slower than the others.
In both cases, an error message is written to /tmp/basemount.errorlog and BaseMount retries the block up to 3 times before aborting.
Bandwidth requirements
These default values mean that BaseMount will try to download 16MB blocks using 8 threads with a 300-second timeout: 16MB/thread * 8 threads * 8 bits/byte / 300s = 3.4 Mbps. If your connection is slower than that, you should reduce the number of threads or the block size using the two parameters mentioned above.
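
The same arithmetic can be repeated for other settings with a short helper. This is a sketch of the formula above (threads * block size * 8 bits / timeout); min_bandwidth_mbps is a hypothetical function name, and the timeout argument defaults to the fixed 300-second timeout.

```shell
# Minimum bandwidth in Mbps needed for every thread to finish one block
# within the fixed timeout: threads * block size (MB) * 8 bits / timeout (s).
# Usage: min_bandwidth_mbps <threads> <large block size MB> [timeout s]
min_bandwidth_mbps() {
    awk -v t="$1" -v b="$2" -v s="${3:-300}" \
        'BEGIN { printf "%.1f\n", t * b * 8 / s }'
}

# min_bandwidth_mbps 8 16   # prints 3.4 (the default settings)
# min_bandwidth_mbps 4 16   # prints 1.7
```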
The Troubleshooting sections on timeout contain useful advice.
Unfortunately, the data downloaded from BaseSpace Sequence Hub is not currently validated (for example using MD5 sums). We hope to implement this soon, but for the moment, any data corruption occurring during download may go through BaseMount undetected.

BSFS plugin

A back-end with different caching strategies can be selected when launching BaseMount: the BSFS plugin.
BSFS optimises data access for large machines with a good connection to S3, such as large EC2 instances. BSFS is the filesystem currently used by BaseSpace Sequence Hub when running native apps.
Without the BSFS plugin, caching is handled by the Linux disk cache, using (and restricted to) any available RAM. With BSFS, downloaded data is kept in a disk-based LRU (Least Recently Used) cache.
This plugin can be activated as follows:
basemount --plugin=bsfs <mountpoint>
"Appendix 1: BSFS" explains how to install and configure the BSFS plugin for optimal performance.


Files.metadata

Next to each Files directory, you will see a special Files.metadata directory.
These directories contain "sneaky" sub-directories, which don't appear in ls results. Each file that you can see in a Files/ directory has a hidden matching directory entry in Files.metadata/. For example, if Files/ contains a file Files/mySubDir/myFile, then you can do: cd Files.metadata/mySubDir/myFile to access this file's metadata. In there you will find the usual BaseMount metadata special files (such as .json), as well as a useful Content/Url file exposing the S3 URL of the file itself.
Note: This S3 URL is only valid for 7 days after it is generated by the API call used by BaseMount to create the Content/Url file.
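
The mapping from a file to its Content/Url entry can be scripted. The sketch below rewrites a path below Files/ into the matching Files.metadata/ path and prints the URL stored there. The name s3_url_of is hypothetical, and the function assumes that the first "/Files/" component of the path is the BaseMount Files directory.

```shell
# Print the S3 URL of a file, given its path below a Files directory.
# Assumption: the first "/Files/" component in the path is the BaseMount
# Files directory, so Files/<rest> maps to Files.metadata/<rest>/Content/Url.
s3_url_of() {
    case "$1" in
        */Files/*) ;;
        *) echo "not below a Files directory: $1" >&2; return 1 ;;
    esac
    prefix=${1%%/Files/*}
    rest=${1#*/Files/}
    cat "$prefix/Files.metadata/$rest/Content/Url"
}
```

From inside an appresult directory, s3_url_of ./Files/mySubDir/myFile would print the URL of that file (note the leading ./ so that the path contains "/Files/").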

Writing and Uploading data to BaseSpace Sequence Hub

Starting with version 0.12, BaseMount offers write mode capabilities.
This brings in the following 3 major and 2 minor features:
  • Creation of new BaseSpace Sequence Hub entities: Projects, AppResults
  • File upload to AppResults
  • Property editing
  • Marking AppSessions as Complete
  • Renaming AppSessions

Warning about access token scopes

Access tokens obtained through authentication are created with a specific set of scopes. Starting with version 0.11, the "CREATE GLOBAL" scope is requested. If you authenticated with an older version of BaseMount, your stored access token may not contain the scope needed for write-mode. To fix this, you need to delete your current configuration (by using basemount --remove-config [--config=<config>]) and run BaseMount again to re-authenticate.

Creation of new BaseSpace Sequence Hub entities

In any Projects or AppResults directory, the mkdir <name> command will create a new directory with the specified name.
A matching BaseSpace Sequence Hub entity is created on the BaseSpace Sequence Hub server.
  • Projects are always created in write mode
  • AppResults are created together with an associated AppSession called "BaseSpaceCLI - {creation time}". This appsession's status is set as "Running". Files can be added as long as the appsession stays in Running mode.

File upload to AppResults

After creating a new appresult entity with mkdir, enter the directory. You can copy any file (including any subdirectory hierarchy) to the Files directory.
The upload is automatically multi-threaded, with a default of 8 threads per uploaded file. This is configurable with the basemount --threads=<n> option; note that this command-line parameter also controls the number of download threads. Expect this to change in a future version.
Note: Uploaded blocks are validated using their MD5 sums.
Note 2: Run upload is not supported yet.
Note 3: For Sample upload, please use the BaseSpaceCLI tools, which include FASTQ validation.

Property editing

A directory called "Properties" can be found under most BaseSpace Sequence Hub entity directories.
Special rules dictate who can edit those properties:
  • Only the owner of an entity can add/edit its properties
  • In the current implementation, only the access token used to create the entity can add/edit its properties
  • On the plus side, you can add or edit properties even after the entity (or its associated appsession) is marked as "Complete"
Properties can be of type string (manipulated in BaseMount as text files), array (subdirectory containing sequential entries called 0, 1, 2, etc.), map (subdirectory containing named entries), or reference to entities (symbolic link to entity directory).
For example:
# Add string property
echo "female" > gender
# Add Sample property
ln -s ../../../Samples/myExistingSample mySampleProperty
# ... or any path
ln -s <path to any BaseSpace Sequence Hub entity in BaseMount filesystem> myNiceEntity
The BaseSpace Sequence Hub Web interface is able to display a relationship between your appresult and an existing Sample. Such a relationship may be established by creating an input.sample property as follows:
<Enter the Properties directory of your new appresult>
# Add Sample property
ln -s <path to a BaseMount Sample directory> input.sample
Alternatively, you may create a property called input.samples with the type "array of samples":
<Enter the Properties directory of your new appresult>
# Add Sample[] property
mkdir input.samples
cd input.samples
ln -s <path to a BaseMount Sample directory> 0
ln -s <path to a BaseMount Sample directory> 1
ln -s <path to a BaseMount Sample directory> 2
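
When many samples need to be linked, the numbered entries can be generated in a loop. The sketch below wraps the mkdir and ln -s commands shown above; add_input_samples is a hypothetical helper name, and the Properties directory and sample paths are supplied by the caller.

```shell
# Create an input.samples array property and link each sample directory
# as a sequential entry (0, 1, 2, ...).
# Usage: add_input_samples <Properties directory> <sample directory>...
add_input_samples() {
    props_dir=$1
    shift
    mkdir "$props_dir/input.samples" || return 1
    i=0
    for sample_dir in "$@"; do
        ln -s "$sample_dir" "$props_dir/input.samples/$i" || return 1
        i=$((i + 1))
    done
}
```

For example: add_input_samples Properties ../../Samples/sampleA ../../Samples/sampleB, where the sample paths are placeholders for existing BaseMount Sample directories.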

Marking AppSessions as Complete

Inside an AppResult directory, run basemount-cmd mark-as-complete to change the status of the associated appsession to "Complete".
The directory will become read-only. In case of failure, an explicit error message is issued.

Renaming AppSessions

When creating an appresult, BaseSpace Sequence Hub automatically creates an associated appsession and automatically allocates it the name "BaseSpaceCLI - {creation time}". If you wish to choose a different name to organise your workspace more effectively, you can use the mv command in the AppSessions directory to rename appsessions.

Deleting data

Starting with version 0.14, BaseMount can move data to and from the trash.

Warning about access token scopes

Access tokens obtained through authentication are created with a specific set of scopes. Starting with version 0.14, the "MOVETOTRASH GLOBAL" scope is requested. If you authenticated with an older version of BaseMount, your stored access token may not contain the scope needed for trash operations. To fix this, you need to delete your current configuration (by using basemount --remove-config [--config=<config>]) and run BaseMount again to re-authenticate.

Move to trash

You can delete BaseSpace Sequence Hub entities with any of these methods:
  1. basemount-cmd --path <entity directory> move-to-trash. In case of error (e.g. lack of permissions, app still running, etc.), the tool will report the error and exit with error code 1.
  2. cd into the entity's directory and run basemount-cmd move-to-trash. Warning: In case of success, the current directory will become invalid, as the entity will have been deleted.
  3. cd into the entity's parent directory and run rmdir <entity name>. In case of error, the entity won't be deleted, and the error message will be added to the .error file in the current directory.
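In a script, the first method is the easiest to check because it reports failure through its exit code. The following is a sketch under stated assumptions: trash_entity is a hypothetical wrapper name, and BM_CMD is a hypothetical variable introduced only so that the wrapper can be tried without a live mount.

```shell
# BM_CMD is a hypothetical override for illustration; by default the real
# basemount-cmd tool is called.
BM_CMD=${BM_CMD:-basemount-cmd}

# trash_entity ENTITY_DIR
# Moves one entity to the trash with the --path form and passes the tool's
# non-zero exit code back to the caller on failure.
trash_entity() {
    if "$BM_CMD" --path "$1" move-to-trash; then
        echo "Moved to trash: $1"
    else
        status=$?
        echo "Could not move $1 to trash (exit code $status)" >&2
        return "$status"
    fi
}
```

Using --path rather than cd keeps the shell's current directory valid after the entity has been deleted.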

Move base call Run files to trash

With runs, users often wish to delete the large amount of base call data while retaining the BaseSpace entries and small files such as metrics and monitoring files.
This is achieved with basemount-cmd move-to-trash-preserve-metadata, which deletes only the Data directory from the run, preserving the entity and the other files.

Restore from trash

The directory <mount point>/.Trash (note the dot) contains the list of items stored in your trash. In order to restore one of these items, run: basemount-cmd --path <path_of_item_in_trash> restore-from-trash.
Warning: If you run this command without --path, from inside the trash item's directory, the current directory will become invalid as soon as the entity is restored, because it will no longer be in the trash.
Note: If the restored entity's parent directory had previously been accessed, a manual refresh may be needed to see the restored entity: basemount-cmd --path <restored entity's parent path> refresh
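The restore and the follow-up refresh can be chained. This is only a sketch: restore_and_refresh is a hypothetical helper name, BM_CMD is a hypothetical variable used so the helper can be tried without a live mount, and it assumes that restore-from-trash exits with a non-zero code on failure, which is documented above only for move-to-trash.

```shell
# BM_CMD is a hypothetical override for illustration; by default the real
# basemount-cmd tool is called.
BM_CMD=${BM_CMD:-basemount-cmd}

# restore_and_refresh TRASH_ITEM_DIR PARENT_DIR
# Restores one item from the trash, then refreshes the directory in which
# the restored entity is expected to reappear.
restore_and_refresh() {
    "$BM_CMD" --path "$1" restore-from-trash || return 1
    "$BM_CMD" --path "$2" refresh
}
```

The parent directory has to be supplied by the caller, as it cannot be derived from the path of the item inside .Trash.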

Trash item types

Each item in BaseMount's .Trash directory is itself a directory that contains the usual .json and other metadata files, giving you some information about the deleted item.
A text file called TrashItemType exposes a string of the form <Type> (<includes>), where:
  • <Type> takes values such as "DeletedProject", "DeletedRun", "DeletedAppSession", etc.
  • <includes> is a list of '+'-separated entries as returned by the API: currently either "FILEDATA+METADATA" for items that have been deleted with move-to-trash, or just "FILEDATA" for items that have been deleted with move-to-trash-preserve-metadata
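The string can be split with plain shell parameter expansion. The two function names below are hypothetical, and the sketch assumes the file content follows the <Type> (<includes>) form described above exactly.

```shell
# trash_item_type "DeletedRun (FILEDATA+METADATA)"
# Prints the <Type> part: everything before the first " (".
trash_item_type() {
    printf '%s\n' "${1%% "("*}"
}

# trash_item_includes "DeletedRun (FILEDATA+METADATA)"
# Prints the '+'-separated <includes> entries, one per line.
trash_item_includes() {
    includes=${1#*"("}
    includes=${includes%")"*}
    printf '%s\n' "$includes" | tr '+' '\n'
}
```

For example, trash_item_type "$(cat TrashItemType)" run inside a trash item directory prints only the type, which is convenient for filtering the trash by entity kind.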

Protection against rm -rf

Don't use rm -rf to delete a BaseSpace Sequence Hub entity, as it could delete its properties before moving the entity itself to the trash. As a safeguard, any attempt to delete an invalid item (such as the Projects directory or a .json file) blocks any other deletion for 5 seconds. As rm -rf usually starts with such invalid items, it should block itself before deleting any data.

Emptying the trash

We keep this feature hard to access to prevent accidental loss of data. To empty the trash, you will need an access token with the "EMPTY TRASH" scope, and you will need to issue the DELETE v1pre3/users/current/trash API call yourself (for example with curl).
Please contact our support team if needed.

BaseMount with v2 API

BaseSpace Sequence Hub has two parallel API versions, v1 and v2. The main difference is that v1 exposes Samples (collections of FASTQ files) and AppResults (generic collections of files from apps), whereas v2 exposes Datasets (any collection of files) and BioSamples (sample metadata and pointers to FASTQ Datasets). AppResults and Samples that were created with the v1 API are transparently exposed as v2 Datasets and BioSamples, which can then be augmented with new metadata, attributes, etc.
In order to use the v2 API with BaseMount you can launch:
basemount --use-v2-api <mount point>

The basemount-cmd tool

Running basemount-cmd (or its shorter alias bm-cmd) displays the list of available commands. This list varies with your current directory: for example, mark-as-complete only appears for AppResults that are not yet in status==Complete, whereas the refresh command appears in most directories.
The basemount-cmd tool was introduced to:
  • Enable bi-directional communication between the user and BaseMount (typical filesystem commands such as cat, echo, etc. only read or write to a file, but can't return explicit information or error messages conveniently)
  • Give a higher level of abstraction for certain commands by grouping multiple filesystem commands, which are intentionally kept at the same granularity as the BaseSpace Sequence Hub API
  • Give access to bash-completion (the ability to see the list of commands by typing basemount-cmd <TAB><TAB>)
The current version of basemount-cmd is the first version and is still experimental.

How it works

In a selected subset of BaseMount directories, extra commands are available, which you can call using
basemount-cmd [--path <path to apply command>] <command>
Typing basemount-cmd <TAB><TAB> displays the list of available commands. Running basemount-cmd without arguments also shows a description for each command.
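Because --path accepts the directory to act on, one command can be applied to several directories in a loop. The following is a sketch only: bm_each is a hypothetical helper name, BM_CMD is a hypothetical variable used so the loop can be tried without a live mount, and it assumes that every command exits with a non-zero code on failure, which this guide states only for move-to-trash.

```shell
# BM_CMD is a hypothetical override for illustration; by default the real
# basemount-cmd tool is called.
BM_CMD=${BM_CMD:-basemount-cmd}

# bm_each COMMAND DIR...
# Runs "basemount-cmd --path DIR COMMAND" for each DIR, keeps going after a
# failure, and returns non-zero if any call failed.
bm_each() {
    bm_command=$1
    shift
    failures=0
    for dir in "$@"; do
        if ! "$BM_CMD" --path "$dir" "$bm_command"; then
            echo "basemount-cmd $bm_command failed for $dir" >&2
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}
```

For example, bm_each refresh BaseSpace/Projects BaseSpace/Runs refreshes both lists in turn.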

Description of commands

  • refresh: Refresh this directory and sub-directories. Available in: all entities and lists thereof.
  • move-to-trash: Delete current entity (Warning: current directory will become invalid). Available in: project, run, sample, appresult and appsession entities.
  • move-to-trash-preserve-metadata: Delete only the Data directory from the run, preserving the entity and the other files. Available in: run entities.
  • restore-from-trash: Restore entity to main account (Warning: current directory disappears). Available in: .Trash/{entity-name} directories.
  • mark-as-complete: Finalize and switch (sample or appresult) to read-only. Change the state of the associated appsession to Complete. Available in: sample, appresult and appsession entities.
  • rename-appsession: Rename appsession associated to the appresult. Available in: appresult entities.
  • relaunch: Relaunch current appsession by running the equivalent of cat LaunchPayload > Application/.AppLauncher (see show-launch-files below). Available in: appsession entities.
  • show-launch-files: Expose LaunchPayload and LaunchApp files. Available in: appsession entities.
  • unmount: Force unmount (Warning: the current directory will become invalid). Available in: top level directory.
Note: Commands that are not listed here are "use at your own risk", and may disappear in future versions (well... those listed here may disappear as well anyway... it's an alpha release after all).

Limitations of BaseMount Alpha

Each new directory access made by BaseMount relies on the local FUSE device (/dev/fuse), the BaseSpace Sequence Hub API and the user's credentials. This mechanism means that, as currently available, BaseMount does not support the following types of accesses:
  • Cluster access, where many compute nodes can access the files. FUSE-mounted filesystems are per-host and cannot be accessed from other hosts without additional infrastructure.
  • In general, lots of concurrent requests can cause stability issues on a resource-constrained system. Keep in mind, this is an early release and stability will increase.
  • If you have terabytes of data in BaseSpace, doing a "find" command or recursive "ls" or recursive "grep" on the mount is not recommended: it is likely to start consuming many GB of RAM and may crash when the memory runs out.
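One way to stay clear of the last limitation is to bound how deep a traversal goes, so that only the directories you actually need are accessed. This is a minimal sketch rather than an official recommendation: list_entries is a hypothetical helper name, and -mindepth and -maxdepth are options of GNU find (and most other Linux find implementations).

```shell
# list_entries DIR [DEPTH]
# Lists the entries under DIR down to DEPTH levels (default: 1) instead of
# walking the whole tree.
list_entries() {
    find "$1" -mindepth 1 -maxdepth "${2:-1}" | sort
}
```

For example, list_entries BaseSpace/Projects lists the projects themselves without descending into their samples and appresults.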

Appendix 1: The BSFS plugin

The BSFS plugin is a high performance FUSE-based plugin for BaseMount.

BSFS installation

# Debian-based distributions (apt): add BSFS repository
echo "# Packages by Illumina BaseSpace" | sudo tee ${FILE}
echo "deb saucy main" | sudo tee -a ${FILE}
# Import BaseSpace Sequence Hub GPG key
curl | sudo apt-key add -
# Refresh db and install BSFS
sudo apt-get update
sudo apt-get install bsfs
# RPM-based distributions (yum): add BSFS repository
echo "# Packages by Illumina BaseSpace" | sudo tee ${FILE}
echo "[bintraybintray-basespace-BaseSpaceFS-RPM]" | sudo tee -a ${FILE}
echo "name=bintray-basespace-BaseSpaceFS-RPM" | sudo tee -a ${FILE}
echo "baseurl=" | sudo tee -a ${FILE}
echo "gpgcheck=1" | sudo tee -a ${FILE}
echo "enabled=1" | sudo tee -a ${FILE}
# Import BaseSpace Sequence Hub GPG key
sudo rpm --import
# Install BSFS
sudo yum install bsfs

Using BSFS with BaseMount

This plugin can be activated as follows:
basemount --plugin=bsfs <mountpoint>

Differences in behavior when using the BSFS plugin

  • A new directory /tmp/BsfsMount is created (the location can be configured), and will contain BSFS mount points.
  • The "Files" directories containing your actual files are symlinks to these folders in /tmp. If, for example, you wish to tar the files, you may need to use the "h" option to follow the symlinks.
  • BSFS does not support proxies.
  • BSFS may behave less interactively than the default back-end when accessing parts of files, as BSFS uses larger blocks by default.
  • BSFS does not show AppSessionLogs.
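The effect of tar's "h" option on the symlinked "Files" directories can be sketched as follows. The helper name archive_following_symlinks is hypothetical.

```shell
# archive_following_symlinks ARCHIVE DIR
# Creates ARCHIVE from DIR. The "h" option makes tar store the files that
# symlinks point to instead of the symlinks themselves.
archive_following_symlinks() {
    tar -chf "$1" -C "$(dirname "$2")" "$(basename "$2")"
}
```

Without "h", the archive would contain only the links into /tmp/BsfsMount, which stop being valid once BaseMount is unmounted.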

BSFS options

To optimize the BSFS plugin's behavior, the following values can be set in the configuration file under a [BSFS] tag. Configuration options are case-sensitive.
LRU Cache Mode
There are currently three caching methods implemented in BSFS as described below:
SparseFile: This mode is used in EC2 instances, where the cache path is on an XFS file system. If this mode is specified and the cache path is not on XFS, then BlockFile mode will be used automatically. Ejection of blocks from cache will use the hole-punching feature of XFS.
cacheMode = SparseFile
BlockFile: This mode can be used with any file system. In this mode, each file block is represented as a separate file in the cache folder. If many blocks are accessed, the operating system's limit on open files (ulimit -n) may be reached. It is therefore recommended to increase this value to more than 20,000.
cacheMode = BlockFile
RamBuffer (default): In this mode, the cached data is kept in RAM, and not on disk. You should ensure the cache size limit fits in your available RAM, and adjust your block size to have enough blocks if you plan to access multiple files in parallel.
cacheMode = RamBuffer
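For BlockFile mode, the open file limit can be checked before starting BaseMount. This is a minimal sketch: nofile_limit_ok is a hypothetical helper name, and the default threshold of 20000 is taken from the recommendation above.

```shell
# nofile_limit_ok LIMIT [MINIMUM]
# LIMIT is the output of "ulimit -n". Succeeds if the limit is "unlimited"
# or strictly greater than MINIMUM (default: 20000).
nofile_limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -gt "${2:-20000}" ]
}
```

For example: nofile_limit_ok "$(ulimit -n)" || echo "Increase ulimit -n before using BlockFile mode" >&2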
Prefetch Strategy
NextBlock (default): In addition to the block(s) containing the current request, the "next block" in sequence is also fetched. This mode is used in our production EC2 instances.
prefetchStrategy = NextBlock
OnDemand: Only the block(s) containing the current read request are fetched, i.e. no blocks are pre-fetched.
prefetchStrategy = OnDemand
Block Size
This represents the maximum size (in bytes) of a file block in BSFS's cache. The value you set is rounded up to a multiple of the filesystem's block size. In our production EC2 instances, a block size of 5242880 (5 MB) is used.
blockSize = 5242880
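The rounding can be previewed with shell arithmetic. This sketch uses the usual round-up-to-a-multiple formula and is not taken from the BSFS source. The function name and the 4096-byte filesystem block size in the example are assumptions.

```shell
# round_up_block_size REQUESTED_BYTES FS_BLOCK_BYTES
# Prints REQUESTED_BYTES rounded up to the next multiple of FS_BLOCK_BYTES.
round_up_block_size() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}
```

With a 4096-byte filesystem block, a requested blockSize of 5000000 would become 5001216, whereas 5242880 (5 * 1024 * 1024) is already a multiple and stays unchanged.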
Cache Folder
This is the location on disk (not needed for RamBuffer mode) where the parts/blocks of the accessed files are cached. In our production EC2 instances, this is on a 1 TB XFS file system.
cacheFolder = /tmp/bsfsCache
Cache Size Limit
This represents the maximum size (in bytes) of the disk (or RAM in case of RamBuffer mode) which can be used for buffering the data. At runtime, if the cache size limit is reached, then some (least recently used) file blocks are purged to make space for the new incoming blocks. Ensure that the disk where you specify the cache folder location has enough free space. For example, the following line will configure a 100 GB cache.
cacheSizeLimit = 107374182400
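The value above is 100 * 1024 * 1024 * 1024 bytes. As a small sketch, a helper (hypothetical name gib_to_bytes) can compute the byte count for other cache sizes:

```shell
# gib_to_bytes SIZE
# Prints SIZE * 1024^3, i.e. the number of bytes to use for cacheSizeLimit.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

For example, printf 'cacheSizeLimit = %s\n' "$(gib_to_bytes 100)" prints the configuration line shown above.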
Number of Blocks to Purge on Cache Full
This represents the number of file blocks to purge when the LRU cache becomes full. In our production EC2 instances, this value is set to 20.
numberOfBlocksToPurgeOnCacheFull = 20

Example of BSFS plugin configuration