Introduction to BaseMount


Overview

The main way to interact with your BaseSpace Sequence Hub (BSSH) data is through the website at basespace.illumina.com. For some use cases, however, it is useful to work with the same data from the Linux command line interface (CLI). This gives direct programmatic access, so that users can write ad-hoc scripts and use tools such as find, xargs and command line loops to work with their data in bulk.

This is the concept behind our BaseSpace Sequence Hub command line tool, BaseMount, a FUSE driver that allows command line access to your BaseSpace Sequence Hub data.

What is BaseMount?

BaseMount is a tool to mount your BaseSpace Sequence Hub data as a Linux file system. You can navigate through projects, samples, runs and app results and interact directly with the associated files exactly as you would with any other local file system. BaseMount is based on FUSE and uses the BaseSpace Sequence Hub API to populate the contents of each directory.
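As a minimal sketch of the bulk access mentioned in the Overview, the functions below list and count FASTQ files under one directory of the mount. The function names, the `*.fastq.gz` pattern and the example path are assumptions made for illustration; they are not part of BaseMount.

```shell
# Hypothetical helpers, not part of BaseMount.
# Assumption: FASTQ files in the mount have names ending in .fastq.gz.

# List FASTQ files below a directory, one per line, sorted by path.
list_fastq() {
    find "$1" -type f -name '*.fastq.gz' | sort
}

# Print the number of FASTQ files below a directory.
count_fastq() {
    list_fastq "$1" | wc -l | tr -d ' '
}
```

For example, `count_fastq ~/BaseSpace_Mount/Projects/MyProject` (a hypothetical project path) restricts the traversal to a single project, which keeps the number of API-backed directory listings small.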

Tutorial Videos

This video series starts by preparing you for your first mount and ends with filtering samples from the command line based on their metadata.

The script of each video is included here for quick reference.

Step 1 - Install

Step 2 - Import Data to Account

{% embed url="https://youtu.be/R8ou0yLE_Ts" %}

In a browser, starting from basespace.illumina.com:

  • Log in

  • Add public dataset (2 runs + 1 project): "HiSeq X Ten TruSeq PCR Free (16 NA12878 1 plex)". Use the following import links to avoid going through the web GUI:

  • Run 1: https://basespace.illumina.com/s/0FCHjcXGsMMX

  • Run 2: https://basespace.illumina.com/s/EXDT8tjo6Zbc

  • Project: https://basespace.illumina.com/s/mYwAqCV3Pe7R

sudo bash -c "$(curl -L https://basemount.basespace.illumina.com/uninstall)"

basemount [--config ]

mkdir ~/BaseSpace_Mount
basemount --config user1 ~/BaseSpace_Mount

basemount [--config ] --api-server <API URL or alias | help>

basemount --plugin=bsfs

Add string property

echo "female" > gender

Add Sample property

ln -s ../../../Samples/myExistingSample mySampleProperty

... or any path

ln -s myNiceEntity

# Add Sample property

ln -s input.sample

Alternatively, you may create a property called input.samples with the type "array of samples":

Marking AppSessions as Complete

Inside an AppResult directory, run basemount-cmd mark-as-complete to change the status of the associated appsession to "Complete".

The directory will become read-only. In case of failure, an explicit error message is issued.

Renaming AppSessions

When you create an appresult, BaseSpace Sequence Hub automatically creates an associated appsession and names it "BaseSpaceCLI - {creation time}". To choose a different name and organise your workspace more effectively, use the mv command in the AppSessions directory to rename appsessions.

Deleting data

Starting with version 0.14, BaseMount can move data to and from the trash.

Warning about access token scopes

Access tokens obtained through authentication are created with a specific set of scopes. Starting with version 0.14, the "MOVETOTRASH GLOBAL" scope is requested. If you authenticated with an older version of BaseMount, your stored access token may not contain the scope needed for trash operations. To fix this, you need to delete your current configuration (by using basemount --remove-config [--config=<config>]) and run BaseMount again to re-authenticate.

Move to trash

You can delete BaseSpace Sequence Hub entities with any of these methods:

  1. basemount-cmd --path <entity directory> move-to-trash. In case of error (e.g. lack of permissions, app still running, etc.), the tool will report the error and exit with error code 1.

  2. cd into the entity's directory and run basemount-cmd move-to-trash. Warning: In case of success, the current directory will become invalid, as the entity will have been deleted.

  3. cd into the entity's parent directory and run rmdir <entity name>. In case of error, the entity won't be deleted, and the error message will be added to the .error file in the current directory.
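Because method 1 exits with error code 1 on failure, it lends itself to scripting. The following is a minimal sketch of a batch wrapper; the function name `trash_entities` and the `BM_CMD` variable are hypothetical, and only the `basemount-cmd --path <entity directory> move-to-trash` invocation comes from the text above.

```shell
# Tool name is overridable so the loop can be dry-run, e.g. BM_CMD=echo.
BM_CMD="${BM_CMD:-basemount-cmd}"

# Hypothetical helper: move each given entity directory to the trash using
# method 1 (--path), and report on stderr the entities for which the tool
# exited with an error. Returns 0 only if every entity was moved.
trash_entities() {
    failed=0
    for entity in "$@"; do
        if ! "$BM_CMD" --path "$entity" move-to-trash; then
            echo "FAILED: $entity" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Setting `BM_CMD=echo` before calling `trash_entities` prints the commands instead of running them, which is a cheap way to check the list of paths before deleting anything.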

Move base call Run files to trash

With runs, users often wish to delete the large amount of base call data while retaining the BaseSpace entries and small files such as metrics and monitoring files.

This is achieved with basemount-cmd move-to-trash-preserve-metadata, which deletes only the Data directory from the run, preserving the entity and the other files.

Restore from trash

The directory <mount point>/.Trash (note the dot) contains the list of items stored in your trash. In order to restore one of these items, run: basemount-cmd --path <path_of_item_in_trash> restore-from-trash.

Warning: If you run this command without --path, from inside the trash item's directory, the current directory becomes invalid as soon as the entity is restored, because it is no longer in the trash.

Note: If the restored entity's parent directory had previously been accessed, a manual refresh may be needed to see the restored entity: basemount-cmd --path <restored entity's parent path> refresh
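The restore and refresh steps above can be combined in a small wrapper. This is a sketch under stated assumptions: the function name `restore_and_refresh` and the `BM_CMD` variable are hypothetical, and the caller must supply the parent directory where the entity is expected to reappear.

```shell
# Tool name is overridable so the function can be dry-run, e.g. BM_CMD=echo.
BM_CMD="${BM_CMD:-basemount-cmd}"

# Hypothetical helper: restore one trash item by path, then refresh the
# directory where the restored entity is expected to reappear.
#   $1: path of the item under <mount point>/.Trash
#   $2: parent directory of the restored entity
restore_and_refresh() {
    "$BM_CMD" --path "$1" restore-from-trash || return 1
    "$BM_CMD" --path "$2" refresh
}
```

Using --path for both calls avoids entering the trash item's directory, so the shell's current directory never becomes invalid.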

Trash item types

Each item in BaseMount's .Trash directory is itself a directory that contains the usual .json and other metadata files, giving you some information about the deleted item.

A text file called TrashItemType exposes a string of the form <Type> (<includes>), where:

  • <Type> takes values such as "DeletedProject", "DeletedRun", "DeletedAppSession", etc.

  • <includes> is a list of '+'-separated entries as returned by the API, currently either "FILEDATA+METADATA" for items that have been deleted with move-to-trash, or just "FILEDATA" for items that have been deleted with move-to-trash-preserve-metadata
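The <Type> (<includes>) string can be split with plain shell parameter expansion. The sketch below assumes that a single space separates the type from the parenthesised list, as in the form given above; the function names are hypothetical.

```shell
# Hypothetical helpers for a TrashItemType string such as
# "DeletedRun (FILEDATA+METADATA)".
# Assumption: one space separates <Type> from "(<includes>)".

# Print the <Type> part.
trash_type() {
    printf '%s\n' "${1%% "("*}"
}

# Print the <includes> part, without the parentheses.
trash_includes() {
    inner="${1#*"("}"             # drop everything up to the opening parenthesis
    printf '%s\n' "${inner%")"}"  # drop the closing parenthesis
}

# Succeed when only FILEDATA is in the trash, i.e. the item was deleted
# with move-to-trash-preserve-metadata.
is_file_data_only() {
    [ "$(trash_includes "$1")" = "FILEDATA" ]
}
```

A typical call would be `trash_type "$(cat "<mount point>/.Trash/<item>/TrashItemType")"`, with the placeholders replaced by real paths.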

Protection against rm -rf

Don't use rm -rf to delete a BaseSpace Sequence Hub entity, as it could delete its properties before moving the entity itself to the trash. As a safeguard, any attempt to delete an invalid item (such as the Projects directory or a .json file) blocks any other deletion for 5 seconds. As rm -rf usually starts with such invalid items, it should block itself before deleting any data.

Emptying the trash

This feature is intentionally hard to access, to prevent accidental loss of data. To empty the trash, you need an access token with the "EMPTY TRASH" scope, and you need to issue the DELETE v1pre3/users/current/trash API call yourself (for example with curl).

Please contact our support team if needed.

BaseMount with v2 API

BaseSpace Sequence Hub has two parallel API versions, v1 and v2. The main difference is that v1 exposes Samples (collections of FASTQ files) and AppResults (generic collections of files from apps), whereas v2 exposes Datasets (any collection of files) and BioSamples (sample metadata and pointers to FASTQ Datasets). AppResults and Samples that were created with the v1 API are transparently exposed as v2 Datasets and BioSamples, which can then be augmented with new metadata, attributes, etc.

In order to use the v2 API with BaseMount you can launch:

basemount --use-v2-api

The basemount-cmd tool

Running basemount-cmd (or its shorter alias bm-cmd) displays the list of available commands. The list varies with your current directory: for example, mark-as-complete appears only for AppResults that are not yet in status==Complete, whereas the refresh command appears in most directories.

The basemount-cmd tool was introduced to:

  • Enable bi-directional communication between the user and BaseMount (typical filesystem commands such as cat, echo, etc. only read or write to a file, but can't return explicit information or error messages conveniently)

  • Give a higher level of abstraction for certain commands by grouping multiple filesystem commands, which are intentionally kept at the same granularity as the BaseSpace Sequence Hub API

  • Give access to bash-completion (the ability to see the list of commands by typing basemount-cmd <TAB><TAB>)

The current version of basemount-cmd is the first version and is still experimental.

How it works

In a selected subset of BaseMount directories, extra commands are available, which you can call using basemount-cmd.

Typing basemount-cmd <TAB><TAB> displays the list of available commands. Running basemount-cmd without arguments also shows a description for each command.

Description of commands

  • refresh: Refresh this directory and sub-directories. Available in: all entities and lists thereof.

  • move-to-trash: Delete current entity (Warning: current directory will become invalid). Available in: project, run, sample, appresult and appsession entities.

  • move-to-trash-preserve-metadata: Delete only the Data directory from the run, preserving the entity and the other files. Available in: run entities.

  • restore-from-trash: Restore entity to main account (Warning: current directory disappears). Available in: .Trash/{entity-name} directories.

  • mark-as-complete: Finalize and switch (sample or appresult) to read-only. Change the state of the associated appsession to Complete. Available in: sample, appresult and appsession entities.

  • rename-appsession: Rename appsession associated to the appresult. Available in: appresult entities.

  • relaunch: Relaunch current appsession by running the equivalent of cat LaunchPayload > Application/.AppLauncher (see show-launch-files below). Available in: appsession entities.

  • show-launch-files: Expose LaunchPayload and LaunchApp files. Available in: appsession entities.

  • unmount: Force unmount (Warning: the current directory will become invalid). Available in: top level directory.

Note: Commands that are not listed here are "use at your own risk", and may disappear in future versions (well... those listed here may disappear as well anyway... it's an alpha release after all).

Limitations of BaseMount Alpha

Each new directory access made by BaseMount relies on the local FUSE device (/dev/fuse), the BaseSpace Sequence Hub API and the user's credentials. This mechanism means that, as currently available, BaseMount does not support the following types of accesses:

  • Cluster access, where many compute nodes can access the files. FUSE-mounted filesystems are per-host and cannot be accessed from other hosts without additional infrastructure.

  • In general, lots of concurrent requests can cause stability issues on a resource-constrained system. Keep in mind, this is an early release and stability will increase.

  • If you have terabytes of data in BaseSpace, doing a "find" command or recursive "ls" or recursive "grep" on the mount is not recommended: it is likely to start consuming many GB of RAM and may crash when the memory runs out.

Troubleshooting

Generic BaseMount debugging

When experiencing problems with BaseMount, the following actions can help identify the root cause:

  • Check .error files, created in the directory where the error occurred. Disadvantage: multiple errors in the same directory overwrite each other.

  • Check the latest entries in /tmp/basemount.errorlog. Critical errors and crash stack traces (very useful to developers, in case you plan to report the problem) are reported there.

  • Relaunch BaseMount with the -f option (basemount -f ...) to keep BaseMount in the foreground and get a (very!) comprehensive log output. Disadvantage: it keeps one terminal busy and slows BaseMount down.
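The first two checks can be gathered in one helper. This is a sketch only: the function name `bm_diagnose` and the choice of showing the last 20 log lines are assumptions; the .error file and the /tmp/basemount.errorlog path come from the list above.

```shell
# Hypothetical helper: print the .error file of a directory (if any) and
# the last lines of the BaseMount error log (if it exists).
#   $1: directory where the failing command was run (default: current dir)
#   $2: error log path (default: /tmp/basemount.errorlog)
bm_diagnose() {
    dir="${1:-.}"
    log="${2:-/tmp/basemount.errorlog}"
    if [ -f "$dir/.error" ]; then
        echo "== $dir/.error =="
        cat "$dir/.error"
    else
        echo "== no .error file in $dir =="
    fi
    if [ -f "$log" ]; then
        echo "== last 20 lines of $log =="
        tail -n 20 "$log"
    fi
}
```

Since a later error in the same directory overwrites .error, it helps to run `bm_diagnose` immediately after the failing command.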

Error: "Bad address"

As BaseMount is exposed as a filesystem, it has the inconvenience of only being accessible by unidirectional commands: either you read from a file or you write to a file. Commands operating on files (such as echo, cat, cp, etc.) don't have the ability to return a personalized error message from the filesystem driver to the user. When an error occurs, BaseMount returns a generic error code (usually translated as "Bad address") and stores a more comprehensive error message in a file called .error, created in the directory where the error occurred. Very important errors are also logged in the /tmp/basemount.errorlog file.

Follow the steps in "Generic BaseMount debugging" to find out what the error means.

Error: "Failed to open mountpoint for reading: Permission denied"

You don't have root access to the directory where you are creating your mount point. Some file systems may be configured to not allow root access by default.

Error: "Timeout was reached - Shared error buffer: Operation too slow. Less than 1000 bytes/sec transfered [in] the last 30 seconds"

This error message is reported in the /tmp/basemount.errorlog file and relates to the download of files (from Files directories).

Rare occurrences (fewer than once per 100 GB) of this message can be ignored, as connections to the S3 object store seem to break from time to time. The 30-second timeout is here to restart downloading the affected blocks on those occasions.

Regular occurrences, on the other hand, are important to address, and usually indicate a connection to S3 worse than our expected worst case, or a deeper problem with the stability of your internet connection.

If you believe the problem comes from BaseMount, the next section describes command line parameters that may ease the bandwidth/latency requirements.

Error: "Timeout was reached - Shared error buffer: Operation timed out after 300000 milliseconds with 8668643 out of 16777216 bytes received"

This error message is reported in the /tmp/basemount.errorlog file and relates to the download of files (from Files directories).

By default, BaseMount tries to download 16MB blocks using 8 threads. If a block takes more than 300 seconds to arrive, it issues this error message (and tries to resume the download 3 times before aborting).

16MB/thread * 8 threads * 8 bits/byte / 300s = 3.4 Mbps.

If your connection is slower than that, two BaseMount options may help address this problem:

  • --threads=<n> sets the number of concurrent download threads

By default, n=8, so --threads=2 may help.

But... in many settings, download speed is improved by using multiple threads. In this case, reducing the size of each downloaded block to something smaller than 16MB may be a good option:

  • --cache-opts=<interactive block size>:<large block size>:<total cache size>

Default values, in MB, are 2:16:512 (Note: the interactive block size is used when files are accessed in a non-sequential manner).

For example --cache-opts=2:4:512 would make BaseMount download 4MB blocks.

A middle ground can be achieved with both options: --threads=4 --cache-opts=2:8:512
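The bandwidth estimate used in this section can be recomputed for other settings. The sketch below applies the same formula as the text (threads × block size in MB × 8 bits/byte ÷ timeout in seconds); the function name `min_mbps` is hypothetical.

```shell
# Minimum sustained bandwidth (in Mbps) needed for every thread to finish
# one block before the timeout, using the same convention as the text:
#   threads * block_MB * 8 bits/byte / timeout_s
#   $1: number of threads, $2: block size in MB, $3: timeout in s (default 300)
min_mbps() {
    awk -v t="$1" -v b="$2" -v s="${3:-300}" \
        'BEGIN { printf "%.2f\n", t * b * 8 / s }'
}

min_mbps 8 16    # defaults (8 threads, 16MB blocks): prints 3.41
min_mbps 2 16    # --threads=2: prints 0.85
min_mbps 8 4     # --cache-opts=2:4:512: prints 0.85
min_mbps 4 8     # --threads=4 --cache-opts=2:8:512: prints 0.85
```

All three alternatives lower the requirement by the same factor of four; they differ only in how the reduction is split between thread count and block size.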

Questions and Answers

How can I refresh the contents of BaseMount's directories?

You can run either:

  • basemount-cmd refresh

  • echo refresh > .commands

Can I check which permissions/scopes my access token has?

Yes. The scopes are part of the JSON response shown by: cat <mount point>/.AccessToken/.json
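To test for one specific scope, such as the "MOVETOTRASH GLOBAL" scope needed for trash operations, a plain text search is often enough. This sketch assumes the scope name appears verbatim somewhere in the file; the exact JSON layout is not described here, and the function name `token_has_scope` is hypothetical.

```shell
# Hypothetical check: does the access-token JSON mention a given scope?
# Assumption: the scope appears verbatim as text in the file, so a fixed
# string search is used instead of parsing the JSON.
#   $1: path to the .json file, $2: scope name
token_has_scope() {
    grep -qF -- "$2" "$1"
}
```

For example, `token_has_scope "<mount point>/.AccessToken/.json" "MOVETOTRASH GLOBAL" || echo "re-authentication needed"`, with the mount point placeholder replaced by your own path.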

How can I install a previous version of BaseMount?

The install script always installs the latest version of BaseMount. If you want to lock in a specific version as part of your system setup scripts, please use the following steps.

  • Configure the BaseMount repository as described in the "BaseMount Installation, Manual Install" section, but without the final apt-get/yum install basemount command

  • Optional: Configure the BSFS repository as described in the "Appendix 1 - BSFS installation" section, but without the final apt-get/yum install bsfs command

  • Install BaseMount using the following commands:

v0.1 Alpha

Ubuntu

CentOS

v0.11 Alpha

Ubuntu

CentOS

Package version discovery

You can discover which versions of BaseMount are available with the following commands:

Ubuntu

CentOS

ChangeLog

Tue Jun 21 2016 - v0.14 Alpha

  • Refresh command

  • Move-to-trash and restore-from-trash

  • New bm-cmd tool, a shorter alias for basemount-cmd

  • Moved passphrase encryption to --passphrase

Tue Mar 1 2016 - v0.12 Alpha

  • Write-mode: project and appresult creation, file upload

  • Properties can be viewed and edited

  • Improved documentation

  • Relaxed timeout for low bandwidth

  • Offers to create mount point when it doesn't exist

  • Offers to unmount if mount point refers to a mounted path

  • Unmount assistance, listing blocking processes and offering lazy-unmount

  • Raised RAM requirements to 1.5GB, allowing you to proceed after a confirmation prompt

Tue Jan 26 2016 - v0.11 Alpha

  • Proxy support

  • "Files" directories are not symlinks anymore

  • Run Files are automatically mounted, now that they load more interactively

  • .basemount directories for BaseSpaceCLI support

  • BSFS available as a plugin

Fri Jul 24 2015 - v0.1 Alpha

  • Initial release
