Nanopore sequencing data analysis using Microsoft Azure cloud computing service
Linh Truong
Abstract
This protocol provides instructions for setting up an analytic pipeline to process raw data from Oxford Nanopore sequencing. The pipeline uses computing resources both in the Microsoft Azure cloud and on-site at Fiona Stanley Hospital. Raw data in FAST5 format are converted to FASTQ format, demultiplexed, renamed with the appropriate sample IDs and filtered against a pre-determined quality threshold. QC plots are also generated for ongoing monitoring of sequencing output and quality. The entire data flow from the hospital premises to the cloud and back is fully automated and secure.
Steps
Section 1: Generation of data on-site
Load the multiplexed HLA library pool, containing libraries from 48 individuals, onto a MinION flow cell. Data are acquired with the MinKNOW software for 16 hours using default settings.
Equipment
Value | Label |
---|---|
MinION | NAME |
Sequencer | TYPE |
Oxford Nanopore Technologies | BRAND |
MinION 1B / MinION 1C | SKU |
https://nanoporetech.com/ | LINK |
16h 0m 0s
The raw FAST5 files are stored in a local folder on the MinION-connected PC.
Equipment
Value | Label |
---|---|
MinION-connected PC | NAME |
Computer | TYPE |
Dell | BRAND |
N/A | SKU |
https://www.dell.com/pt-br/shop/lp/notebooks-desktops-escolha-sua-loja | LINK |
Intel® Core™ i7-7700K CPU @ 4.20 GHz, 32 GB RAM, 64-bit operating system, NVIDIA GeForce GTX 1080 Ti GPU | SPECIFICATIONS |
An automation agent for Loome Integrate runs on the MinION-connected PC and checks for new FAST5 files every 30 minutes.
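The 30-minute check can be pictured as a poll for FAST5 files that are newer than the previous check. The following is a minimal, hypothetical shell sketch of that idea, not Loome Integrate's actual mechanism; the function name `list_new_fast5` and the marker files `.last_poll` and `.next_poll` are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print FAST5 files that appeared since the previous poll.
# Marker file names are assumptions, not Loome Integrate settings.
list_new_fast5() {
  local run_dir="$1"
  local marker="$run_dir/.last_poll"
  local next_marker="$run_dir/.next_poll"
  # Stamp the next marker *before* scanning so files written during the scan are not skipped
  touch "$next_marker"
  if [ -f "$marker" ]; then
    # Only files modified after the previous poll started
    find "$run_dir" -type f -name '*.fast5' -newer "$marker"
  else
    # First poll: report everything already present
    find "$run_dir" -type f -name '*.fast5'
  fi
  # The new marker becomes the reference point for the next poll
  mv "$next_marker" "$marker"
}
```

A scheduler could call it on a 30-minute cycle, for example `while true; do list_new_fast5 <local_folder>; sleep 1800; done`. On filesystems with sub-second timestamps, a file written during a scan may be reported twice but is not missed.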
Software
Value | Label |
---|---|
Loome Integrate | NAME |
BizData Pty Ltd | DEVELOPER |
https://www.loomesoftware.com | LINK |
Section 2: Data migration to Microsoft Azure
The input files are automatically uploaded by the Loome Integrate agent to a container in an Azure Blob Storage account deployed within the PathWest Azure subscription. Files are transferred over Transport Layer Security (TLS) and are encrypted at rest with 256-bit AES encryption.
#AzCopy upload
azcopy copy <local_folder> <remote_container> --recursive
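In AzCopy, the `<remote_container>` argument is a blob container URL of the form `https://<account>.blob.core.windows.net/<container>`, usually with a shared access signature (SAS) token appended as a query string unless `azcopy login` has been used. The small helper below is a hypothetical convenience function, not part of the original pipeline, that assembles such a URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Azure Blob Storage URL for use with azcopy.
# Arguments: storage account, container, optional path prefix, optional SAS token.
blob_url() {
  local account="$1" container="$2" prefix="$3" sas="$4"
  local url="https://${account}.blob.core.windows.net/${container}"
  # Append a virtual folder path inside the container, if given
  if [ -n "$prefix" ]; then url="${url}/${prefix}"; fi
  # Append the SAS token as the query string, if given
  if [ -n "$sas" ]; then url="${url}?${sas}"; fi
  printf '%s\n' "$url"
}
```

Usage, with placeholder values: `azcopy copy <local_folder> "$(blob_url <account> <container> '' "$SAS_TOKEN")" --recursive`. Quoting the URL matters because SAS tokens contain `&`.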
Section 3: Orchestration of analysis pipeline in Microsoft Azure
The Loome Integrate agent detects that the sequencing job has been completed when it finds a file whose name begins with "final_summary_".
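MinKNOW writes the final summary file at the end of a run, which makes it a convenient completion flag. A minimal sketch of the same check in shell, using the hypothetical function name `run_is_complete` and searching the run folder recursively:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: succeed (exit status 0) if a final_summary_* file exists
# anywhere under the run folder, i.e. the sequencing run has finished.
run_is_complete() {
  local run_dir="$1"
  # head -n 1 keeps the output short; we only need to know whether any match exists
  [ -n "$(find "$run_dir" -type f -name 'final_summary_*' | head -n 1)" ]
}
```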
Software
Value | Label |
---|---|
Loome Integrate | NAME |
BizData Pty Ltd | DEVELOPER |
https://www.loomesoftware.com | LINK |
Loome Integrate then instructs the Azure Batch service to run the analysis in a Docker container, which Azure Batch pulls automatically from a private Azure Container Registry in PathWest's Azure subscription.
Software
Value | Label |
---|---|
Azure Batch service | NAME |
Microsoft | DEVELOPER |
https://azure.microsoft.com/products/batch | LINK |
Section 4: Workflow in the cloud server
Azure Batch automatically deploys a GPU-enabled virtual machine (VM) that performs basecalling, demultiplexing, quality filtering and QC overview using the following commands.
#Guppy basecaller
guppy_basecaller --input_path <fast5_folder> --save_path <basecall_folder> --flowcell FLO-MIN111 --kit SQK-LSK109 --device cuda:0
1h 7m 10s
#Guppy barcoder
guppy_barcoder --input_path <basecall_folder> --save_path <demux_folder> --config configuration.cfg --device cuda:0 --records_per_fastq 0 --trim_barcodes
0h 3m 6s
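Because 48 samples share one flow cell, it is useful to check how evenly reads were split across barcodes before filtering. The sketch below assumes that `guppy_barcoder` writes one sub-folder per barcode (`barcode01`, `barcode02`, …, `unclassified`) under `--save_path`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print "<barcode folder><TAB><read count>" for each barcode.
count_reads_per_barcode() {
  local demux_dir="$1" dir n
  for dir in "$demux_dir"/barcode* "$demux_dir"/unclassified; do
    # Skip unmatched globs and plain files (e.g. already-concatenated barcodeXX.fastq)
    [ -d "$dir" ] || continue
    # A FASTQ record is 4 lines, so reads = lines / 4; folders without FASTQ give 0
    n=$(cat "$dir"/*.fastq 2>/dev/null | awk 'END { print NR / 4 }')
    printf '%s\t%s\n' "$(basename "$dir")" "$n"
  done
}
```

For example, `count_reads_per_barcode <demux_folder> | sort -k2,2n | head` lists the barcodes with the fewest reads, which can flag samples at risk of failing.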
#Concatenate & rename files (one FASTQ file per barcode)
cd <demux_folder>
for dir in barcode*/; do
  cat "$dir"*.fastq > "${dir%/}.fastq"
done
0h 0m 30s
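The renaming in the NanoFilt step needs a mapping from each barcode to its sample ID. One way to keep that mapping is a two-column CSV sample sheet (`barcode,sample_id`, no header row); this file format and the `sample_for_barcode` helper are assumptions for illustration, not part of the original protocol.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look up the sample ID for a barcode in a "barcode,sample_id" CSV.
# Prints the sample ID and returns 0, or returns 1 if the barcode is not listed.
sample_for_barcode() {
  local sheet="$1" barcode="$2"
  awk -F',' -v bc="$barcode" '
    { sub(/\r$/, "") }                      # tolerate Windows (CRLF) line endings
    $1 == bc { print $2; found = 1; exit }  # first matching row wins
    END { exit !found }                     # non-zero status when no match
  ' "$sheet"
}
```

Usage: `sample=$(sample_for_barcode samples.csv barcode01) && cat barcode01.fastq | NanoFilt -q 7 -l 500 > "barcode01_${sample}.fastq"`. Stripping carriage returns matters because sample sheets are often exported from spreadsheets.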
#NanoFilt
cat barcodeXX.fastq | NanoFilt -q 7 -l 500 > barcodeXX_sampleID.fastq
0h 5m 39s
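To make the `-q 7 -l 500` thresholds concrete: as we understand NanoFilt (via its nanomath dependency), a read is kept when its length is at least 500 bases and its mean quality is at least 7, where the mean is computed by averaging per-base error probabilities and converting the result back to the Phred scale. The awk sketch below re-implements that rule for illustration only; it is not a replacement for NanoFilt, and it assumes unwrapped four-line FASTQ records with Phred+33 qualities.

```shell
#!/usr/bin/env bash
# Illustrative re-implementation of a NanoFilt-style length + mean-quality filter.
# Usage: filter_fastq <min_mean_q> <min_length> < in.fastq > out.fastq
filter_fastq() {
  local min_q="$1" min_len="$2"
  awk -v minq="$min_q" -v minlen="$min_len" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    {
      rec[(NR - 1) % 4] = $0
      if (NR % 4 != 0) next                 # wait until a full 4-line record is read
      qual = rec[3]; n = length(qual); sum = 0
      # Phred+33 score -> error probability 10^(-Q/10), summed over the read
      for (j = 1; j <= n; j++) sum += 10 ^ (-(ord[substr(qual, j, 1)] - 33) / 10)
      # Mean error probability converted back to Phred scale
      meanq = (n > 0) ? -10 * log(sum / n) / log(10) : 0
      if (n >= minlen && meanq >= minq)
        printf "%s\n%s\n%s\n%s\n", rec[0], rec[1], rec[2], rec[3]
    }'
}
```

Averaging in probability space penalises low-quality stretches heavily: a 500-base read that is half Q40 and half Q2 has an arithmetic mean of Q21 but a mean of about Q5 by this method, so it would be removed.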
While each VM is running, the input data are copied to its local disk for faster processing, the analyses are run, and the results are copied back to blob storage so that the VM can be deleted once processing is complete. Loome Integrate, in coordination with Azure Batch, orchestrates these steps.
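The stage-in, process, stage-out pattern described above can be sketched as a single shell function. This is a hypothetical illustration, not the pipeline's actual task script: `run_pipeline` stands in for the Guppy and NanoFilt commands of Section 4, and the source and destination URLs are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one Azure Batch task: copy in, process locally, copy out.
# run_pipeline is a placeholder for the basecalling/demultiplexing/filtering commands.
stage_run_stage() (
  set -euo pipefail                      # stop at the first failing step
  src_url="$1"; dst_url="$2"
  work=$(mktemp -d)
  trap 'rm -rf "$work"' EXIT             # always free the VM local disk
  # 1. Stage in: blob storage -> local disk
  azcopy copy "$src_url" "$work/input" --recursive
  # 2. Process on local disk
  run_pipeline "$work/input" "$work/output"
  # 3. Stage out: local disk -> blob storage (skipped if step 2 failed)
  azcopy copy "$work/output" "$dst_url" --recursive
)
```

Running the body in a subshell with `set -euo pipefail` prevents incomplete results from being uploaded, and the `EXIT` trap cleans up in either case. Bash ignores `set -e` when a function is called inside an `if` or `||` test, so call it as a plain command and check `$?`.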
#AzCopy download
azcopy copy <remote_container> <local_folder> --recursive
#AzCopy upload
azcopy copy <local_folder> <remote_container> --recursive
Loome Integrate detects when all tasks in the Azure Batch job have completed and sends an email reporting either that the analysis finished successfully or that an error occurred.
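The success-or-error decision behind the notification email amounts to checking every task's exit code. A minimal sketch, with a hypothetical function name and message wording, that reads `task_id exit_code` pairs and prints a subject line:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: summarise task exit codes (stdin: "task_id exit_code" per line)
# into a one-line email subject.
summarise_job() {
  awk '
    $2 != 0 { failed[++n] = $1 }          # any non-zero or missing exit code is a failure
    END {
      if (n == 0) print "Nanopore analysis completed successfully"
      else {
        s = failed[1]
        for (i = 2; i <= n; i++) s = s ", " failed[i]
        print "Nanopore analysis error in task(s): " s
      }
    }'
}
```

The pairs could, for instance, come from the Azure CLI with something like `az batch task list --job-id <job_id> --query "[].[id, executionInfo.exitCode]" -o tsv`; that query path is an assumption to verify against the Azure CLI reference.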
Software
Value | Label |
---|---|
Loome Integrate | NAME |
BizData Pty Ltd | DEVELOPER |
https://www.loomesoftware.com | LINK |
Section 5: Data migration from Microsoft Azure server
If the analysis has completed successfully, the Loome Integrate agent downloads the results in FASTQ format to the MinION-connected PC.
Software
Value | Label |
---|---|
Loome Integrate | NAME |
BizData Pty Ltd | DEVELOPER |
https://www.loomesoftware.com | LINK |
#AzCopy download
azcopy copy <remote_container> <local_folder> --recursive
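Before the files are imported into NGSengine, a quick completeness check can confirm that the expected number of non-empty FASTQ files arrived. The sketch below is hypothetical; an expected count of 48 (for example `check_results <local_folder> 48`) matches the pool size in Section 1, but barcodes with no reads passing the filter would legitimately reduce it.

```shell
#!/usr/bin/env bash
# Hypothetical check: count non-empty *.fastq files in a folder and compare
# with the expected number; returns 1 (with a message on stderr) on mismatch.
check_results() {
  local dir="$1" expected="$2" n=0 f
  for f in "$dir"/*.fastq; do
    # -s: file exists and is non-empty (an unmatched glob fails this test)
    if [ -s "$f" ]; then n=$((n + 1)); fi
  done
  if [ "$n" -ne "$expected" ]; then
    echo "expected $expected non-empty FASTQ files in $dir, found $n" >&2
    return 1
  fi
}
```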
Section 6: Final analysis of results
The demultiplexed FASTQ files are analysed with GenDX NGSengine, a commercial HLA allele assignment software.
Software
Value | Label |
---|---|
NGSengine | NAME |
GenDX | DEVELOPER |
https://www.gendx.com/product_line/ngsengine/ | LINK |
The HLA alleles are curated by laboratory staff for accuracy and suitability for reporting.