Nanopore sequencing data analysis using Microsoft Azure cloud computing service

Linh Truong

Published: 2022-10-20 DOI: 10.17504/protocols.io.x54v9dj7pg3e/v1

Abstract

This protocol describes how to set up an analysis pipeline for processing raw data from Oxford Nanopore sequencing. The pipeline uses computing resources in the Microsoft Azure cloud as well as on site at Fiona Stanley Hospital. Raw data in FAST5 format are converted to FASTQ format, demultiplexed, renamed with the appropriate sample ID, and filtered against a pre-determined quality threshold. QC plots are also generated for ongoing monitoring of sequencing output and quality. The entire data flow from the hospital premises to the cloud and back is fully automated and secured.

Steps

Section 1: Generation of data on-site

1.

Load the multiplexed HLA library pool, comprising libraries from 48 individuals, onto a MinION flow cell. Acquire data with MinKNOW software for 16 hours using default settings.

Equipment

Name: MinION
Type: Sequencer
Brand: Oxford Nanopore Technologies
SKU: MinION 1B / MinION 1C
Link: https://nanoporetech.com/

16h 0m 0s

2.

The raw FAST5 files are stored in a local folder on the MinION-connected PC.

Equipment

Name: MinION-connected PC
Type: Computer
Brand: Dell
SKU: N/A
Link: https://www.dell.com/pt-br/shop/lp/notebooks-desktops-escolha-sua-loja?gacd=9657105-15013-5761040-275878141-0&dgc=ST&cid=71700000068833255&gclid=CjwKCAjw87SHBhBiEiwAukSeUfqxItHp4Udl57DAJRzEEySpq5rR4mrqqzGKHW-UZ12RJ6iMGJnp6hoCNIIQAvD_BwE&gclsrc=aw.ds&nclid=BBedBAWlwBXTpy_PAx4QTZin0cycVC4LiTtAxfH7BnIo5MW1lXLhypxx5CxtD032
Specifications: Intel® Core™ i7-7700K CPU @ 4.20 GHz, 32 GB RAM, 64-bit operating system, NVIDIA GTX 1080 Ti GPU
3.

An automation agent for Loome Integrate runs on the MinION-connected PC and checks for new FAST5 files every 30 minutes.

Software

Name: Loome Integrate
Developer: BizData Pty Ltd
Link: https://www.loomesoftware.com
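
The polling in step 3 is performed by Loome Integrate, whose internals are not described in this protocol. Purely as an illustration of the idea, the sketch below lists FAST5 files that have been modified since the previous check, using a marker file whose modification time records when the last poll started. The function name `list_new_fast5` and the marker file are hypothetical and are not part of Loome Integrate.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: this is NOT the Loome Integrate implementation.
# The function name and the marker-file approach are assumptions.

# list_new_fast5 <run_dir> <marker_file>
# Prints FAST5 files under <run_dir> modified after <marker_file>,
# then updates the marker so the next call reports only newer files.
list_new_fast5() {
    local run_dir="$1" marker="$2" stamp
    stamp="${marker}.tmp"
    # Record the start of this poll first, so files written while
    # find is running are picked up by the next poll rather than lost.
    touch "$stamp"
    if [ -e "$marker" ]; then
        find "$run_dir" -type f -name '*.fast5' -newer "$marker" | sort
    else
        # First poll: every FAST5 file counts as new.
        find "$run_dir" -type f -name '*.fast5' | sort
    fi
    mv -f "$stamp" "$marker"
}
```

Under these assumptions, a scheduler could call `list_new_fast5 <local_folder> <marker_file>` every 30 minutes and pass the listed files to the upload step.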

Section 2: Data migration to Microsoft Azure

4.

The Loome Integrate agent automatically uploads the input files to a container in an Azure Blob Storage account deployed within the PathWest Azure subscription. The files are transferred over Transport Layer Security (TLS) and are encrypted at rest with 256-bit AES encryption.

#AzCopy upload 
azcopy copy <local_folder> <remote_container> --recursive

Section 3: Orchestration of analysis pipeline in Microsoft Azure

5.

The Loome Integrate agent detects that the sequencing job has been completed when it finds a file named "final_summary_.txt", and then triggers a new job to deploy the necessary resources and to start the processing steps using the Azure Batch service.

Software

Name: Loome Integrate
Developer: BizData Pty Ltd
Link: https://www.loomesoftware.com
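
The completion check in step 5 can be pictured as a simple file test. The sketch below is an assumption about how such a check might look, not Loome Integrate's own logic. It uses the glob `final_summary_*.txt`, which matches the file name quoted in the step as well as names with extra text before the extension. The function name `run_is_complete` is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: the glob pattern and function name are assumptions.

# run_is_complete <run_dir>
# Returns 0 if a final summary file exists anywhere under <run_dir>,
# and a non-zero status otherwise.
run_is_complete() {
    local run_dir="$1" hit
    hit=$(find "$run_dir" -type f -name 'final_summary_*.txt' | head -n 1)
    [ -n "$hit" ]
}
```

A caller could then gate the Azure Batch job on the result, for example `if run_is_complete <local_folder>; then ...; fi`.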
6.

Loome Integrate instructs the Azure Batch service to run the analysis in a Docker container, which Azure Batch pulls automatically from a private Azure Container Registry in PathWest's Azure subscription.

Software

Name: Azure Batch service
Developer: Microsoft
Link: https://azure.microsoft.com/products/batch

Section 4: Workflow in the cloud server

7.

Azure Batch automatically deploys a GPU-enabled Virtual Machine (VM) for basecalling, de-multiplexing, quality trimming and QC overview using the following commands.

#Guppy basecaller 
guppy_basecaller --input_path XX --save_path XX --flowcell FLO-MIN111 --kit SQK-109 --device cuda:0
1h 7m 10s

#Guppy barcoder
guppy_barcoder --input_path XX --save_path XX --config configuration.cfg --device cuda:0 --records_per_fastq 0 --trim_barcodes
0h 3m 6s

#Concatenate & rename file 
cd /each_barcode_folder
cat *.fastq > barcodeXX.fastq
0h 0m 30s

#NanoFilt
cat barcodeXX.fastq | NanoFilt -q 7 -l 500 > barcodeXX_sampleID.fastq
0h 5m 39s
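
The concatenate, rename and filter commands above are shown for a single barcode. As a sketch of how they might be applied to every barcode folder in one pass, the loop below assumes a demultiplexed output directory with one `barcodeNN` subfolder per barcode, and a hypothetical tab-separated sample sheet with the barcode name in column 1 and the sample ID in column 2. The sample sheet format and all function names are assumptions; the NanoFilt thresholds are those given above.

```shell
#!/usr/bin/env bash
# Illustrative sketch: the sample sheet format and function names are assumptions.

# sample_id_for <sample_sheet.tsv> <barcode>
# Prints the sample ID (column 2) for the given barcode (column 1).
sample_id_for() {
    awk -F '\t' -v bc="$2" '$1 == bc { print $2; exit }' "$1"
}

# count_reads <file.fastq>
# Prints the number of records, assuming four lines per record.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# process_barcodes <demux_dir> <sample_sheet.tsv> <out_dir>
# For each barcode folder: concatenate its FASTQ files, filter with NanoFilt,
# name the filtered file barcodeXX_sampleID.fastq, and print read counts.
process_barcodes() {
    local demux_dir="$1" sheet="$2" out_dir="$3" dir bc sample merged filtered
    mkdir -p "$out_dir"
    # The barcode* glob skips other folders such as "unclassified".
    for dir in "$demux_dir"/barcode*/; do
        [ -d "$dir" ] || continue
        bc=$(basename "$dir")
        sample=$(sample_id_for "$sheet" "$bc")
        if [ -z "$sample" ]; then
            echo "no sample ID for $bc, skipping" >&2
            continue
        fi
        merged="$out_dir/$bc.fastq"
        filtered="$out_dir/${bc}_${sample}.fastq"
        cat "$dir"*.fastq > "$merged"
        NanoFilt -q 7 -l 500 < "$merged" > "$filtered"
        echo "$bc $sample raw=$(count_reads "$merged") filtered=$(count_reads "$filtered")"
    done
}
```

The per-barcode read counts printed by the loop give a quick check of how many reads the quality and length thresholds remove from each sample.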

8.

Once running, each VM copies the input data to its local disk for faster processing, runs the analyses, and then copies the results back to blob storage so that the VM can be deleted once processing is complete. Loome Integrate, in coordination with Azure Batch, orchestrates these steps.

#AzCopy download 
azcopy copy <remote_container> <local_folder> --recursive
#AzCopy upload 
azcopy copy <local_folder> <remote_container> --recursive
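
The stage-in, process, stage-out sequence of step 8 can be sketched as a single task function. This is an assumed structure and not the task script actually run by Azure Batch in this pipeline. `run_step`, `process_on_vm` and the `DRY_RUN` switch are hypothetical names introduced for illustration, and only the basecalling command from step 7 is shown as the analysis stage.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the per-VM task in step 8; not the authors' script.
# run_step, process_on_vm and DRY_RUN are hypothetical names.

# run_step <command...>
# Prints the command when DRY_RUN=1, otherwise executes it.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

# process_on_vm <remote_in> <remote_out> <work_dir>
# Stops at the first failing stage so that a failed download or analysis
# never results in an upload of incomplete results.
process_on_vm() {
    local remote_in="$1" remote_out="$2" work_dir="$3"
    run_step mkdir -p "$work_dir/input" "$work_dir/results" || return 1
    # 1. Copy the raw data from blob storage to the VM's local disk.
    run_step azcopy copy "$remote_in" "$work_dir/input" --recursive || return 1
    # 2. Run the analysis (basecalling command as given in step 7).
    run_step guppy_basecaller --input_path "$work_dir/input" \
        --save_path "$work_dir/results" \
        --flowcell FLO-MIN111 --kit SQK-109 --device cuda:0 || return 1
    # 3. Copy the results back to blob storage.
    run_step azcopy copy "$work_dir/results" "$remote_out" --recursive || return 1
}
```

Setting `DRY_RUN=1` before calling `process_on_vm` prints the commands without running them, which is a convenient way to check the paths before a real job is submitted.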
9.

Loome Integrate detects the completion of all tasks in the Azure Batch job and sends an email either confirming that the analysis completed successfully or reporting an error.

Software

Name: Loome Integrate
Developer: BizData Pty Ltd
Link: https://www.loomesoftware.com

Section 5: Data migration from Microsoft Azure server

10.

If the analysis has completed successfully, the Loome Integrate agent downloads the results in FASTQ format to the MinION-connected PC.

Software

Name: Loome Integrate
Developer: BizData Pty Ltd
Link: https://www.loomesoftware.com
#AzCopy download 
azcopy copy <remote_container> <local_folder> --recursive

Section 6: Final analysis of results

11.

Each demultiplexed FASTQ file is analysed with GenDX NGSengine, a commercial HLA allele assignment software package.

Software

Name: NGSengine
Developer: GenDX
Link: https://www.gendx.com/product_line/ngsengine/

The HLA alleles are curated by laboratory staff for accuracy and suitability for reporting.
