Getting Started with LINCS Datasets and Tools

Eryk Kropiwnicki, Alexander Lachmann, Daniel J. B. Clarke, Avi Ma'ayan, Zhuorui Xie, Megan L. Wojciechowicz, Kathleen M. Jagodnik, Ingrid Shu, Allison Bailey, Minji Jeon, John Erol Evangelista, Maxim V. Kuleshov, Abhijna A. Parigi, Jose M. Sanchez, Sherry L. Jenkins

Published: 2022-07-25 DOI: 10.1002/cpz1.487

bioinformatics

disease

drug discovery

gene sets visualization

signature analysis

web application

Abstract

The Library of Integrated Network-based Cellular Signatures (LINCS) was an NIH Common Fund program that aimed to expand our knowledge about human cellular responses to chemical, genetic, and microenvironment perturbations. Responses to perturbations were measured by transcriptomics, proteomics, cellular imaging, and other high content assays. The second phase of the LINCS program, which lasted 7 years, involved the engagement of six data and signature generation centers (DSGCs) and one data coordination and integration center (DCIC). The DSGCs and the DCIC developed several digital resources, including tools, databases, and workflows that aim to facilitate the use of the LINCS data and integrate this data with other publicly available data. The digital resources developed by the DSGCs and the DCIC can be used to gain new biological and pharmacological insights that can lead to the development of novel therapeutics. This protocol provides step-by-step instructions for processing the LINCS data into signatures, and utilizing the digital resources developed by the LINCS consortia for hypothesis generation and knowledge discovery. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol 1: Navigating L1000 tools and data in CLUE.io

Basic Protocol 2: Computing signatures from the L1000 data with the CD method

Basic Protocol 3: Analyzing lists of differentially expressed genes and querying them against the L1000 data with BioJupies and the Bulk RNA-seq Appyter

Basic Protocol 4: Utilizing the L1000FWD resource for drug discovery

Basic Protocol 5: KINOMEscan and the KINOMEscan Appyter

Basic Protocol 6: LINCS P100 and GCP Proteomics Assays

Basic Protocol 7: The LINCS Joint Project (LJP)

Basic Protocol 8: The LINCS Data Portals and SigCom LINCS

Basic Protocol 9: Creating and analyzing signatures with iLINCS

INTRODUCTION

Acronyms, Abbreviations, and Definitions

ALS	Amyotrophic lateral sclerosis
CD	Characteristic Direction
DCIC	Data Coordination and Integration Center
DSGC	Data and Signature Generation Center
DToxS	Drug Toxicity Signature Generation Center
GEO	Gene Expression Omnibus
GCP	Global chromatin profiling
GTEx	Genotype-Tissue Expression Project
HMS	Harvard Medical School
iLINCS	Integrated LINCS
iPSC	Induced pluripotent stem cell
LDP	LINCS Data Portal
Limma	Linear Models for Microarray Data
LINCS	Library of Integrated Network-based Cellular Signatures
MEP	Microenvironment perturbation
PCCSE	Proteomic Characterization Center for Signaling and Epigenetics
PCR	Polymerase chain reaction
SMA	Spinal muscular atrophy
SOP	Standard Operating Procedure
t-SNE	t-Distributed Stochastic Neighbor Embedding

Library of Integrated Network-based Cellular Signatures (LINCS)

Transcriptomics and other omics enable the characterization of biological processes through the identification of key molecular components and networks that govern normal physiology and disease mechanisms. The initial introduction of transcriptomics-based high-throughput drug screens has enabled the generation of gene expression profile search engines leading to new discoveries in systems pharmacology.

The establishment of the original Connectivity Map (CMAP) resource represents one of the earliest efforts to create a large reference database and search engine for human gene expression profiles (Lamb et al., 2006). Initially, CMAP contained 453 gene expression signatures, profiled with Affymetrix GeneChip microarrays, for 164 small molecules applied to four human cell lines. This resource was then expanded to contain over 7000 signatures for over 1300 compounds including most of the FDA approved drugs. Importantly, CMAP was delivered as a web-based tool to enable users to query their own signatures against the database. The first iteration of the website was widely popular, attracting thousands of users and citations from publications that utilized the resource. As a result, the NIH established the Library of Integrated Network-Based Cellular Signatures (LINCS) program to further research in the area of omics-based drug screens. For LINCS, the CMAP team at the Broad Institute set the ambitious goal of significantly expanding the original CMAP resource by utilizing a low-cost scalable transcriptomics technology called the L1000 assay (Subramanian et al., 2017).

The L1000 assay uses Luminex bead technology to measure the expression of 978 genes, from which the expression of an additional set of 11,350 genes is computationally inferred. As of 2021, the CMAP team at the Broad Institute has produced over 3 million L1000 profiles that can be converted into over 1 million unique gene expression signatures. All of these data are freely available to the research community and can be accessed from several sources, including the CLUE portal (Subramanian et al., 2017), SigCom LINCS (Evangelista et al., 2022), the NCBI Gene Expression Omnibus (GEO; Edgar, Domrachev, & Lash, 2002), and Google BigQuery. Aside from the L1000 data, the LINCS Data and Signature Generation Centers (DSGCs) have generated a variety of other transcriptomics, proteomics, and imaging data to study the effects of microenvironment perturbations, drug combinations, neurodegenerative diseases, and genetic perturbations, with a common goal of elucidating the molecular mechanisms and pathways underlying cellular responses to each of these types of perturbations. In addition to the DSGCs, the LINCS Data Coordination and Integration Center (DCIC) has developed tools for integrating, analyzing, and visualizing LINCS data, and has led outreach efforts to support the overall goals of the program.

Here, we present nine protocols that guide users through different ways to access the LINCS data, compute signatures, and use a variety of bioinformatics tools to leverage LINCS data for signature analysis and visualization. Basic Protocol 1 describes CLUE.io, a platform that provides access to the L1000, P100, and GCP data, and tools for exploring these data. Basic Protocol 2 explains how users can compute gene expression signatures using the Characteristic Direction method (Clark et al., 2014). Basic Protocol 3 guides users on how to leverage BioJupies (Torre, Lachmann, & Ma'ayan, 2018) and the Bulk RNA-seq Appyter (Clarke et al., 2021) for generating lists of differentially expressed genes for exploratory data analysis that includes L1000 queries. Basic Protocol 4 presents the tool L1000FWD, a fireworks visualization of over 17,000 selected L1000 signatures. L1000FWD provides interactive exploration of drug-induced L1000 signatures (Wang, Lachmann, Keenan, & Ma'ayan, 2018). Basic Protocol 5 introduces KINOMEscan, a kinase profiling assay, and the KINOMEscan Appyter (Clarke et al., 2021). This Appyter facilitates the visualization of KINOMEscan data and performs kinase enrichment analysis. In Basic Protocol 6, we introduce the P100 and GCP LINCS proteomics assays. Basic Protocol 7 presents tools to explore the LINCS Joint Project (LJP) data, a collaborative project that coupled transcriptomics data with cell viability assays to study drug responses in cancer cell lines (Niepel et al., 2019). Basic Protocol 8 gives an in-depth explanation of the LINCS Data Portals, which are centralized hubs for viewing, downloading, and analyzing LINCS data (Koleti et al., 2018; Stathias et al., 2019), and SigCom LINCS (Evangelista et al., 2022), a LINCS data search engine that was designed based on the findable, accessible, interoperable, and reusable (FAIR) guiding principles (Wilkinson et al., 2016). Finally, Basic Protocol 9 describes how to leverage the iLINCS web application (Pilarczyk et al., 2020) for generating and analyzing signatures from LINCS transcriptomics and proteomics data.

Basic Protocol 1: NAVIGATING L1000 TOOLS AND DATA IN CLUE.io

CLUE.io is a cloud-based platform developed by the LINCS Center for Transcriptomics and the Proteomic Characterization Center for Signaling and Epigenetics (PCCSE) DSGCs to provide access to raw and processed data generated from the L1000, P100, and GCP assays.

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)

Accessing the L1000 data from clue.io

1. Navigate to the CLUE.io website at https://clue.io/.

Creating an account on the CLUE.io website

2. To access the resources on the CLUE.io site, create a free account by clicking on the Log in button at the top right of the homepage (Fig. 1), and then click the “Create an account” hyperlink. Enter your name, e-mail address, desired password, and institution, and check the box specifying that you are affiliated with a non-profit organization. Specify your research role and academic training, and then click the Create an Account button.

Logging into the CLUE.io website

3. Once you have established an account, log into the site by clicking on the Log in button at the top right of the homepage (Fig. 1), and specify your e-mail address and password. Then, click the Log in button to log into the system.

Downloading the L1000 dataset

4. To access the complete L1000 data from its most recent release, select the Data Library item from the Tools menu at the top of the CLUE.io website (Fig. 2, blue shading), and then click on the “Expanded CMap LINCS Resource 2020 (CMap2020)” option in the results list (Fig. 3, blue shading) to view the components of this dataset (Fig. 4).

Data Library choice from the tools drop-down menu that redirects to a page for LINCS L1000 data downloads.

A table of expandable LINCS projects and their associated datasets.

Page of supporting file download links for the CMAP LINCS 2020 dataset.

5. Files must be downloaded separately. To download each file, click on the name of the file, which serves as a download link. File sizes and dimensions are available under the “File size” and “Data matrix” columns, respectively.

Exploring the L1000 data via the CLUE.io command app

6. The CLUE Command App (Fig. 5), accessed via the Tools menu Command option, permits querying by keywords to provide detailed information about compounds, genes, classes, connectivities, and other metadata about the L1000 data and the P100 and GCP proteomics data.

Use the /assay option to view and download the assays in which small-molecule perturbagens have been profiled, as well as the complete set of all small-molecule perturbagens that have been profiled with a selected assay.
Use the /gene-space option to return information about whether genes of interest are measured or inferred by L1000.
Use the /moa option to query a mechanism of action (MoA) and return all matching small molecules.
Use the /target option to view and download target genes for queried small-molecule perturbagens, as well as all small-molecule perturbagens that match the queried terms.
Use the /conn option to query connectivity data for a compound and view top connections in the CMap data as well as internal connectivities in cell lines.
Use the /gex option to view the baseline gene expression for Cancer Cell Line Encyclopedia (CCLE) cell lines (Barretina et al., 2012). View cell lines individually or in groups based on selected metadata fields.
Use the /sig option to query L1000 signatures in Level 5 moderated z-score format for specific perturbagens. The results are returned as a heatmap and can be downloaded as a GCT file.

The CLUE Command application that allows querying for detailed information about compounds, genes, classes, connectivities, and other metadata in the L1000 data.

Basic Protocol 2: COMPUTING SIGNATURES FROM THE L1000 DATA WITH THE CD METHOD

The L1000 Level 3 dataset can be downloaded and used to compute signatures for specific drugs and small molecules. This protocol describes the process of computing signatures from the Level 3 L1000 dataset using the Characteristic Direction (CD) method (Clark et al., 2014).

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)
Python 3.8+ (https://www.python.org/downloads/)
A code editor, such as Microsoft Visual Studio Code (https://code.visualstudio.com/)
The maayanlab-bioinformatics Python package (https://github.com/MaayanLab/maayanlab-bioinformatics)

Importing L1000 data

1. Download the following required files from the LINCS data releases app (https://clue.io/releases/data-dashboard). Save all files in the same directory as the processing script, if applicable.

L1000 metadata for Level 5 signatures (siginfo_beta.txt)
L1000 metadata for Level 3 signatures (instinfo_beta.txt)
L1000 gene metadata (geneinfo_beta.txt)
L1000 Level 3 data (level3_beta_all_n3026460x12328.gctx).

The GCTX file format is an HDF5-based format for storing dense matrices; it is widely used for storing the CMAP data (Enache et al., 2019). To better understand the different levels of L1000 data, please see the Critical Parameters section.

2. Define the signature of interest by cell line, perturbagen name, perturbation time, and perturbation dose. For example, the cell line A549, the drug dexamethasone, the time point 24 hr, and the concentration 10 µM would constitute a single signature.

Pre-processing data

3. Extract and store in separate tables the row names, column names, and data matrix from the Level 3 GCTX file. The row names are probe IDs that each correspond to a gene, while the column names are instance IDs corresponding to individual replicate instances of a perturbation. Each value in the 2-dimensional data matrix contains the quantile normalized expression level for a given gene in the given instance.

4. Store the batch IDs for all signatures in the signature metadata file. The batch ID is simply the text that comes before the colon in each signature ID. For example, given the signature ID “AML001_CD34_24H:BRD-A03772856:0.37037”, the corresponding batch ID would be “AML001_CD34_24H”.

Note

For more information on LINCS L1000 data naming conventions, please see the Critical Parameters section.

5. Store the batch IDs for all perturbation instances in the instance metadata file by extracting the first three underscore-delimited terms from each sample_id. For example, given the sample ID “ERG013_VCAP_72H_X3_B11”, the batch ID would be “ERG013_VCAP_72H”.
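The two batch ID rules in steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not code from the LINCS tools: the function names are hypothetical, and the sketch assumes that none of the first three terms of a sample ID contains an underscore of its own.

```python
def batch_id_from_sig_id(sig_id: str) -> str:
    """Step 4: the batch ID is the text before the first colon of a signature ID."""
    return sig_id.split(":", 1)[0]


def batch_id_from_sample_id(sample_id: str) -> str:
    """Step 5: the batch ID is the first three underscore-delimited terms of a sample ID."""
    return "_".join(sample_id.split("_")[:3])


# The two examples given in the protocol text.
print(batch_id_from_sig_id("AML001_CD34_24H:BRD-A03772856:0.37037"))  # AML001_CD34_24H
print(batch_id_from_sample_id("ERG013_VCAP_72H_X3_B11"))              # ERG013_VCAP_72H
```

Applying each function to the corresponding ID column of the signature and instance metadata tables yields the batch ID columns that later steps use to pair treatments with controls.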

Computing signatures with the Characteristic Direction method

6. Identify the signature of interest in the dataset by matching the cell line from step 2 to the cell_iname column, the perturbagen name to pert_iname, the timepoint to pert_itime, and the dose to pert_idose.

Note

If multiple signatures from different batches match, perform the rest of this protocol for each signature individually.

7. Use the distil_ids column to identify the instance IDs corresponding to the signature of interest, then slice those instances from the Level 3 GCTX file, which will be the “treatments”. Store the treatment matrix.

8. Filter the instance metadata by the batch ID corresponding to the signature of interest, then remove the treatment instances. The remaining sample IDs consist of all other instances in the batch excluding those corresponding to the treatment, and these samples will serve as “controls”. Slice these instances from the Level 3 GCTX file, and store as the control matrix.
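Steps 6 to 8 amount to a metadata lookup followed by a set difference. The sketch below shows that logic on toy metadata held as lists of dictionaries. Several details are assumptions made for illustration only: the function name, the `sig_id` key, the pipe character as the delimiter inside distil_ids, and every ID and value in the toy tables are hypothetical, so check them against the downloaded metadata files before reuse.

```python
def select_treatments_and_controls(sig_rows, inst_rows, cell, pert, time, dose):
    """Return a list of (sig_id, treatment_ids, control_ids), one per matching signature."""
    results = []
    for sig in sig_rows:
        key = (sig["cell_iname"], sig["pert_iname"], sig["pert_itime"], sig["pert_idose"])
        if key != (cell, pert, time, dose):
            continue  # step 6: keep only signatures matching the query
        batch = sig["sig_id"].split(":", 1)[0]
        # Step 7: instance IDs of the treatment replicates (delimiter is an assumption).
        treatment_ids = sig["distil_ids"].split("|")
        treated = set(treatment_ids)
        # Step 8: every other instance from the same batch serves as a control.
        control_ids = [
            inst["sample_id"]
            for inst in inst_rows
            if "_".join(inst["sample_id"].split("_")[:3]) == batch
            and inst["sample_id"] not in treated
        ]
        results.append((sig["sig_id"], treatment_ids, control_ids))
    return results


# Toy metadata; all IDs below are made up for the example.
toy_sigs = [{
    "sig_id": "TOY001_A549_24H:DRUG-1:10",
    "cell_iname": "A549", "pert_iname": "dexamethasone",
    "pert_itime": "24 h", "pert_idose": "10 uM",
    "distil_ids": "TOY001_A549_24H_X1_A01|TOY001_A549_24H_X2_A01",
}]
toy_insts = [{"sample_id": s} for s in (
    "TOY001_A549_24H_X1_A01", "TOY001_A549_24H_X2_A01",
    "TOY001_A549_24H_X1_B02", "TOY001_A549_24H_X2_B02",
    "TOY002_A549_24H_X1_A01",
)]
toy_result = select_treatments_and_controls(
    toy_sigs, toy_insts, "A549", "dexamethasone", "24 h", "10 uM")
print(toy_result)
```

The returned treatment and control ID lists are what would be used to slice columns from the Level 3 matrix. If more than one signature matches, the function returns one entry per signature so that each can be processed individually.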

9. Run the Characteristic Direction (CD) method on the treatment data and the control data. The method is implemented in several languages, and the code can be accessed from https://maayanlab.net/CD/; however, the simplest way is to use the maayanlab-bioinformatics Python package (https://github.com/MaayanLab/maayanlab-bioinformatics) that includes a Characteristic Direction function. The result is a vector in which each entry represents a gene and its associated CD coefficient (Clark et al., 2014).

Note

Note that the order of the values in each instance is the same, and the order is preserved in the signature vector as well.
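To give a feel for what a CD-style computation produces, the following is a simplified sketch with NumPy, not the implementation shipped in maayanlab-bioinformatics or at https://maayanlab.net/CD/. It assumes a regularized linear-discriminant form: the difference of group means is multiplied by the inverse of a pooled covariance matrix that has been shrunk toward a scaled identity, and the result is normalized to unit length. The function name and the `shrinkage` parameter are hypothetical, and the published method may differ in important respects (for example, by reducing dimensionality before inversion, which matters for matrices with thousands of genes).

```python
import numpy as np


def characteristic_direction_sketch(controls, treatments, shrinkage=0.5):
    """Simplified CD-like direction.

    controls, treatments: arrays of shape (n_genes, n_samples), rows in the same gene order.
    Returns a unit-length vector with one coefficient per gene, in the input row order.
    """
    controls = np.asarray(controls, dtype=float)
    treatments = np.asarray(treatments, dtype=float)
    mean_diff = treatments.mean(axis=1) - controls.mean(axis=1)

    # Pooled within-group covariance across genes.
    centered = np.hstack([
        controls - controls.mean(axis=1, keepdims=True),
        treatments - treatments.mean(axis=1, keepdims=True),
    ])
    n_samples = centered.shape[1]
    cov = centered @ centered.T / (n_samples - 2)

    # Shrink toward a scaled identity so the matrix is invertible with few samples.
    n_genes = cov.shape[0]
    target = np.eye(n_genes) * np.trace(cov) / n_genes
    regularized = (1.0 - shrinkage) * cov + shrinkage * target

    direction = np.linalg.solve(regularized, mean_diff)
    return direction / np.linalg.norm(direction)
```

Because the rows of both inputs share one gene order, entry i of the returned vector is the coefficient for gene i, which is the property that step 10 relies on when mapping coefficients back to gene symbols.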

Obtaining differentially expressed genes from the signature

10. Compute the 2-tailed z-scores and p-values for each coefficient in the CD results vector. Map the row IDs stored in step 3 to each coefficient and its p-value using the gene metadata file; the gene_id column provides the index of each gene in the vectors, and the gene_symbol column gives the common gene symbol corresponding to that index.

11. Identify all genes that correspond to a CD coefficient GREATER than 0 and a p-value < the chosen alpha. These are the up-regulated genes in the signature.

12. Identify all genes that correspond to a CD coefficient LESS than 0 and a p-value < the chosen alpha. These are the down-regulated genes in the signature.
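Steps 10 to 12 can be sketched with the Python standard library alone. The sketch assumes one reasonable reading of step 10: each coefficient is z-scored against the mean and standard deviation of the whole CD vector, and the 2-tailed p-value comes from the standard normal distribution, p = erfc(|z| / sqrt(2)). The function name, the default alpha of 0.05, and the placeholder gene names are hypothetical.

```python
import math
import statistics


def differentially_expressed_genes(coefficients, alpha=0.05):
    """Split genes into up- and down-regulated lists.

    coefficients: dict mapping gene symbol -> CD coefficient.
    Returns (up_genes, down_genes), each in the input order.
    """
    values = list(coefficients.values())
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    up_genes, down_genes = [], []
    for gene, coef in coefficients.items():
        z = (coef - mean) / stdev
        p_value = math.erfc(abs(z) / math.sqrt(2))  # 2-tailed normal p-value
        if p_value >= alpha:
            continue
        if coef > 0:
            up_genes.append(gene)      # step 11
        elif coef < 0:
            down_genes.append(gene)    # step 12
    return up_genes, down_genes


# Toy vector: two extreme coefficients among eight zeros (placeholder gene names).
toy_coefficients = {"GENE_A": 0.9, "GENE_B": -0.9}
toy_coefficients.update({f"GENE_{i}": 0.0 for i in range(8)})
print(differentially_expressed_genes(toy_coefficients))
```

With real data, the keys would be the gene symbols obtained by joining the row IDs to the gene metadata file as described in step 10.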

Basic Protocol 3: ANALYZING LISTS OF DIFFERENTIALLY EXPRESSED GENES AND QUERYING THEM AGAINST THE L1000 DATA WITH BioJupies AND THE BULK RNA-seq APPYTER

BioJupies (Torre et al., 2018) and the Bulk RNA-seq Appyter (Clarke et al., 2021) are two web-based platforms developed by the LINCS DCIC to produce customized and interactive Jupyter notebooks for RNA-seq analysis. The BioJupies platform generates comprehensive Jupyter Notebook reports from raw or processed RNA-seq data supplied by the user, or from RNA-seq data fetched from GEO (Edgar et al., 2002) and GTEx (GTEx Consortium, 2020). Each generated notebook can be downloaded and shared; it is also stored persistently in the cloud and is accessible via a unique URL. BioJupies contains several analysis tools that fall under four categories: exploratory data analysis, differential expression analysis, enrichment analysis, and small molecule query. The Bulk RNA-seq Appyter is likewise a web-based application that provides an interface for users to upload processed RNA-seq count data files. The Appyter then automatically generates Jupyter Notebook–based reports that contain analysis and visualization of the uploaded data with principal component analysis (PCA), differential gene expression analysis, and L1000 small molecule search. The Appyter reports are similar to the analyses provided with BioJupies. Both have user-friendly interfaces for uploading and submitting data, selecting computational tools, and customizing tool parameters. This basic protocol demonstrates how to use these two platforms to analyze data from published RNA-seq gene expression studies, including querying the computed signature against the L1000 data to prioritize small molecules that can reverse or mimic the input gene expression signature.

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)

Submitting data to BioJupies

1a. Visit https://maayanlab.cloud/biojupies/ from your web browser (Fig. 6).

2a. Click the “Get started” button to submit data. User-inputted raw or processed data, as well as data from GEO (Edgar et al., 2002) and GTEx (GTEx Consortium, 2020), can be submitted (Fig. 7).

To submit data published in GEO for analysis with BioJupies: select “Published Data”, then select the GEO option. Choose from 9145 processed datasets by searching by keyword, filtering by organism, or filtering by number of samples. Select one of the search results by clicking “Analyze” (Fig. 7A). As an example, we use the GEO Series GSE88741 as the input data.
To submit data from the GTEx portal for analysis with BioJupies: select “Published Data”, then select the GTEx option. Select two groups of samples by filtering each table and checking the samples to include, then click “Continue” (Fig. 7B).
To submit your own gene expression table (TXT, CSV, TSV, XLS, or XLSX file formats): select “Your Data”, then select “Gene Expression Table”. Either drag and drop your gene expression data file or click to browse and upload. Select “Continue” when finished. Label the sample groups manually or upload a metadata file (see example for proper formatting of your metadata file), then press “Continue” (Fig. 7C).
To submit your own raw sequencing data: select “Your Data”, then select “Raw Sequencing Data”. Upload your data by clicking “Choose Files”, selecting your files, and clicking “Upload Files”. Note that uploaded files should be saved as fastq.gz format, must be less than 5 GB, and may be deleted after 1 week (Fig. 7D).

It is recommended to create an account when uploading FASTQ files. The alignment results will be saved in your account, so you do not have to repeat the alignment process, which can take several hours.

Data submission page with choices of (A) submitting published data from GEO, (B) submitting published data from GTEx, (C) submitting a gene count matrix, or (D) submitting a FASTQ file.

Querying signatures against the L1000 data with BioJupies

3a. The analysis page of BioJupies enables the addition or removal of data analysis tools and visualizations from the generated Jupyter Notebook report (Fig. 8). Analysis tools fall under four categories: exploratory data analysis, differential expression analysis, enrichment analysis, and small molecule queries. These tools can also be selected under their respective headers by clicking “Add”.

Exploratory data analysis tool options include PCA, a linear dimensionality reduction technique to visualize sample similarity; Clustergrammer, an interactive hierarchical clustering heatmap visualization (Fernandez et al., 2017); and Library Size Analysis, an analysis of the read-count distribution for samples in the dataset (Fig. 8A). In this example, we select all three options by clicking “Add”.
Differential expression analysis tool options include Differential Expression Table (differential expression analysis between two groups of samples), Volcano Plot (plots logFC and logP values from differential expression analysis), and MA Plot (plots logFC and average expression values from differential expression analysis; Fig. 8B).
Enrichment analysis tool options include Enrichr, which produces links to enrichment analysis results of differentially expressed genes (Kuleshov et al., 2016); Gene Ontology Enrichment Analysis, which identifies Gene Ontology terms enriched in the differentially expressed genes based on Enrichr analysis; Pathway Enrichment Analysis, which identifies biological pathways enriched in the differentially expressed genes; Transcription Factor Enrichment Analysis, which identifies transcription factors whose targets are enriched in the differentially expressed genes; Kinase Enrichment Analysis, which identifies protein kinases whose substrates are enriched in the differentially expressed genes; and miRNA Enrichment Analysis, which identifies micro-RNAs whose targets are enriched in the differentially expressed genes (Fig. 8C).
Small molecule query options include L1000CDS2, which identifies small molecules that mimic or reverse the provided signature (Duan et al., 2016), and L1000FWD (Wang et al., 2018), which projects the provided signature onto a 2-D fireworks visualization of the L1000 signature database (Fig. 8D). In this example, we select “L1000FWD Query” by clicking “Add”.

Analysis page for selecting data analysis and visualization tools to be executed in the Jupyter Notebook, including (A) exploratory data analysis, (B) differential expression analysis, (C) enrichment analysis, and (D) small molecule querying sections.

4a. On the “Which samples would you like to compare?” page, enter the names for the two groups to compare if desired, then manually label each sample with its group name. Alternatively, select “Predict Groups” to automatically classify samples based on their names. In the example, we have selected “Predict Groups”. Click “Continue” once you have selected samples into the two groups that you wish to compare (Fig. 9).

Sample comparison page that allows for selection of control and perturbation samples for comparison.

5a. On the “Review and Submit” page, customize your input parameters by selecting “Modify Parameters” and then make your desired changes. In the example, we have set the Clustergrammer settings as follows: Top Genes = 2500, Normalization = logCPM, and Z-score = True (Fig. 10). Depending on the features selected in step 3a, each section will have various parameters. Click “Generate Notebook” when done.

Note

A loading screen will appear after clicking “Generate Notebook” with an estimated wait time, which is typically less than 2 min but may vary depending on file size and the number of analysis tools you selected.

Page for modifying options of the selected data analysis and visualization options.

6a. Once the notebook has been generated, the “Results” page will appear. The notebook can be opened by clicking the notebook name or the “Open Notebook” buttons. The notebook can also be shared using the “Tweet”, “Email”, and “Copy Link” buttons (Fig. 11).

Results page with options for opening the generated notebook and sharing it on social media.

7a. With the notebook opened, a table of contents will be displayed. Select the link to any section to view each respective analysis (Fig. 12). The notebook can be downloaded by clicking the download icon in the upper right corner and clicking “Save Link As…”; this will prompt a pop-up window to save the notebook as an IPYNB file.

Table of contents for the Jupyter notebook analysis, each of which can be clicked to navigate to the respective section.

Submitting data to the Bulk RNA-seq Appyter

1b. Visit https://appyters.maayanlab.cloud/#/Bulk_RNA_seq in your web browser (Fig. 13).

2b. Click the “Start Appyter” button. A page for selecting and customizing data and tools will be displayed.

3b. Upload expression data and metadata under the “Load Your Data” tab (Fig. 14).

Bulk RNA-seq Analysis Appyter input form with section for uploading gene counts and metadata files displayed.

4b. Select the normalization methods under the “Select Normalization Methods” tab. Options include filtering genes, low expression threshold, logCPM normalization, log normalization, Z normalization, and quantile normalization (Fig. 15). Use the default settings if you are unsure about these options.

The “Normalization Methods” section of the input form with several options for normalizing the gene count matrix.
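For readers unfamiliar with the logCPM option, the sketch below shows the general idea: raw counts are divided by each sample's library size (its total read count), scaled to counts per million, and log-transformed after adding a pseudocount. The log base (2) and pseudocount (1) used here are assumptions chosen for illustration; tools differ on both, so the values produced by the Appyter may not match this sketch. Function names and the toy counts are hypothetical.

```python
import math


def library_sizes(counts):
    """counts: dict sample -> dict gene -> raw count. Returns total counts per sample."""
    return {sample: sum(genes.values()) for sample, genes in counts.items()}


def log_cpm(counts, pseudocount=1.0):
    """Return log2(counts-per-million + pseudocount) for every gene in every sample."""
    sizes = library_sizes(counts)
    return {
        sample: {
            gene: math.log2(count / sizes[sample] * 1e6 + pseudocount)
            for gene, count in genes.items()
        }
        for sample, genes in counts.items()
    }


# Toy count table with made-up sample and gene names.
toy_counts = {
    "sample_1": {"gene_a": 250, "gene_b": 750, "gene_c": 0},
    "sample_2": {"gene_a": 500, "gene_b": 1500, "gene_c": 0},
}
toy_log_cpm = log_cpm(toy_counts)
```

In this toy table, sample_2 was sequenced twice as deeply as sample_1, yet both samples receive identical logCPM values because the library size is divided out. This is the reason to normalize before comparing samples.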

5b. Select visualization parameters under the “Select Visualization Parameters” tab. Options include interactive plots, visualization methods (PCA, UMAP, t-SNE), genes for dimensionality reduction, gene list for Clustergrammer, and genes for Clustergrammer (Fig. 16).

Input form section for selecting options to visualize differential gene expression computed by the Appyter notebook.

6b. Select differentially expressed gene analysis parameters under the “Select Differentially Expressed Gene Analysis Parameters” tab. Options include differential expression analysis method (limma, characteristic direction, edgeR, DESeq2), differential expression analysis plotting method (volcano plot, MA plot), p-value threshold, logFC threshold, maximum genes for Enrichr, Enrichr libraries (Gene Ontology, Pathway, Kinase, Transcription Factor, and miRNA), top-ranked gene sets, small molecule analysis method (L1000FWD, L1000CDS2), genes for L1000CDS2 or L1000FWD, and top-ranked drugs from L1000CDS2 or L1000FWD (Fig. 17).

Input form section for selecting tools that compute and analyze gene expression signatures including gene set enrichment analysis and mimicker/reverser identification.

Querying the signatures created from the uploaded data against the L1000 data within the Bulk RNA-seq Appyter

7b. Use the default options and click “Submit” at the bottom of the page to generate your notebook. Once the notebook has loaded, note that there are options to download the notebook, toggle code, and run the notebook locally at the top of the page. The notebook can also be easily navigated using the Table of Contents on the left side of the page (Fig. 18). Use the table of contents to navigate through each analysis section.

Executed Appyter notebook header with the table of contents and options for downloading the notebook, toggling code, and running the notebook locally highlighted.

8b. Navigate to the “Visualize Samples” section to view a Principal Component Analysis (PCA) plot made from the 2500 genes with the highest variance across the samples, where each sample group is indicated by a different color (Fig. 19).

PCA plot of genes with highest variance across samples. Each sample group is visualized using a different color.
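The idea behind this plot can be sketched with NumPy: keep the genes whose expression varies most across samples, center each gene, and project the samples onto the leading principal components obtained from a singular value decomposition. This is an illustrative sketch under those assumptions, not the Appyter's own code; the function name and the toy matrix are hypothetical.

```python
import numpy as np


def pca_top_variance(expression, n_top=2500, n_components=2):
    """Project samples onto principal components of the most variable genes.

    expression: array of shape (n_genes, n_samples) holding normalized values.
    Returns an array of shape (n_samples, k) of component scores, k <= n_components.
    """
    expression = np.asarray(expression, dtype=float)
    variances = expression.var(axis=1)
    top_rows = np.argsort(variances)[::-1][:n_top]   # most variable genes first
    samples_by_genes = expression[top_rows].T
    centered = samples_by_genes - samples_by_genes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]                          # scores = U * singular values


# Toy matrix: the first gene separates samples 1-2 from samples 3-4.
toy_expression = [
    [0.0, 0.0, 10.0, 10.0],
    [1.0, 1.0, 1.0, 1.0],
    [5.0, 5.1, 5.0, 5.1],
]
toy_scores = pca_top_variance(toy_expression, n_top=2)
```

Plotting the first column of the scores against the second, colored by sample group, gives a plot of the kind shown in the report. In the toy example the two groups fall on opposite sides of the first component.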

9b. Navigate to the “Clustergrammer” (Fernandez et al., 2017) section to view a heatmap visualization that displays gene expression for each of the genes across all samples, where blue and red indicate decreases or increases in expression, respectively (Fig. 20).

Clustergrammer heatmap visualization of gene expression for each gene across all samples.

10b. Navigate to the “Library Size Analysis” section to view a histogram that displays the total number of reads matched for each sample, which enables the identification of outlier samples and assesses the overall quality of the RNA-seq data (Fig. 21).

Library size analysis histogram for inspecting outlying samples based on total amount of mapped reads.

11b. Scroll down to the “Differential Gene Expression” section to view a volcano plot of differential gene expression between the two groups of samples, quantified by log2 fold change and statistical significance of each gene. Blue points correspond to significantly down-regulated genes, whereas red points correspond to significantly up-regulated genes (Fig. 22).

Volcano plot of significantly differentially expressed genes between sample groups, where red points represent up-regulated genes and blue points represent down-regulated genes.

12b. Navigate to the “Enrichment Analysis with Enrichr” section to view bar charts of significantly enriched up and down-regulated terms from the Gene Ontology (The Gene Ontology Consortium, 2019), KEGG (Kanehisa & Goto, 2000), Reactome (Fabregat et al., 2018), and Wikipathways (Kutmon et al., 2016) gene set libraries (Fig. 23).

Gene Ontology and Pathway enrichment analysis bar plots displaying the top enriched down-regulated and up-regulated terms based on the submitted differentially expressed gene set computed by the Appyter notebook.

13b. Scroll to the “L1000FWD Query” section to view an interactive display of ∼17,000 L1000 drug-induced gene expression signatures. A downloadable list of mimicking and reversing signatures is provided in the report and available by pressing the blue button on the display. The points on the interactive fireworks display can be shaped and colored by p-value, dose, or time point, and several other options such as MOA, clinical development stage, and automated clustering assignment (Fig. 24).

L1000FWD fireworks visualization of signatures mimicking and reversing the differential gene expression signature generated from the input data.

14b. To save the notebook generated by the Bulk RNA-seq Appyter, click on the blue “Download Notebook” button at the top of the page. The notebook should be downloaded as a Jupyter Notebook (.ipynb) file.

One way to open an .ipynb file is from a console window (e.g., Terminal on macOS, or Command Prompt on Windows). You may need to install the Jupyter Notebook and IPython packages first. In the console window, navigate to the directory in which the downloaded file was saved (cd //…) and type jupyter notebook to launch Jupyter in your default web browser, where the downloaded notebook can be opened.
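Because an .ipynb file is a JSON document, its contents can also be inspected without starting Jupyter. The sketch below assumes the standard notebook layout (a top-level "cells" list whose entries carry a "cell_type" field); the function names are illustrative.

```python
import json

def load_notebook(path):
    """Read a Jupyter notebook file into a Python dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def summarize_cells(notebook):
    """Count the cells of each type (e.g., code, markdown) in a notebook."""
    counts = {}
    for cell in notebook.get("cells", []):
        kind = cell.get("cell_type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling summarize_cells(load_notebook(path)) on the downloaded file gives a quick overview of how much code and commentary the report contains.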

Basic Protocol 4: UTILIZING THE L1000FWD RESOURCE FOR DRUG DISCOVERY

The L1000FWD platform (Wang et al., 2018) provides visualization of drug-induced transcriptomics signatures. The fireworks display is an interactive scatter plot visualizing over 17,000 drug- and small-molecule-induced gene expression signatures as points in two-dimensional space. The L1000FWD map is useful for identifying mechanisms of action (MOA) for novel small molecules using unsupervised clustering, as well as for exploring drugs that may reverse or mimic an input signature of up and down genes. L1000FWD enables coloring of signatures by different attributes including cell type, concentration, and time point, as well as drug attributes including MOA and clinical phase. Each point on the L1000FWD interactive map is linked to a signature landing page, which provides multifaceted knowledge about the signature and the drug from various sources. That information includes most frequent diagnoses, co-prescribed drugs, and patient age distribution of prescriptions extracted from the Mount Sinai electronic medical records (EMR) system.

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)
Text editor or development environment of choice, such as Visual Studio (https://visualstudio.microsoft.com/vs/)
Most updated version of Python (https://www.python.org/downloads/) and Python requests library (https://requests.readthedocs.io/en/master/user/install/)

1.Navigate to the L1000FWD homepage (https://maayanlab.cloud/L1000FWD/) (Fig. 25). The homepage includes summary statistics of small molecules profiled in various cell lines, a search bar for querying terms of interest, and a launch button for generating the fireworks visualization.

2.On the homepage, type a drug name or cell line query term or phrase of interest in the search field, for example, the cell line MCF7. If a portion of the query string matches an entry in the L1000FWD database, a list of matches will appear as a drop-down (Fig. 26). Clicking on the left element in the drop-down generates a fireworks display filtered by signatures profiled in the MCF7 cell line (Fig. 27), whereas clicking on any of the signatures on the right side of the drop-down menu redirects to a page with identifying metadata for a specific small-molecule signature profiled in the MCF7 cell line (Fig. 28).

L1000FWD homepage query box populated with the cell line term “MCF7” and the available options for visualizing signatures profiled in the MCF7 cell line (left) and individual signature pages that were profiled in the MCF7 cell line (right).

Fireworks visualization of L1000 drug-induced gene expression signatures profiled in the MCF7 cell line.

Signature report page for a trichostatin-a signature profiled in the MCF7 cell line.

Exploring the L1000FWD visualization

3.Click “Launch” on the homepage of L1000FWD to generate the fireworks display with all cell lines, where each point represents a drug-induced gene expression signature (Fig. 29). Hovering over a signature displays more information about the signature, including the drug name, cell line, concentration, time point, and signature ID.

Fireworks visualization of L1000 drug-induced gene expression signatures profiled in all available cell lines.

4.Several options are available for altering the visualization by changing the shape or color of each signature point. The “Shape by” drop-down menu sets the shape of each signature point by p-value, dose, or time point, whereas the “Color by” drop-down menu colors the signatures by cell line or mechanism of action, among several other attributes (Fig. 30).

Fireworks visualization where the signature points are shaped by p-value and colored by cell line.

5.The “Search compounds” autocomplete textbox enables the input of a small molecule whose signatures will be highlighted in the visualization (Fig. 31).

Fireworks visualization with dexamethasone signatures highlighted.

6.In the “Signature Similarity Search” section to the right of the plot, enter a list of up-regulated genes and a list of down-regulated genes, and click the Submit button. The gene lists can be pre-populated with example data by clicking the “Example” button. Regions in the gene expression space that mimic or reverse the submitted up/down genes will be highlighted in red and blue, respectively (Fig. 32). Alternatively, a signature including up/down genes from CREEDS (Wang et al., 2016) can be submitted by inputting a query term in the autocomplete field.

Fireworks visualization with drug-induced signatures that reverse or mimic the input signature highlighted.
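The idea of mimicking versus reversing signatures can be illustrated with a simple gene-set overlap count. This is a conceptual sketch only; it is not the scoring method used by L1000FWD, and the function names are hypothetical.

```python
def overlap_scores(query_up, query_down, sig_up, sig_down):
    """Count concordant and discordant gene overlaps with one drug signature.

    mimic:   query-up genes that the drug raises + query-down genes it lowers
    reverse: query-up genes that the drug lowers + query-down genes it raises
    """
    query_up, query_down = set(query_up), set(query_down)
    sig_up, sig_down = set(sig_up), set(sig_down)
    mimic = len(query_up & sig_up) + len(query_down & sig_down)
    reverse = len(query_up & sig_down) + len(query_down & sig_up)
    return mimic, reverse

def classify_signature(query_up, query_down, sig_up, sig_down):
    """Label a drug signature as 'mimic', 'reverse', or 'neither'."""
    mimic, reverse = overlap_scores(query_up, query_down, sig_up, sig_down)
    if mimic > reverse:
        return "mimic"
    if reverse > mimic:
        return "reverse"
    return "neither"
```

Under this simplification, a drug that lowers the genes a disease raises (and vice versa) is labeled "reverse", which is the pattern of interest when searching for candidate therapeutics.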

7.By default, signatures profiled in all cell lines are included. In the navigation bar at the top of the page, click the “Cells” drop-down menu and select a cell line of interest to filter the resulting visualizations by signatures that were only profiled in the selected cell line (Fig. 33).

Drop-down menu of available cell lines to filter the fireworks visualization.

Viewing collections of signature reports for an individual drug

8.On the homepage, click the “Drugs” button on the top menu. You will be navigated to a table listing 20,000+ drugs (Fig. 34); this table can be browsed as well as manually searched using the “Search drugs” panel. For each drug, a hyperlinked landing page lists properties of the drug including its name, LINCS perturbagen ID, MOAs and target(s), if known, and chemical and structural properties. For drugs that have associated L1000 signatures, a table titled “Gene signatures” is provided on the drug's landing page; for each entry, the table lists the signature ID, p-value, cell type, and dose.

Search page for querying drug names to retrieve drug pages with identifying metadata for the drug and its signatures.

Generating signature reports

9.On the homepage, click on the “Signatures” button on the top menu. You will be navigated to the Generate Signature Reports page (Fig. 35), which facilitates selecting a subset of drug-induced gene expression signatures to visualize. Click “Example 1” to populate the fields with compounds, cell lines, and time points to filter a subset of signatures. Click on the “Submit” button to submit the form for processing. The information entered will be posted to the server, and an interactive visualization of the subset of signatures will be displayed (Fig. 36).

Signature report page for inputting small molecules, cell lines, and time points of interest.

Interactive visualization of the signatures that match the search criteria input in the signature report page.

Downloading L1000FWD data

10.On the homepage, click on the “Download” button on the top menu; you will be navigated to the Downloads page (Fig. 37), which includes two sections of content: Drug-Induced Gene Expression Signatures and Adjacency Matrices and Graphs.

Download Content from the Drug-Induced Gene Expression Signatures and Adjacency Matrices Section: this table lists the filenames and their associated descriptions and sizes. The files are provided in various formats, including GCTX, GMT, JSON, and CSV. For any of the nine entries listed in this table, click the hyperlink in the left column to download the file.
Download Content from Graphs Section: this section provides the datasets associated with the All Cells L1000FWD plot and the 40 L1000 and t-SNE plots for individual cell lines. The associated cell line and number of signatures are listed in the table. Click on the hyperlink in the left column for any entry to download its dataset.
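Files downloaded in GMT format can be read with a few lines of Python. The sketch below assumes the common GMT layout (set name, description, then one gene per tab-separated field); the function name is illustrative, and the downloaded files should be checked against this assumption before use.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a {set name: [genes]} dictionary.

    Assumes each line holds: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets
```

An open file handle can be passed directly as lines, since iterating over a text file yields one line at a time.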

Using the L1000FWD API

11.Open a new or existing Python code file. Import the json and requests libraries, along with pprint for readable console output, at the top of the file as follows.

import json
from pprint import pprint
import requests

12.Call the requests.get method to send a GET request to the URL. The query_string variable contains the string that is sent to the L1000FWD_URL/synonyms endpoint. If the request succeeds (HTTP status code 200), the response is printed and saved to a JSON file.

L1000FWD_URL = 'https://maayanlab.cloud/L1000FWD/'
query_string = 'dex'
# Query the synonyms endpoint for drugs whose names match the query string
response = requests.get(L1000FWD_URL + 'synonyms/' + query_string)
if response.status_code == 200:
    pprint(response.json())
    # json.dump writes text, so the file is opened in text mode ('w')
    with open('api1_result.json', 'w') as f:
        json.dump(response.json(), f, indent=4)

13.View the response as a JSON object containing all drug objects that match the query string.

[
  {
    "pert_id": "BRD-K07265709",
    "Name": "DEXRAZOXANE"
  },
  {
    "pert_id": "BRD-A93424738",
    "Name": "DEXAMETHASONE-ACETATE"
  },
  {
    "pert_id": "BRD-A10188456",
    "Name": "DEXAMETHASONE"
  },
  …
]

For more information on using the various L1000FWD API endpoints, please refer to the API documentation (https://maayanlab.cloud/L1000FWD/api_page).
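The synonyms query from steps 11-13 can be wrapped in a reusable helper. This is a minimal sketch: the function names and the injectable get parameter are illustrative choices, and the "Name" and "pert_id" keys are taken from the example response shown in step 13.

```python
from urllib.parse import quote

L1000FWD_URL = "https://maayanlab.cloud/L1000FWD/"

def synonyms_url(query_string, base_url=L1000FWD_URL):
    """Build the synonyms endpoint URL, percent-encoding the query string."""
    return base_url.rstrip("/") + "/synonyms/" + quote(query_string, safe="")

def find_drugs(query_string, get=None, timeout=30):
    """Return {drug name: pert_id} for drugs matching the query string.

    `get` defaults to requests.get; passing a stand-in allows offline testing.
    """
    if get is None:
        import requests  # imported lazily so the helper loads without it
        get = requests.get
    response = get(synonyms_url(query_string), timeout=timeout)
    response.raise_for_status()  # raise an error on non-success status codes
    return {drug["Name"]: drug["pert_id"] for drug in response.json()}
```

With the requests library installed, find_drugs('dex') would send the same request as step 12 and return the matches as a dictionary keyed by drug name.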

Basic Protocol 5: KINOMEscan AND THE KINOMEscan APPYTER

KINOMEscan is a commercial kinase profiling assay provided by DiscoveRx. The KINOMEscan assay is based on competitive binding, in which each drug or compound of interest is run against a panel of approximately 440 purified kinases. Results are reported as “percent of control” (% control), which represents the amount of kinase-ligand binding observed when a test compound is present, compared to the control compound DMSO. As part of the LINCS program, the Harvard Medical School (HMS) LINCS DSGC profiled ∼180 different drugs and small molecules with KINOMEscan (Fabian et al., 2005). The KINOMEscan Data Visualization Appyter (https://appyters.maayanlab.cloud/#/KINOMEscan) provides tables and bar chart visualizations of KINOMEscan data for kinase and small molecule queries. The Appyter can also identify drug targets and perform kinase enrichment analysis based on an input protein set.

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)

Accessing KINOMEscan data

1.Navigate to the KINOMEscan section of the HMS LINCS Database (https://lincs.hms.harvard.edu/kinomescan/). The table on this page displays information on all small molecules profiled by the HMS LINCS Center, including each molecule's primary name, alternative names, LINCS Small Molecule ID, HMS Small Molecule ID, and corresponding HMS LINCS Dataset ID (Fig. 38). To download the entire table as an Excel spreadsheet, click the “available for download as a spreadsheet (.xlsx)” link in the explanatory paragraph. For multiple downloads, right click the link, select “Save link as…” and save the spreadsheet to the desired local folder.

2.Click on any ID in the “HMS LINCS Dataset ID” column to view a specific dataset. By default, the “Detail” tab is shown on the new page, which provides project information, assay metadata, and other information relevant to the specific profiling assay (Fig. 39).

Detailed metadata for the (s)-CR8 KINOMEscan dataset on the HMS LINCS Database. The dataset ID, 20342, is visible at the top of the table.

3.Click the “Small Molecules Studied” tab (Fig. 40) to view metadata on the small molecule profiled in this dataset, including the structural image and PubChem ID of the molecule.

Metadata on all small molecules studied in the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database. Only the small molecule (s)-CR8 was studied in this dataset.

4.Click the “Proteins Studied” tab (Fig. 41) to view metadata on all panel kinases used in this dataset, including identifiers, names, domain, mutations, and phosphorylation states.

Metadata on all panel kinases profiled in the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database.

5.Click the “Data Columns” tab (Fig. 42) to view metadata and descriptions for each of the columns in the results table for the given dataset.

Metadata for each column of the data table provided in the “Results” tab of the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database.

6.Click the “Results” tab (Fig. 43) to view the results of the assay. The % control and equilibrium dissociation constant (Kd) quantify binding of the corresponding protein kinase in each row to a ligand when the tested small molecule was present.

The actual results of the KINOMEscan assay for (s)-CR8 provided as a data table in the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database. Each row represents a single kinase.

7.Use the download links in the top right corner of any tab to download the full table on that tab as either an Excel (.xlsx) or CSV file.
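Once a Results table has been downloaded as a CSV file, the strongest interactions can be ranked offline. The sketch below is illustrative only: the column names passed to the function are placeholders that must be replaced with the headers that actually appear in the downloaded file. It relies on the fact that a lower % control value means less kinase remained bound to the control ligand, that is, stronger binding by the test compound.

```python
import csv
import io

def strongest_binders(csv_text, kinase_col, value_col, top_n=5):
    """Rank kinases by ascending % control (lowest value = strongest binding).

    `kinase_col` and `value_col` are the header names in the downloaded file;
    rows with missing or non-numeric values are skipped.
    """
    scored = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            scored.append((float(row[value_col]), row[kinase_col]))
        except (TypeError, ValueError):
            continue  # empty or non-numeric measurement
    scored.sort()
    return [(kinase, value) for value, kinase in scored[:top_n]]

# Toy table with hypothetical headers; EGFR has no measurement.
example = "Kinase,PercentControl\nABL1,35\nABL2,2.5\nEGFR,\nSRC,100\n"
top_two = strongest_binders(example, "Kinase", "PercentControl", top_n=2)
```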

Querying a kinase with the KINOMEscan Data Visualization Appyter

8.Navigate to the KINOMEscan Data Visualization Appyter at https://appyters.maayanlab.cloud/#/KINOMEscan and click the “Start Appyter” button to view the input form (Fig. 44).

Input form of the KINOMEscan Data Visualization Appyter. Default options are already filled in.

9.Under the section heading “Input a Small Molecule and/or Kinase”, enter a kinase of interest into the “Kinase” search box. Scroll to the bottom of the input form and click the blue “Submit” button.

10.The Appyter will automatically begin execution. When execution has completed, as indicated by the “Success” box at the top of the page, scroll down to the section titled “Generate table and bar chart of small molecules for kinase input from KINOMEscan data” to view the results (Fig. 45). The tables show the top-ranked small molecules that bind the input kinase, based on both % control and Kd values; the bar charts show the distribution of % control and Kd values among all small molecules which bind the input kinase.

Query results for the kinase ABL2 in the KINOMEscan Data Visualization Appyter. The tables provide the top-ranked small molecules that bind with ABL2 based on either % control or Kd value, while the bar charts show the distribution of % control and Kd values for all small molecules that bind ABL2.

Querying a small molecule in the KINOMEscan Data Visualization Appyter

11.Navigate to the KINOMEscan Data Visualization Appyter at https://appyters.maayanlab.cloud/#/KINOMEscan and click the “Start Appyter” button to view the input form (see Fig. 44).

12.Under the section heading “Input a Small Molecule and/or Kinase”, enter a small molecule of interest into the “Small Molecule” search box. Scroll to the bottom of the input form and click the blue “Submit” button.

13.The Appyter will automatically begin execution. When execution has completed, as indicated by the “Success” box at the top of the page, scroll down to the section titled “Generate table and barchart of kinases for small molecule input from KINOMEscan data, with either equilibrium dissociation constant Kd or % Control” to view the results (Fig. 46). The table shows the top-ranked kinases bound by the input small molecule, based on either % control or Kd values; the bar chart shows the distribution of % control or Kd values among all kinases bound by the input small molecule.

Query results for the small molecule AC220 in the KINOMEscan Data Visualization Appyter. The table displays the top-ranked kinases bound by AC220 based on Kd, while the bar chart shows the distribution of Kd values for all kinases bound by AC220.

Querying a kinase list

14.Navigate to the KINOMEscan Data Visualization Appyter at https://appyters.maayanlab.cloud/#/KINOMEscan and click the “Start Appyter” button to view the input form (see Fig. 44).

15.Scroll to the section titled “Upload or Enter a List of Kinases”. You may either upload a text file (.txt) using the “Upload kinase list” box or type a list of kinases into the “Input kinase list” box. Each row of either the file upload or text input should have only one kinase. Scroll to the bottom of the input form and click the blue “Submit” button.

16.The Appyter will automatically begin execution. When execution has completed, as indicated by the “Success” box at the top of the page, scroll down to the section titled “Generate ranked lists of drugs for inputted or uploaded kinases” to view the results (Fig. 47), which show the top 5 drugs that bind to the kinases in the input list by average % control and Kd, as well as by net average % control and Kd. Net values are calculated by subtracting the average % control or Kd value across all kinases from the average % control or Kd for only the input kinases.

The results of querying the example kinase list provided on the KINOMEscan Data Visualization Appyter. The tables show the top five drugs that bind kinases in the input list by average % control and Kd.
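The net value described in step 16 is a simple difference of two averages. The following sketch shows the arithmetic for a single drug; the function name and the dictionary layout (kinase mapped to its % control or Kd value) are assumptions made for illustration.

```python
from statistics import mean

def net_average(values_by_kinase, input_kinases):
    """Average over the input kinases minus the average over all kinases.

    `values_by_kinase` maps kinase -> % control (or Kd) for one drug.
    """
    selected = [values_by_kinase[k] for k in input_kinases if k in values_by_kinase]
    if not selected:
        raise ValueError("none of the input kinases were profiled for this drug")
    return mean(selected) - mean(values_by_kinase.values())

# Toy % control values for one hypothetical drug.
percent_control = {"ABL1": 10, "ABL2": 20, "EGFR": 90, "SRC": 80}
net = net_average(percent_control, ["ABL1", "ABL2"])  # 15 - 50
```

A strongly negative net % control indicates that the drug binds the input kinases more tightly than it binds the panel as a whole.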

Querying a gene list

17.Navigate to the KINOMEscan Data Visualization Appyter at https://appyters.maayanlab.cloud/#/KINOMEscan and click the “Start Appyter” button to view the input form (see Fig. 44).

18.Scroll to the section titled “Upload or Enter a Gene/Protein List”. You may either upload a text file (.txt) using the “Upload gene/protein list” box, or type a list of genes to be queried into the “Input gene/protein list” box. Each row of either the file upload or text input should contain only one gene or protein.

19.In the “Number of top kinases to consider” input box, enter the number of top kinases you would like to see in the results. The default value is 10. Then, scroll to the bottom of the input form and click the blue “Submit” button.

20.The Appyter will automatically begin execution. When execution has completed, as indicated by the “Success” box at the top of the page, scroll down to the section titled “Perform Kinase Enrichment Analysis on the inputted or uploaded genes” (Fig. 48). The tables show the results of performing kinase enrichment analysis on the input gene list. The top five drugs that bind to kinases coded by the input genes are displayed, based on average % control and Kd values, as well as based on net average % control and Kd. Net values are calculated by subtracting the average % control or Kd value across all kinases from the average % control or Kd for only the input kinases.

Query results for the example gene list provided on the KINOMEscan Data Visualization Appyter. The tables show the top five drugs, based on average % control and Kd, that bind kinases coded by the input genes.

Basic Protocol 6: LINCS PROTEOMICS: THE P100 AND GCP ASSAYS

The LINCS Proteomic Characterization Center for Signaling and Epigenetics (PCCSE) examined the effects of small molecule and genetic perturbations on the proteome and epigenome (Litichevskiy et al., 2018). Changes in phospho-signaling and chromatin states were measured using two liquid chromatography mass spectrometry (LCMS) assays. The P100 assay measures the levels of 96 widely studied cell signaling peptides and phosphopeptides, which serve as a reduced representation of the signalome. The global chromatin profiling (GCP) assay measures post-translational histone modifications in bulk chromatin, which enables the generation of epigenetic signatures corresponding to various perturbations. These LINCS proteomics datasets are hosted on the PanoramaWeb (Sharma et al., 2018) and CLUE.io platforms (Subramanian et al., 2017). The datasets produced by the PCCSE can be visualized in the form of heatmaps produced with the Morpheus matrix visualization and analysis software (https://software.broadinstitute.org/morpheus/).

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)

1.Navigate to the LINCS Panorama Repository dashboard at https://panoramaweb.org/project/LINCS/begin.view. The homepage provides an overview of the LINCS PCCSE assays and data (Fig. 49).

The LINCS Panorama data repository homepage.

2.Hover over the “LINCS PCCSE Overview” drop-down menu at the top of the page and click on any of the selections to view standard operating procedures (SOPs), quality control, internal standards, and any posters or presentations created for introducing the PCCSE data.

Accessing P100 and GCP data with Panorama

3.Scroll to the “LINCS PCCSE Data Quick Access Table” section. Under the “Quick Links” heading, click on either the “ALL P100 DATA” or the “ALL GCP DATA” button (Fig. 50). The new page will display data tables containing all metadata and download links for the P100 or GCP datasets, respectively.

Note

Alternatively, click or hover over the “LINCS” drop-down menu at the top left of the homepage to view all available datasets (Fig. 51). The “LINCS” sub-menu is already expanded by default. Click on either the “P100” or “GCP” sub-heading underneath the “LINCS” menu option to view either dataset.

The “Quick Links” section of the LINCS Panorama homepage.

The LINCS sub-menu open on the Panorama homepage. Both LINCS P100 and GCP datasets can be accessed here.

4.On the assay-specific data page, under the “LINCS Data” section, each dataset is available in four different levels (Figs. 52 and 53).

a.Click on the Skyline link for the Level 1 data for any dataset to view a data table of all precursors in the dataset in a new page (Fig. 54).
b.Click on the download icon for the Level 2-4 data for any dataset to directly download the GCT files.
c.Click on the “View in Morpheus” link for the Level 2-4 data for any dataset to see a heatmap of the corresponding assay data (Fig. 55). The columns correspond to various drug treatments, while the rows correspond to genes. Refer to steps 9-12 below to understand the heatmaps.
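Downloaded GCT files are tab-delimited text and can be inspected before loading them into an analysis tool. The sketch below reads only the two header lines, assuming the GCT layout in which the first line holds the version (e.g., #1.3) and the second line holds the matrix dimensions followed, in version 1.3, by the numbers of row and column metadata fields. The function name is illustrative.

```python
def read_gct_header(lines):
    """Return the version and dimensions declared at the top of a GCT file."""
    lines = iter(lines)
    version = next(lines).strip()
    if not version.startswith("#1."):
        raise ValueError("not a GCT file: unexpected first line " + repr(version))
    dims = [int(field) for field in next(lines).split()]
    keys = ["rows", "columns", "row_metadata_fields", "column_metadata_fields"]
    header = dict(zip(keys, dims))  # GCT 1.2 files supply only the first two
    header["version"] = version.lstrip("#")
    return header
```

Passing an open file handle as lines reads just the first two lines, so the check remains fast even for large matrices.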

Example Level 1 Skyline data page for a LINCS P100 dataset on Panorama.

Example Level 4 heatmap created with Morpheus. Columns are drugs while rows correspond to genes.

The table containing all LINCS P100 data in Panorama.

The table containing all LINCS GCP data in Panorama.

5.Scroll to the “LINCS PCCSE Data Quick Access Table” section to view other datasets. Click on any link to be taken to the dataset page.

6.Scroll to the “Targeted MS Runs” section for a list of annotated mass spectrometry data for each plate (Fig. 56). Click on any file name to download the file. To view a list of all proteins, peptides, precursors, transitions, and replicates in the dataset, click on the number corresponding to each column. Use the grid at the top of the table to customize the table, create a chart, or export the data table.

List of all Skyline files for targeted mass spectrometry datasets on Panorama. Metadata on each dataset is included as well as download links.

7.Scroll to the “Mass Spec Search” section to search for specific proteins, peptides, or modifications in the data using the search box (Fig. 57).

The Mass Spec search box on Panorama, which allows for searching specific proteins, peptides, or modifications among all mass spectrometry datasets.

8.Scroll to the “Messages” section to view updates or corrections to the datasets (Fig. 58).

The Messages section on Panorama contains published updates or corrections to the data.

Visualizing LINCS Proteomics data with Morpheus

9.Under the “Quick Level 4 Data Visualization” heading, click on any link in the table to view visualizations of P100 and/or GCP data for the indicated drug class and cell line in Morpheus, similar to the results from step 4c above (see Fig. 55). The name of the dataset displayed is shown on the tab heading at the top of the page.

10.Hover over any box to see the value corresponding to the effect of the specified drug on the histone or peptide. The names of the drug, well number, and histone/peptide will also appear at the top of the heatmap.

11.Use the search bar at the top of the page to filter out specific data entries. Select whether to filter by rows or columns, and which category to filter by, then enter in the query term. If the term appears in the data, the term and the column or row category it belongs to will automatically appear in the search bar. Use the up and down arrow keys next to the search bar to move between search results. Click the “Matches at Top” button to automatically move selected entries towards the top left of the heatmap (Fig. 59).

Example of filtering by column on a Morpheus heatmap. The small molecule CC-401 is entered into the search bar, and the corresponding columns are highlighted.

12.Use the additional tool options at the top of the heatmap to customize or save the image.

a.Use the zoom drop-down menu to zoom in or out of the heatmap.
b.Click the options button to customize the annotations, color scheme, or display settings for the heatmap (Fig. 60).
c.Click the save button to save the heatmap to PNG, PDF, or SVG file format.
d.Click the color key button to view the range of data values, and the color to which each value corresponds on the heatmap.

Display options tab in the Morpheus heatmap options. Use the Annotations tab to change row or column labels, the Color Scheme tab to change the heatmap colors, and the Display tab to format the heatmap layout.

Basic Protocol 7: THE LINCS JOINT PROJECTS (LJPs)

The two LINCS Joint Projects involve collaborations between several LINCS DSGCs and the DCIC. The Broad-HMS LINCS Joint Project explored the dose-dependent sensitivities of six nonmalignant and cancerous breast tissue cell lines to 107 small molecule perturbagens applied at six different doses. The MEP-HMS LINCS Joint Project assessed the dose-dependent responses of 72 nonmalignant and cancerous breast tissue cell lines to 139 small molecule and antibody perturbagens applied at nine different doses. Data from both joint projects are available from the GR Browser (Clark et al., 2017) and the LINCS Joint Project Breast Cancer Network Browser (Niepel et al., 2017). The aim of both projects was to explore the dose-dependent responses of human cells to a focused set of common perturbations under common conditions with multiple readouts. This basic protocol provides a tutorial on how to access and analyze the data generated by these joint projects, as well as the associated software tools developed to analyze these datasets.

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)

Viewing the dose-response grid with the GR Browser

1.Navigate to the GR Browser website at http://www.grcalculator.org/grbrowser/. By default, the page will display the dose-response grid of the Broad-HMS LINCS Joint Project dataset (Fig. 61).

GR Browser homepage. By default, the Broad-HMS LINCS Joint Project is selected on the left, and the corresponding dose-response grids are displayed.

2.The Dose-Response Grid tab shows the dose-response curves corresponding to each cell line compiled for each compound.

3.Choose a dataset to explore using the “Select Dataset to Browse” menu on the left of the screen; the two LINCS Joint Projects are the “Broad-HMS LINCS Joint Project” and the “MEP-HMS LINCS Joint Project”. To view only data corresponding to a specific molecule or cell line, click the “Subset Data” button underneath the dataset list, and enter in the relevant molecule or cell line.

4.Hover over a cell line in the floating box titled “Cell_Line” to highlight the cell line-specific dose-response curves (Fig. 62).

Hovering over “Hs-578T” in the Cell_Line box will highlight the dose-response curves corresponding to Hs-578T cells and the respective small molecule for each grid in the GR Browser.

5.Click “Toggle View” to switch to viewing the dose-response curves for each compound compiled by cell line. Hover over any compound name in the floating box to highlight the compound-specific dose curves in the grid (Fig. 63).

Hovering over “Afatinib” in the Small_Molecule box will highlight the dose-response curves corresponding to afatinib treatment of the respective cells for each grid in the GR Browser.

Comparing GR metrics with the GR Browser

6.The GR Metric Comparison tab provides comparative visualizations of different dose-response metrics across different cell lines and small molecules (Fig. 64). By default, a boxplot is displayed, showing the GR50 measurements for the first nine small molecules by alphabetical order.

An example boxplot from GR Browser displaying the distribution of GR50 values for the first nine small molecules in the dataset by alphabetical order.

7.Use the menu on the left side of the tab to select either a boxplot or a scatterplot visualization, then choose a metric to visualize using the “Select parameter” drop-down menu. The following GR metrics are available:

GR50: Concentration at which the effect reaches a GR value of 0.5 based on interpolation of the fitted curve.
GRmax: Effect at the highest tested concentration.
GRinf: GR(c → ∞): Effect at infinite concentration based on extrapolation of the fitted curve, which reflects asymptotic drug efficacy. Note that GRinf can differ from GRmax if the measured dose-response does not reach its plateau value.
GEC50: Drug concentration at half-maximal effect, which reflects the potency of the drug.
hGR: Hill coefficient of the sigmoidal curve, which reflects how steep the dose-response curve is.
GRAOC: Area over the dose-response curve, which is the integral of 1–GR(c) over the range of concentrations tested.
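The relationships among these metrics can be made concrete with a short sketch. It assumes the three-parameter sigmoidal form GR(c) = GRinf + (1 - GRinf) / (1 + (c / GEC50)^hGR), which is commonly used for GR curves but is not spelled out in this protocol. The function names are illustrative, and the GRAOC helper integrates over log10 concentration and normalizes by the tested range, which is likewise an assumption.

```python
import math

def gr_value(c, gr_inf, gec50, h_gr):
    """Fitted GR value at concentration c (assumed sigmoidal form)."""
    return gr_inf + (1.0 - gr_inf) / (1.0 + (c / gec50) ** h_gr)

def gr50(gr_inf, gec50, h_gr):
    """Concentration at which the fitted curve reaches GR = 0.5.

    Returns infinity when the curve plateaus at or above 0.5.
    """
    if gr_inf >= 0.5:
        return math.inf
    return gec50 * ((1.0 - gr_inf) / (0.5 - gr_inf) - 1.0) ** (1.0 / h_gr)

def gr_aoc(concentrations, gr_values):
    """Trapezoidal integral of 1 - GR(c) over log10(c), divided by the range.

    Concentrations must be positive and sorted in ascending order.
    """
    xs = [math.log10(c) for c in concentrations]
    area = 0.0
    for i in range(1, len(xs)):
        left = 1.0 - gr_values[i - 1]
        right = 1.0 - gr_values[i]
        area += 0.5 * (left + right) * (xs[i] - xs[i - 1])
    return area / (xs[-1] - xs[0])
```

Under this form, GR50 equals GEC50 only when GRinf is 0; a cytotoxic drug (GRinf below 0) reaches GR = 0.5 at a concentration below GEC50.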

8.Choose to compare either small molecules or cell lines using the “Select grouping variable” drop-down menu, then enter in the specific molecules or cell lines you would like to compare in the “Show/hide data” box. By default, the first ten options in alphabetical order are displayed.

9.Click on the “Plot Options” button to display options for customizing the plot size, labels, and margins.

10.Click the “Download Image” button above the plot to download the plot as either a TIFF (.tiff) or PNG (.png) file.

Viewing dose-response data and metadata in the GR Browser

11.The Data Table tab displays the full table of dose-response metrics and metadata for each perturbation (Fig. 65).

The GR Browser Data Table, which provides GR metrics and metadata for each small molecule perturbation.

12.Click the arrows next to any column name in the table to sort the table by the values in that column in ascending or descending order.

13.Enter a specific value into the box below each column name to filter the values in that column, or enter a value into the search box at the top right of the tab to search across all columns. In the example figure (Fig. 66), the data is filtered by AZD compounds using the search box at the top right, and by the BT-20 cell line using the “Cell_Line” column.

Example of filtering the Data Table in the GR Browser. AZD has been entered into the search box, while BT-20 has been entered into the cell_line column filter, and all entries corresponding to the cell line BT-20 that contain “AZD” are now displayed.

14.Copy the table or download the table as a CSV (.csv), TSV (.tsv), or Excel (.xlsx) file using the corresponding buttons above the table.
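
Once the table has been downloaded as a CSV file, the same kind of filtering can be repeated offline. The sketch below uses only the Python standard library. The column names other than “Cell_Line”, the compound names, and the values are hypothetical stand-ins, and the column filter is implemented as an exact match for simplicity.

```python
import csv
import io

def filter_rows(rows, search_text, column, column_value):
    """Keep rows where any field contains search_text and `column` equals column_value."""
    needle = search_text.lower()
    kept = []
    for row in rows:
        matches_search = any(needle in str(value).lower() for value in row.values())
        if matches_search and row.get(column) == column_value:
            kept.append(row)
    return kept

# Stand-in for a downloaded CSV file; headers and values are hypothetical.
example_csv = io.StringIO(
    "Small_Molecule,Cell_Line,GR50\n"
    "AZD0000,BT-20,0.12\n"
    "AZD0000,MCF7,0.30\n"
    "compound_x,BT-20,1.50\n"
)
rows = list(csv.DictReader(example_csv))
for row in filter_rows(rows, "AZD", "Cell_Line", "BT-20"):
    print(row)
```

To run this on a real download, replace the `io.StringIO` object with an opened file handle and adjust the column names to match the file's header row.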

Accessing the LINCS Joint Project data with the HMS LINCS database

15.Navigate to the HMS LINCS database at https://lincs.hms.harvard.edu/db/datasets/. This page displays a table containing all available datasets from the HMS LINCS DSGC, including the dataset ID, dataset name, and the type of data available.

16.In the search box near the top of the page, enter “LINCS Joint Project”. The table will then be filtered to show only the 17 Broad-HMS LINCS Joint Project datasets (Fig. 67).

The list of filtered LINCS Joint Project datasets on the HMS LINCS Database.

17.Click on any ID in the “HMS Dataset ID” column to view a specific LJP dataset. The data and metadata are divided into various detailed tabs, each of which can be downloaded as Excel (.xlsx) or CSV files by using the download links in the top right corner of the tab. By default, the “Detail” tab is shown on the new page, which provides project information and metadata (Fig. 68).

The dataset page for LINCS Joint Project Dataset ID:20259 on the HMS LINCS Database. By default, the Details tab is shown, which contains metadata on the dataset and assay.

18.Click the “Small Molecules Studied” tab to view the various small molecules profiled by the LJP in the chosen dataset (Fig. 69). Click on any of the small molecule IDs under the “HMS LINCS ID” column to display all available metadata on the molecule; the example figure shows metadata for neratinib (Fig. 70).

The Small Molecules Studied tab for one of the LINCS Joint Project datasets, ID:20259 on the HMS LINCS Database. Metadata for all small molecules studied in this dataset are shown.

The metadata page for neratinib on the HMS LINCS Database.

19.Click the “Cell Lines Studied” tab to view the cell lines corresponding to the chosen dataset (Fig. 71). Click on an ID under the “HMS LINCS ID” column to display metadata on the chosen cell line; the example figure shows the cell line BT-20 (Fig. 72).

The Cell Lines Studied tab for one of the LINCS Joint Project datasets, ID:20259 on the HMS LINCS Database. Only one type of cell, BT-20, was used in this dataset.

The metadata page for the cell line BT-20 on the HMS LINCS Database.

20.Click the “Data Columns” tab to view the descriptions of each column in the results table for the given dataset (Fig. 73).

The Data Columns tab for one of the LINCS Joint Project datasets, ID:20259 on the HMS LINCS Database. Each row provides metadata for a column in the results table of the dataset.

21.Click the “Results” tab to view the actual data contained in the dataset (Fig. 74). Each row represents an experimental replicate for a single or combination small molecule perturbation that was applied to the specified cell line.

The results contained in LINCS Joint Project dataset ID:20259. Each row represents a drug combination treatment of BT-20 cells. Rows with blank entries are controls.

Accessing the LINCS Joint Project Breast Cancer Network Browser

22.Navigate to the LINCS Joint Project Breast Cancer Network Browser (BCNB) at https://maayanlab.cloud/LJP/. The homepage displays a network visualizing all perturbational gene expression signatures obtained from the LJP (Fig. 75).

By default, the shape of each point represents the cell line, the size represents the approximate GR value of the small molecule, and the color represents the drug class of the small molecule. The legend on the left side of the network provides all relevant mappings.

The LINCS Joint Project Breast Cancer Network Browser homepage. Data from the LINCS Joint Projects are visualized on the scatterplot. The shape, size, and color of points can be adjusted, and are labeled in the legend on the left. Each point represents a single gene signature.

23.Use the drop-down menu on the right side to adjust the shape, color, and size.

The shape may be determined by cell line, timepoint, or concentration.
The color may be determined by several perturbational metrics and metadata, including GR value, p-value, cell line, timepoint, or concentration. The color may also be determined by cellular function or role, or the most enriched term for the signature from several gene set libraries.
The size may be determined by GR value, p-value, timepoint, or concentration.

24.Select the “Show labels” box beneath the drop-down menu to see information on the corresponding signature when hovering over a specific point on the network.

25.Use the zoom controls below the “Show labels” box, or the scroll function on your system to zoom in or out of the network. Click and drag the network to pan.

Basic Protocol 8: THE LINCS DATA PORTALS

The LINCS Data Portals were developed by the DCIC and can be used for viewing, downloading, and analyzing data generated by the LINCS DSGCs. There are three versions of the LINCS Data Portal: the LINCS Data Portal version 1 (Koleti et al., 2018) and version 2 (Stathias et al., 2019), which both correspond to earlier releases of LINCS data, and SigCom LINCS (Evangelista et al., 2022). The LINCS Data Portal 2.0 contains an upgraded user interface and enhanced metadata annotation compared to version 1. The latest LINCS Data Portal, SigCom LINCS, contains the most recent 2021 release of LINCS data and adds many other features, including enhanced metadata and signature search, single gene search, signatures from other sources, and term search, as well as global visualizations of the LINCS data.

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)

Using LINCS Data Portal Version 1

1.Navigate to the LINCS Data Portal website (Fig. 76): http://lincsportal.ccs.miami.edu/dcic-portal/.

Selecting the L1000 data from the homepage

2.The left panel has Methods selected by default. In the center panel that lists the 15 available methods, click on “L1000”, and then in the right panel, click on “L1000 mRNA profiling assay” (Fig. 77). Statistics for the associated datasets, small molecules, cells, and genes are displayed above this table. Clicking on any of these icons will navigate you to the L1000 data grouped by that category.

Statistics for the datasets, small molecules, cells, and genes associated with the L1000 assay.

Viewing the L1000 datasets at a high level

3.Click on the “10 Datasets” option above the table on the homepage, to be navigated to an overview of the 10 available L1000 datasets (Fig. 78). By default, Table View of the results is provided, and this content can alternatively be displayed in List View. For each returned dataset, the associated LINCS Center, assay, method, subject area, and data level are provided.

Table of datasets that were generated using the L1000 mRNA profiling assay.

4.Use the search bar at the top of the page to query by keyword and use the menu on the left to filter datasets by Center, Project, and other criteria. Each result is associated with a hyperlink that can be clicked to display a detailed description and metadata, and a tab to download that dataset (Fig. 79). Additionally, action icons are listed for each entry, and these include Source Link, Dataset Statistics, and Download; each icon can be clicked to execute the selected action.

Dataset-specific page for CRISPR Perturbagens with identifying metadata for the dataset.

Viewing the L1000 datasets from a small-molecule perspective

5.Starting from step 2, with the L1000 data having been selected on the homepage, click the Small Molecules icon on the homepage. This will navigate to a small-molecule-centric view of the L1000 data (Fig. 80). The results are displayed in Table View by default, and List View can alternatively be selected via the button above the results display. Each result lists the small molecule's name, its synonyms, most advanced phase of clinical approval (Max Phase), mechanism of action, pharmacological classification, model systems, and associated datasets. A bar plot for each result displays the experimental platforms of the datasets with which that small molecule is associated. Results can be searched by assays, cell lines, and other keywords via the search box above the results display. The results can also be filtered by LINCS Center, bioassay type, clinical phase, pharmacological classification, and mechanism of action using the menu on the left bar. Clicking on the blue “Show” buttons will expand the lists of model systems and datasets and provide a hyperlink to each (Fig. 81).

List view of small molecules that were profiled in the L1000 assay and associated metadata.

Expanded view of small molecule metadata that includes the cell lines that molecule was profiled in, as well as datasets that the small molecule was included in.

Viewing the L1000 datasets from a cell perspective

6.Starting from step 2, with the L1000 data having been selected on the homepage, click the Cells icon on the homepage. This will navigate to a cell-centric view of the L1000 data (Fig. 82). The results are displayed in Table View by default, and List View can alternatively be selected via the button above the results display. Each result lists the cell line's name; its synonyms, associated organism, organ, and disease; perturbagens that have been applied to the cell line; associated L1000 datasets; LINCS Centers that generated the associated data; and external links. A bar plot for each result displays the experimental platforms of the datasets with which that cell line is associated. Results can be searched by assays, perturbagens, and other keywords via the search box above the results display. The results can also be filtered by the type of cell, LINCS Center that generated the associated data, tissue, disease, and assay, using the menu on the left bar. Clicking on the blue “Show” button will expand the list of datasets associated with each result and provide a hyperlink to each.

List view of cell lines that were profiled in the L1000 assay along with informative metadata.

Viewing the L1000 datasets from a gene perspective

7.Starting from step 2, with the L1000 data having been selected on the homepage, click the Genes icon on the homepage. This will navigate to a Harmonizome (Rouillard et al., 2016) query page that includes all the genes profiled by the L1000 assay (Fig. 83).

Harmonizome search page for genes included in the L1000 datasets.

8.Click on any of the genes to be redirected to a single gene landing page with identifying metadata and functional associations for the specific gene (Fig. 84).

Single gene landing page for MAP10 that includes identifying metadata for the gene, as well as functional associations for the gene.

Using LINCS Data Portal Version 2

9.Navigate to the LINCS Data Portal 2.0 website (http://lincsportal.ccs.miami.edu/signatures/home; Fig. 85).

10.Select the “Metadata Search” and type in a query term of interest (e.g., A375).

Note

Inputting the search term will recommend perturbations, model systems, and signatures associated with the query term (Fig. 86).

Metadata search bar with suggestions for perturbations, model systems, and signatures associated with “A375”.

Signature search

11.Select the “Signature Search” option and query an up-regulated and down-regulated set of genes, or click the “Example” text to populate the search boxes with example sets of genes (Fig. 87). Click “Submit Signature” to be redirected to a results page.

Note

Alternatively, select the “Gene List” option to submit a list of differentially expressed genes, or click the “Example” text to populate the search box with an example list of genes.

Signature search page with up-regulated and down-regulated gene sets input into the respective search boxes.

12.The results page for the signature search displays a table of the most similar and dissimilar signatures to the input, ranked by the absolute similarity values. Additionally, each signature row contains metadata about the signature including assay, perturbagen, cell line, organ, time point, and concentration (Fig. 88). Click the “Download Signatures” button to download the table.

Table of results with highly similar and dissimilar signatures to the input signature sorted by absolute similarity values. Each row includes metadata for the signature.

Exploring available data in the LINCS Data Portal Version 2

13.Click “Assays” in the navigation bar to view a list of the assays used to generate gene expression signatures, the data generating center, the area of study, the assay method, and the number of datasets available (Fig. 89).

Table of assays used to generate the datasets available in the portal.

14.Click “Perturbations” in the navigation bar to be redirected to a list view of small molecules that were screened for their effect on gene expression (Fig. 90). Each row includes the mechanism of action, target, max FDA phase, and the signature categories that are applicable to the small molecule.

Table of small molecule perturbations and their metadata.

15.Click the “Gene Knockdowns” subtab on the “Perturbations” page to view a list of genes that were targeted with sgRNA to observe the effect of their knockdown on gene expression (Fig. 91). Each of the genes includes metadata regarding the perturbagen class, reagent type, subtype, and Entrez ID.

Table of genes knocked down by sgRNA to observe the effect on gene expression, and their metadata.

16.Click “Model Systems” in the navigation bar to view a list of cell lines that were profiled in the assays (Fig. 92). Each row includes metadata for the cell lines including organ, model class system, and tissue of origin.

Table of model systems profiled in L1000 signatures.

17.Click “Signatures” in the navigation bar to view all available signatures and metadata that includes the perturbation category, dataset of origin, perturbagen, cell line, organ, time point, and concentration (Fig. 93).

Table of all available signatures and their metadata.

Using SigCom LINCS

18.Navigate to the SigCom LINCS homepage at https://maayanlab.cloud/sigcom-lincs (Fig. 94).

Performing signature search enrichment analysis

19.Select the “Up/Down Gene Sets” button, then enter up-regulated and down-regulated gene names into the respective input boxes (Fig. 95). Each gene should be on its own row. The upper right-hand corner of each input box will display how many gene symbols are valid. To fix the names of genes with invalid entries, toggle the “Validate” option in the upper left-hand corner of each input box. You will be presented with all the symbols that are valid and suggestions to fix entries based on synonyms.

Signature search page with input boxes populated with up-regulated and down-regulated genes.

20.Click the dark blue “Search” button below the input boxes.

21.The top signature results for the input will be displayed separately by dataset (Fig. 96). Blue bars indicate reverser/opposite signatures, while orange bars indicate mimicker/similar signatures. Higher rank position, longer bar length, and darker color indicate results with greater significance. Hover over any bar to view the z-score generated from the Fisher Exact Test for the corresponding signature when compared to the input.

Top signature results that mimic or reverse the input signature displayed by dataset.
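
To illustrate the kind of overlap statistic that underlies a Fisher exact test on gene sets, the sketch below computes a one-sided p-value from the hypergeometric distribution using only the Python standard library. The function name, argument names, and example counts are hypothetical, and this is not the SigCom LINCS scoring code; in particular, the conversion to the z-score shown in the interface is not reproduced here.

```python
from math import comb

def fisher_right_tail(overlap, input_size, signature_size, background_size):
    """One-sided Fisher exact p-value: probability of an overlap at least this large.

    Under the null, `input_size` genes are drawn at random from a background of
    `background_size` genes, of which `signature_size` belong to the signature.
    """
    total = comb(background_size, input_size)
    upper = min(input_size, signature_size)
    favorable = 0
    for k in range(overlap, upper + 1):
        favorable += comb(signature_size, k) * comb(
            background_size - signature_size, input_size - k
        )
    return favorable / total

# Hypothetical counts: 8 of 50 input genes fall among 200 signature genes,
# with an assumed background of 12,000 genes.
print(fisher_right_tail(8, 50, 200, 12000))
```

A smaller p-value indicates that the input set and the signature's gene set share more genes than expected by chance.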

22.Click the expand icon on any of the perturbation types to view more detailed results. As an example, expand the “LINCS L1000 Chemical Perturbations” results.

a.The “Bar Chart” detailed view tab (Fig. 97) provides a larger view of the bar charts from the initial results page, as well as tables containing all computed statistical values for each of the reverser and mimicker drugs. Tables can be downloaded as a TSV file using the download icons next to the top left of the tables.
b.The “Clustergram” detailed view tab (Fig. 98) provides a clustergram plot showing the top 10 signatures in which the input genes are most up-regulated and most down-regulated. The plot can be adjusted using the toolbar to the left. Hovering over a cell in the clustergram shows the rank of a gene (row) with respect to the given signature (column), with a low rank indicating down-regulation of the gene.

Bar chart visualization and tables of L1000 Chemical Perturbation signatures mimicking or reversing the input signature.

Clustergram visualization of top signatures where input genes were up-regulated and down-regulated, respectively.

Metadata search

23.Select the dark blue “Any Search Term” box on the homepage (see Fig. 94), then select the orange “Perform Metadata Search” option when it appears. Enter any term of interest into the input box (disease, cellular process, drug name, gene symbol, cell line, or any other term). As an example, query the term “dexamethasone”, then select the “Signature Search” button (Fig. 99).

24.Select subsets of the results by choosing the “Data and Signature Generation Center”, “Dataset”, “Cell Line”, “Perturbagen Type”, or “Perturbagen” using the filter menu on the right. For example, select “LINCS Transcriptomics” under the “Data and Signature Generation Center” menu to filter returned signatures to only those generated by the LINCS Transcriptomics center (Fig. 100).

Signatures containing the keyword “dexamethasone”, filtered to only signatures generated by the LINCS Transcriptomics DSGC using the menu on the right.

25.Click on the three dots icon to the right of each signature result (Fig. 101) to download the full signature, download the top up and down genes as a GMT file, or submit the up and down genes from that signature to the SigCom LINCS Signature Search. See steps 21-22 above to understand how to interpret the Signature Search results.

Drop-down menu of actions that can be performed on a signature of interest, including performing a signature search, downloading the signature, and performing gene set enrichment analysis.
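
A GMT file is a tab-delimited text file in which each line holds a gene set: a term name, a description, and then the member genes. The following sketch shows how such lines can be written and parsed with plain Python. The “up”/“down” suffixes and the gene placeholders are assumptions for illustration, and the exact layout of the file that SigCom LINCS produces may differ.

```python
def to_gmt_line(term, description, genes):
    """Build one GMT line: term, description, then one tab-separated field per gene."""
    return "\t".join([term, description, *genes])

def parse_gmt_line(line):
    """Split one GMT line back into (term, description, genes)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], fields[2:]

# Signature name taken from the example above; gene names are placeholders.
signature = "CPC006_MCF7_24H_O03_dexamethasone_10uM"
lines = [
    to_gmt_line(signature + " up", "", ["GENE_A", "GENE_B"]),
    to_gmt_line(signature + " down", "", ["GENE_C", "GENE_D", "GENE_E"]),
]
for line in lines:
    term, _, genes = parse_gmt_line(line)
    print(term, len(genes))
```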

26.Click on a signature name to view detailed metadata for that signature (Fig. 102).

Signature metadata page for the signature “CPC006_MCF7_24H_O03_dexamethasone_10uM”.

27.Scroll below the metadata information to view the top up- and down-regulated genes in the signature. By default, the up genes are shown (Fig. 103). Select the “down” tab to view down genes.

Significantly up-regulated genes for a signature of interest.

28.Return to the metadata search page by using the back button in your browser. To search through available datasets, click on the “Datasets” tab under the metadata search bar (Fig. 104). As an example, remove any existing terms in the search bar and query “L1000”. Dataset results can be sorted by Data and Signature Generation Center or Assay using the menu on the right-hand side. As an example, select “LINCS Transcriptomics” as the Data and Signature Generation Center. Hover over the FAIRshake icon for a dataset to view scores for each of the categories. Click on the download icon to download the dataset.

Metadata search page for datasets, filtered by datasets that were created using the L1000 platform.

29.Click on any dataset name to view metadata for that dataset, as well as all signatures belonging to the dataset (Fig. 105).

Dataset metadata page for Gene Knockdowns from LINCS Transcriptomics that also includes signatures generated within the dataset.

30.Return to the metadata search page by using the back button in your browser. To search for a gene, click on the “Genes” tab under the metadata search bar (Fig. 106). Remove any existing terms and enter a gene of interest, such as ACE2, in the search bar. All matching results will load automatically.

Metadata search page for genes, filtered by the gene symbol “ACE2”.

31.Click on a gene name to view signatures where the gene is significantly up-regulated or down-regulated (Fig. 107).

Gene page for ACE2 that includes signatures in which ACE2 is significantly up-regulated or down-regulated.

Visualizing SigCom LINCS signatures using UMAP

32.Click the “UMAPs” tab in the navigation bar at the top of the page. The page includes several screenshots of UMAP visualizations of various signature datasets, each of which can be selected to view a full-size visualization. Additionally, there is a table describing each dataset, associated metadata, and links that redirect to interactive and static visualizations for each dataset (Fig. 108).

Page of UMAPs generated for each dataset included in SigCom LINCS.

33.For an example of a static plot, click on the “VIEW” link for the “Normalized L1000 signatures colored by perturbation type” dataset in the table. The visualization is a static UMAP plot of all L1000 signatures colored by perturbation type (Fig. 109).

Static UMAP plot of Normalized L1000 signatures colored by perturbation type.

34.For an example of an interactive plot, click on the “VIEW” link in the “Interactive plot” column for the “Automatic Human GEO RNA-seq Signatures” dataset in the table. The visualization is an interactive UMAP plot where signatures are colored by GSE ID. Each point represents a signature and can be moused over for more information (Fig. 110).

Interactive UMAP plot of Automatic Human GEO RNA-seq Signatures with a signature of interest moused over and highlighted.

Basic Protocol 9: CREATING AND ANALYZING SIGNATURES WITH iLINCS

iLINCS is a cloud-based platform maintained by the LINCS DCIC (Pilarczyk et al., 2020). iLINCS provides access to raw data and processed signatures and the ability to analyze these data using various workflows. The iLINCS portal has several user interfaces for analyzing transcriptomics and proteomics LINCS datasets. The portal integrates the R analytical engine via several R tools for web computing (rserve, opencpu, Shiny, rgl) and DCIC-developed web tools and applications (FTreeView, Enrichr, and X2K; Clarke et al., 2018) into a coherent web platform for LINCS data analysis. Users can follow several workflows that enable identifying differentially expressed genes, proteins, and phosphoproteins in LINCS datasets, and then use these signatures for analysis together with other LINCS and non-LINCS datasets, and in the analysis of LINCS L1000 signatures.

Necessary Resources

Hardware

Desktop or a laptop computer, or a mobile device, with a fast Internet connection

Software

An up-to-date web browser such as Google Chrome (https://www.google.com/chrome/), Mozilla Firefox (https://www.mozilla.org/en-US/firefox/), Apple Safari (https://www.apple.com/safari/), or Microsoft Edge (https://www.microsoft.com/en-us/edge)

1.Navigate to the iLINCS data portal (http://www.ilincs.org/ilincs/).

2.The homepage includes a search bar for querying search terms related to datasets, signatures, compounds, and genes found in iLINCS (Fig. 111). Type everolimus into the search field to launch a search.

3.The results page displays LINCS datasets, non-LINCS datasets, signatures, and compounds that match everolimus (Fig. 112).

Search results page for datasets, signatures, compounds, and genes that match the query term “everolimus”.

4.Expand the tab containing signatures to view a table of signatures related to “everolimus”. The table can be filtered by typing filter keywords below each of the column headers; for example, type MCF7 below the cell line header to show only signatures from the MCF7 cell line (Fig. 113).

Expanded signature results that display a table of signatures that include the term “everolimus”. Each of the table columns can be filtered with keywords of interest.

Signature Details

5.Click on the “LINCSCP_133490” signature within the signature id column to be redirected to a page with signature details (Fig. 114).

Signature-specific page with details about the signature and various options for analyzing the signature.

6.Click the “Modify the list of selected genes” button on the left to generate a volcano plot of differentially expressed genes (Fig. 115). The top 100 differentially expressed genes are selected by default. Use the sliders to adjust the differential expression range and p-value cutoff, which changes the number of selected genes. Switch between a static volcano plot and an interactive volcano plot by clicking the “Static volcano plot” or “Interactive volcano plot” button. Click on any of the download buttons to download the volcano plot in the preferred file format.

Volcano plot of differentially expressed genes within the signature. The top 100 genes are selected by default and the sliders allow for modifying the differential expression range and p-value cut-off.
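
The effect of the two sliders can be sketched as a simple threshold filter over the signature table. In the Python example below, the tuple layout, gene names, values, and the use of p-value ranking for the default “top N” selection are all assumptions made for illustration; iLINCS may rank and select genes differently.

```python
def select_genes(signature, min_abs_diff, max_p_value):
    """Genes whose absolute differential expression and p-value pass both cutoffs."""
    return [
        gene
        for gene, diff, p_value in signature
        if abs(diff) >= min_abs_diff and p_value <= max_p_value
    ]

def top_n_by_p_value(signature, n=100):
    """The n genes with the smallest p-values (assumed ranking rule)."""
    ranked = sorted(signature, key=lambda row: row[2])
    return [gene for gene, _, _ in ranked[:n]]

# Hypothetical rows: (gene, differential expression, p-value).
signature = [
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.8, 0.02),
    ("GENE_C", 0.3, 0.0005),
    ("GENE_D", -2.5, 0.2),
]
print(select_genes(signature, min_abs_diff=1.0, max_p_value=0.05))
print(top_n_by_p_value(signature, n=2))
```

Tightening either cutoff shrinks the selected list, which mirrors how moving the sliders changes the number of highlighted points on the volcano plot.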

7.Several signature analysis tools are available for further exploration and visualization of the selected signature. Mouse over any of the signature analysis tools buttons to display a pop-up box with information, and click on any of the buttons to launch the analysis. As an example, click the “Pathway Analysis” button to be redirected to a new page with a SPIA Functional Pathway Analysis table of the top enriched KEGG pathways computed from the differentially expressed genes from the query signature (Fig. 116).

SPIA Functional Pathway Analysis table of the differentially expressed genes from the signature displaying the top enriched KEGG pathways.

8.Click the “Signature data” tab at the bottom of the page to display a table of the genes included in the signature and their differential expression levels and p-values (Fig. 117). Click on “Show selected genes” within the tab to view the top 100 differentially expressed genes that are computed by default and their expression levels and p-values (Fig. 118). To change the selected genes, see step 6.

Table of genes included in the signature, their differential expression value, and p-value.

Table of top 100 differentially expressed genes in the signature that displays each differential expression value and p-value.

Connected Signatures

9.Click the “Connected Signatures” tab to view other pre-computed signatures connected to the selected signature based on Pearson correlation coefficient concordance (Fig. 119). As an example, expand the tab labeled “LINCS chemical perturbagen signatures” to display a table of chemical perturbagen signatures correlated to the selected signature. The table displays metadata for each signature, in addition to the concordance values, p-values, and number of overlapping genes (Fig. 120). Bar plots of top occurring perturbagens, targets, concentrations, cell lines, and time points across the signatures are displayed above the table.

Tab containing various drop-down tables that contain signatures connected to the input signature based on Pearson correlation coefficient concordance.

Drop-down table of top LINCS Chemical Perturbagen signatures connected to the input signature ranked by Pearson correlation coefficient concordance.
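
A concordance of this kind can be sketched as a Pearson correlation computed over the genes that two signatures share. The Python example below is a simplified illustration under that assumption; the dictionary representation, gene names, and values are hypothetical, and iLINCS may apply additional gene selection or weighting that is not shown here.

```python
import math

def pearson_on_shared_genes(sig_a, sig_b):
    """Pearson correlation of two signatures over their overlapping genes.

    Each signature maps gene -> differential expression value.
    Returns (correlation, number_of_overlapping_genes).
    """
    shared = sorted(set(sig_a) & set(sig_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two overlapping genes")
    xs = [sig_a[g] for g in shared]
    ys = [sig_b[g] for g in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signature")
    return cov / math.sqrt(var_x * var_y), n

# Hypothetical signatures that share three genes.
query = {"GENE_A": 1.5, "GENE_B": -0.7, "GENE_C": 0.2, "GENE_D": 2.0}
other = {"GENE_A": 1.1, "GENE_B": -0.9, "GENE_C": 0.4, "GENE_E": -1.0}
print(pearson_on_shared_genes(query, other))
```

A value near 1 suggests that the two perturbations shift the shared genes in the same direction, while a value near -1 suggests opposing effects.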

10.To select a group of signatures for analysis, click the “Selection” drop-down menu and click “Select First 100” to select the top 100 correlated chemical perturbagen signatures to the query signature, as indicated by a checkbox to the left of the Signature ID (Fig. 121). All other signatures after the first 100 will be unselected, as indicated by an unchecked box. Different signature groupings can be selected and deselected within the menu. Next, click the “Analyze” drop-down menu and select “Group Analysis” to perform a signature group analysis on the selected signatures (Fig. 122).

Drop-down menu for selecting the number of top correlated signatures for consideration in group analysis.

Drop-down menu of grouped signature analysis options.

11.A pop-up menu will appear with a table of the selected signatures where signatures can once again be selected and deselected (Fig. 123). By default, 50 genes from each signature will be used in the analysis, but this field can be changed with the desired number of genes. Click “Analyze 100 signatures”.

Table of signatures for consideration in group analysis that can be selected/deselected.

12.A new page will be generated with a table of the selected signatures, which can be downloaded as signature data or a correlation matrix (Fig. 124). At the bottom of the page are analyses from various tools for clustering and visualizing the submitted signatures. As an example, click “Morpheus Signatures Heatmap”.

Downloadable table of selected signatures for group analysis. At the bottom of the page are pre-computed signature group analyses that can be viewed by clicking on the respective icon.

13.A new page will be generated with a heatmap of the submitted signatures clustered by the various metadata associated with each signature, like perturbagens, cell lines, etc. (Fig. 125).

Morpheus signatures heatmap page with signatures clustered by perturbagens, cell lines, etc.

Connected Perturbations

14.Navigate back to the Signature Details page and click on the “Connected Perturbations” tab (Fig. 126). This tab includes aggregated tables of correlated gene knockdown and chemical perturbagen signatures.

Drop-down table of top perturbation signatures connected to the input signature.

15.Expand the “LINCS gene knockdowns” tab to view the top correlated knockdown signatures with the query signature (Fig. 127). The table includes the target genes and pathways in addition to the various metrics that qualify the knockdown signatures as related to the query signature.

Table of LINCS gene knockdown signatures related to the query signature ranked by z-score. The table includes signature metadata like the target gene of the knockdown and associated pathways, in addition to the direction of the correlation with the query signature, p-value, and FDR.

16.Expand the “LINCS chemical perturbagens” tab to view the top correlated perturbagen signatures with the query signature (Fig. 128). The table includes the perturbagen id, perturbagen name, and perturbagen targets in addition to the various metrics that qualified the perturbagen signatures as related to the query signature.

Table of LINCS chemical perturbagen signatures related to the query signature ranked by z-score. The table includes perturbagen metadata, in addition to the direction of the correlation with the query signature, p-value, and FDR.

Creating and analyzing signatures

17.Switch to the datasets workflow by clicking on the “Datasets” button in the navigation bar at the top of the page. This page includes over 15,000 datasets of pre-processed signatures (Fig. 129). Select and deselect datasets of interest and use the drop-down menus on the right to narrow the search to terms of interest. For the demo, select only the “TCGA” dataset, select proteomics data from the “Data Type” drop-down menu, and type breast into the keyword search bar to narrow the search to datasets with proteomic data related to breast cancer (Fig. 130). To explore and analyze a dataset, click the “Analyze” button in the “TCGA_BRCA_RPPA_2019” dataset.

Datasets workflow page with over 15,000 datasets of pre-processed signatures available for analysis.

Filtered dataset page displaying TCGA datasets that also include the search term “breast”.

18.To create a signature, click the “Create a signature” button on the left (Fig. 131). On the left are drop-down menus of variables to separate the samples into two groups for comparison. For the grouping variable, select “PAM50_mRNA”. For the treatment group, select “HER-2 enriched” samples. For the baseline group, select “Luminal A” samples (Fig. 132). Once the grouping criteria are selected, click “Create signature” to generate the signature.

Page for exploring and analyzing a dataset of interest, in this case the “TCGA_BRCA_RPPA_2019” dataset from TCGA. The page includes options for creating a signature, multi-group analysis, and options for analyzing a list of genes, along with tabs for exploratory visualizations and data/metadata associated with the dataset.

Signature creation page for the “TCGA_BRCA_RPPA_2019” dataset where grouping variables for creating the signature can be selected from drop-down menus.

19. A signature details page will be created for the generated signature (Fig. 133). To further explore and analyze the signature, follow steps 6 to 16.

Signature details page for the generated signature from the “TCGA_BRCA_RPPA_2019” dataset.
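
Conceptually, creating a signature in steps 18 and 19 means splitting the samples by a grouping variable and comparing each measured feature between the treatment and baseline groups. The sketch below illustrates this with a per-feature mean difference and Welch t-statistic. The statistic, the function name, the feature names, and all values are assumptions for illustration; the protocol does not specify the test that iLINCS applies.

```python
from math import sqrt
from statistics import mean, variance


def two_group_signature(values, groups, treatment, baseline):
    """Compare each feature between two sample groups.

    values: {feature: [value per sample]}; groups: [group label per sample].
    Returns {feature: {"diff": mean difference, "t": Welch t-statistic}}.
    """
    t_idx = [i for i, g in enumerate(groups) if g == treatment]
    b_idx = [i for i, g in enumerate(groups) if g == baseline]
    signature = {}
    for feature, row in values.items():
        t = [row[i] for i in t_idx]
        b = [row[i] for i in b_idx]
        diff = mean(t) - mean(b)
        # Welch standard error: group variances are not pooled.
        se = sqrt(variance(t) / len(t) + variance(b) / len(b))
        signature[feature] = {"diff": diff, "t": diff / se if se > 0 else 0.0}
    return signature


# Hypothetical labels for the "PAM50_mRNA" grouping variable and made-up values.
groups = ["HER-2 enriched"] * 3 + ["Luminal A"] * 3 + ["Basal-like"]
values = {
    "ERBB2": [2.0, 2.5, 3.0, 0.0, 0.5, 1.0, 0.2],
    "ESR1": [-1.0, -0.5, -1.5, 1.0, 1.5, 0.5, -2.0],
}
sig = two_group_signature(values, groups, "HER-2 enriched", "Luminal A")
```

Samples that belong to neither selected group (here, the single “Basal-like” sample) are ignored, mirroring the way only the selected treatment and baseline groups enter the comparison.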

COMMENTARY

Background Information

The LINCS program consisted of six Data and Signature Generation Centers (DSGCs) and one Data Coordination and Integration Center (DCIC). Although funding for the LINCS program has ended and no new LINCS datasets are expected to be produced and published, the DCIC and each of the DSGCs continue to host existing data and develop new software tools. All the software tools mentioned in the protocols presented here are expected to remain available for the foreseeable future. A few new software tools and platforms, such as SigCom LINCS (Evangelista et al., 2022) and Appyters (Clarke et al., 2021), are still being actively developed (as of 2022), and the LINCS DCIC is committed to maintaining and upgrading these resources in the coming years.

In addition, the LINCS DCIC is participating in the Common Fund Data Ecosystem (CFDE) NIH Common Fund program. This effort aims to standardize metadata across NIH Common Fund data coordination centers (DCCs) (Charbonneau et al., 2022). For the CFDE efforts, most of the LINCS data and metadata have been archived on an Amazon Web Services S3 bucket using a STRIDES account. Persistent download links for these datasets can be found within LINCS metadata tables now available from the CFDE portal (https://app.nih-cfde.org/) and SigCom LINCS.

The LINCS Data Coordination and Integration Center (DCIC)

The LINCS DCIC focused on four main aspects: (1) constructing an integrated knowledge environment for accessing LINCS data; (2) conducting research on regulatory networks with LINCS data; (3) establishing community training and outreach opportunities centered on LINCS data; and (4) coordinating the activities of the consortium and involvement of LINCS in other efforts. Although the LINCS program period has ended, the LINCS DCIC continues to engage in outreach activities and provide access to LINCS digital resources.

The Drug Toxicity Signature Generation Center (DToxS) DSGC

The Drug Toxicity Signature (DToxS) DSGC generated cellular signatures related to adverse drug effects, with the goal of mitigating these effects via the coadministration of other drugs. Transcriptomics and proteomics data were collected from multiple cell lines that were treated with either single drugs or complementary drug combinations. This experimental data was then computationally analyzed to generate sets of signatures for each single drug or drug pair.

The Harvard Medical School (HMS) LINCS DSGC

The HMS LINCS DSGC aimed to understand the underlying mechanisms of drug sensitivity and dose-response relationships by studying cellular responses to small molecule perturbations, with a focus on kinase inhibitors, epigenome modifiers, and ligands. Data was collected via mRNA profiling, mass spectrometry proteomics, immunoassays, and cell imaging.

The LINCS Center for Transcriptomics DSGC

The LINCS Center for Transcriptomics generated a comprehensive collection of transcriptomic profiles, including the L1000 dataset, which expands upon the original Connectivity Map (CMAP; Lamb et al., 2006). The L1000 dataset covers over 50 cell types, to which nearly 82,000 perturbagens were applied at varying doses and time points (Subramanian et al., 2017). This DSGC has produced over 3 million gene expression profiles to date.

The LINCS Proteomic Characterization Center for Signaling and Epigenetics (PCCSE) DSGC

The LINCS Proteomic Characterization Center for Signaling and Epigenetics (PCCSE) collected data on changes in phosphorylation and protein expression in response to various perturbations. Data were measured using the P100 and Global Chromatin Profiling (GCP) assays. The PCCSE also collaborated with the LINCS Center for Transcriptomics to integrate L1000 transcriptomic data with PCCSE experiments.

The Microenvironment Perturbagen (MEP) LINCS DSGC

The Microenvironment Perturbagen (MEP) LINCS DSGC examined the effects of the microenvironment on cellular phenotypes and molecular networks. MEP LINCS data integrates quantitative fluorescence imaging-based assays with transcriptional and proteomics data to provide a comprehensive view of how microenvironment perturbagens affect regulatory networks.

The NeuroLINCS DSGC

The NeuroLINCS DSGC aimed to identify targets for the development of drugs against neurodegenerative diseases, focusing on amyotrophic lateral sclerosis (ALS) and spinal muscular atrophy (SMA). NeuroLINCS data consist of transcriptomics, proteomics, and imaging profiles of motor neurons derived from induced pluripotent stem cells (iPSCs), referred to as iMNs.

Critical Parameters

Workflow for the L1000 Assay

The workflow to measure gene expression with the L1000 assay involves ligation-mediated amplification (LMA) followed by fluorescently addressed microspheres to capture amplification products (Subramanian et al., 2017).

In Step 1, mRNA is reverse transcribed into cDNA.
In Steps 2 and 3, landmark gene-specific upstream and downstream probes are annealed to the cDNA, and then ligated. The upstream probe has a unique 24-mer barcoded sequence and a 5′-biotin label.
In Steps 4 to 6, the probes are amplified via polymerase chain reaction (PCR) using biotinylated primers, and are hybridized to polystyrene microspheres (beads) of distinct fluorescent colors, via their barcodes. Each bead recognizes two barcodes; many amplified molecules that feature either barcode can attach to a bead. To permit each bead to be analyzed for both color (indicating landmark transcript identity) and fluorescence intensity (indicating landmark abundance), streptavidin-phycoerythrin (SAPE) staining of biotin is performed. The beads are then sent to Luminex FlexMap 3D flow cytometry detectors to measure how many probes are hybridized. This produces the L1000 Level 1 data, which is the raw, unprocessed flow cytometry data from the Luminex scanners. L1000 experiments involve the use of 384-well plates, with approximately 18 control replicates per plate. Each batch includes 2-4 plates, representing ∼366 samples per batch.

L1000 Data Levels

The L1000 data is available at different levels of processing; each level of processing is associated with a Level number:

Level 1: Raw, unprocessed flow cytometry data from Luminex scanners.
Level 2: Gene expression values for the ~1000 landmark genes after deconvolution from Luminex beads.
Level 3: Gene expression profiles of both directly measured landmark transcripts and inferred genes, normalized using invariant set scaling followed by quantile normalization.
Level 4: Signatures with differentially expressed genes computed by robust z-scores for each profile relative to the population control.
Level 5: Processed signatures computed from replicate profiles using the moderated z-score (MODZ) method.
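
The Level 4 and Level 5 definitions above can be made concrete with a small sketch. It assumes that the robust z-score is centered on the median and scaled by 1.4826 times the median absolute deviation (MAD), and that MODZ averages replicate z-score vectors with weights proportional to each replicate's summed Spearman correlation with the other replicates. The function names, the clipping of negative correlations to zero, and the `min_weight` floor are simplifying assumptions; the production pipeline described by Subramanian et al. (2017) may differ in these details.

```python
from math import sqrt
from statistics import mean, median


def robust_z(values):
    """Robust z-scores of one gene across a population of profiles."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [(v - med) / scale if scale > 0 else 0.0 for v in values]


def _ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    ra, rb = _ranks(a), _ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def modz_weights(replicates, min_weight=0.01):
    """Weight each replicate by its agreement with the other replicates."""
    n = len(replicates)
    if n == 1:
        return [1.0]
    raw = []
    for i in range(n):
        total = sum(max(spearman(replicates[i], replicates[j]), 0.0)
                    for j in range(n) if j != i)
        raw.append(max(total, min_weight))
    return [w / sum(raw) for w in raw]


def modz(replicates, min_weight=0.01):
    """Weighted average of replicate z-score vectors (one value per gene)."""
    weights = modz_weights(replicates, min_weight)
    n_genes = len(replicates[0])
    return [sum(w * rep[g] for w, rep in zip(weights, replicates))
            for g in range(n_genes)]


# Made-up values: one gene across five profiles, then three replicate vectors.
z = robust_z([1.0, 2.0, 3.0, 4.0, 100.0])
reps = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [4.0, 3.0, 2.0, 1.0]]
weights = modz_weights(reps)
consensus = modz(reps)
```

In this toy example the third replicate is anti-correlated with the other two, so it receives only the floor weight and contributes little to the consensus signature.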

P100/GCP Data Levels

The P100 assay measures a reduced representation of the phosphoproteome consisting of 96 widely studied phosphopeptides. The P100 profiles of uncharacterized perturbations can be compared against profiles of drugs with known signaling pathways, so that the signaling mechanisms of novel perturbagens can be inferred. The global chromatin profiling (GCP) assay measures global post-translational histone modifications in bulk chromatin. Using this platform, epigenetic signatures can be generated for small-molecule and genetic perturbations of epigenetic processes. As with the L1000 data, the P100 and GCP assay data are available at different levels of processing:

Level 0: Raw mass spectrometry data
Level 1: Probe reads in the form of curated Skyline documents
Level 2: Raw matrix data of extracted signal ratios of probes vs. internal standards (log2 transformed)
Level 3: Processed and normalized matrix data derived from Level 2
Level 4: Differential matrix data generated by subtracting the plate-wide median ratio of each analyte from each Level 3 sample
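
Two of the transformations in this list are simple enough to sketch: the log2 ratio of probe signal to internal standard (Level 2) and the subtraction of the plate-wide median per analyte (Level 4). The function names, the analyte name, and the values are hypothetical, and the Level 3 normalization step is not shown because its details are not given here.

```python
from math import log2
from statistics import median


def level2_log_ratios(probe_signal, standard_signal):
    """log2 ratio of each probe signal to its internal standard."""
    return [log2(p / s) for p, s in zip(probe_signal, standard_signal)]


def level4_differential(level3):
    """Subtract each analyte's plate-wide median from every sample.

    level3: {analyte: [normalized log2 ratio per sample on the plate]}.
    """
    differential = {}
    for analyte, row in level3.items():
        plate_median = median(row)
        differential[analyte] = [v - plate_median for v in row]
    return differential


# Made-up signals for one analyte measured in two samples.
ratios = level2_log_ratios([8.0, 2.0], [2.0, 2.0])
# Made-up Level 3 values for one analyte across three samples on a plate.
diff = level4_differential({"phosphopeptide_1": [1.0, 2.0, 4.0]})
```

After this step, each analyte is centered on zero across the plate, so a value reflects how far a sample deviates from the typical sample on that plate.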

Troubleshooting

Table 1 lists common problems that may arise with these protocols, along with their possible causes and solutions.

Table 1. Common Troubleshooting Issues

Problem	Possible cause	Solution
Missing or inaccessible data on any LINCS data repository	The data is not available to non-registered users, or is unavailable for some other reason (e.g., unpublished data)	If creating an account is possible, do so and re-try downloading the data. Otherwise, contact site administrators or the relevant DSGC.
There is a bug in a LINCS tool	The tool and/or data used may require upgrading	Contact the DSGC responsible for the site. Some tools may be linked to GitHub repositories, in which case an issue may be created on the repository.
Deprecated LINCS data and tools	Because the LINCS program has ended, there may not be further updates or support for parts of LINCS data and software tools	Existing LINCS datasets may still be periodically updated based on new quality control and analysis procedures. Administrators at the DSGC sites can provide information about whether updates should be expected. The LINCS DCIC is still actively developing and maintaining LINCS tools and databases, and most of these tools can be reliably accessed to analyze LINCS data. The LINCS DCIC provides support via online forms and e-mail.

Acknowledgments

This work was partially supported by NIH grants OT2OD030160, U54HL127624 and OT3OD025459.

Author Contributions

Zhuorui Xie: writing original draft, developing tools, testing tools, producing figures, adding content, developing tutorials, reviewing and collating text, and editing; Eryk Kropiwnicki: writing original draft, testing tools, producing figures, adding content, developing tutorials, reviewing and collating text, and editing; Megan L. Wojciechowicz: developing tutorials, adding content, and editing; Kathleen M. Jagodnik: writing original draft, testing tools, producing figures, adding content, developing tutorials, reviewing and collating text, and editing; Ingrid Shu: developing tutorials, producing figures, adding content, and editing; Allison Bailey: developing tutorials, producing figures, adding content, and editing; Daniel J. B. Clarke: developing tools, adding content, and editing; Minji Jeon: developing tools, adding content, and editing; John Erol Evangelista: developing tools, adding content, and editing; Maxim Kuleshov: developing tools; Alexander Lachmann: developing tools; Abhijna A. Parigi: reviewing text; Jose M. Sanchez: reviewing text; Sherry L. Jenkins: project administration, supervision, reviewing and editing; Avi Ma'ayan: conceptualization, funding acquisition, project administration, supervision, writing original draft, reviewing and editing.

Conflict of Interest

The authors declare no conflict of interest.

Open Research

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Literature Cited

Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A. A., Kim, S., … Garraway, L. A. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483(7391), 603–607. doi: 10.1038/nature11003
Charbonneau, A. L., Brady, A., Czajkowski, K., Aluvathingal, J., Canchi, S., Carter, R., … White, O. (2022). Making Common Fund data more findable: Catalyzing a Data Ecosystem. bioRxiv. doi: 10.1101/2021.11.05.467504
Clark, N. A., Hafner, M., Kouril, M., Williams, E. H., Muhlich, J. L., Pilarczyk, M., … Medvedovic, M. (2017). GRcalculator: An online tool for calculating and mining dose–response data. BMC Cancer, 17(1), 698. doi: 10.1186/s12885-017-3689-3
Clark, N. R., Hu, K. S., Feldmann, A. S., Kou, Y., Chen, E. Y., Duan, Q., & Ma'ayan, A. (2014). The characteristic direction: A geometrical approach to identify differentially expressed genes. BMC Bioinformatics, 15(1), 79. doi: 10.1186/1471-2105-15-79
Clarke, D. J. B., Jeon, M., Stein, D. J., Moiseyev, N., Kropiwnicki, E., Dai, C., … Ma'ayan, A. (2021). Appyters: Turning Jupyter Notebooks into data-driven web apps. Patterns, 2(3), 100213. doi: 10.1016/j.patter.2021.100213
Clarke, D. J. B., Kuleshov, M. V., Schilder, B. M., Torre, D., Duffy, M. E., Keenan, A. B., … Ma'ayan, A. (2018). eXpression2Kinases (X2K) Web: Linking expression signatures to upstream cell signaling networks. Nucleic Acids Research, 46(W1), W171–W179. doi: 10.1093/nar/gky458
Duan, Q., Reid, S. P., Clark, N. R., Wang, Z., Fernandez, N. F., Rouillard, A. D., … Ma'ayan, A. (2016). L1000CDS2: LINCS L1000 characteristic direction signatures search engine. NPJ Systems Biology and Applications, 2(1), 16015. doi: 10.1038/npjsba.2016.15
Edgar, R., Domrachev, M., & Lash, A. E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research, 30(1), 207–210. doi: 10.1093/nar/30.1.207
Enache, O. M., Lahr, D. L., Natoli, T. E., Litichevskiy, L., Wadden, D., Flynn, C., … Subramanian, A. (2019). The GCTx format and cmap{Py, R, M, J} packages: Resources for optimized storage and integrated traversal of annotated dense matrices. Bioinformatics, 35(8), 1427–1429. doi: 10.1093/bioinformatics/bty784
Evangelista, J. E., Clarke, D. J. B., Xie, Z., Lachmann, A., Jeon, M., Chen, K., … Ma'ayan, A. (2022). SigCom LINCS: Data and metadata search engine for a million gene expression signatures. Nucleic Acids Research, gkac328. doi: 10.1093/nar/gkac328
Fabian, M. A., Biggs, W. H. 3rd, Treiber, D. K., Atteridge, C. E., Azimioara, M. D., Benedetti, M. G., … Lockhart, D. J. (2005). A small molecule-kinase interaction map for clinical kinase inhibitors. Nature Biotechnology, 23(3), 329–336. doi: 10.1038/nbt1068
Fabregat, A., Jupe, S., Matthews, L., Sidiropoulos, K., Gillespie, M., Garapati, P., … D'Eustachio, P. (2018). The Reactome Pathway Knowledgebase. Nucleic Acids Research, 46(D1), D649–D655. doi: 10.1093/nar/gkx1132
Fernandez, N. F., Gundersen, G. W., Rahman, A., Grimes, M. L., Rikova, K., Hornbeck, P., & Ma'ayan, A. (2017). Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data, 4(1), 170151. doi: 10.1038/sdata.2017.151
GTEx Consortium. (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369(6509), 1318–1330. doi: 10.1126/science.aaz1776
Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1), 27–30. doi: 10.1093/nar/28.1.27
Koleti, A., Terryn, R., Stathias, V., Chung, C., Cooper, D. J., Turner, J. P., … Schürer, S. C. (2018). Data Portal for the Library of Integrated Network-based Cellular Signatures (LINCS) program: Integrated access to diverse large-scale cellular perturbation response data. Nucleic Acids Research, 46(D1), D558–D566. doi: 10.1093/nar/gkx1063
Kuleshov, M. V., Jones, M. R., Rouillard, A. D., Fernandez, N. F., Duan, Q., Wang, Z., … Ma'ayan, A. (2016). Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research, 44(W1), W90–W97. doi: 10.1093/nar/gkw377
Kutmon, M., Riutta, A., Nunes, N., Hanspers, K., Willighagen, E. L., Bohler, A., … Pico, A. R. (2016). WikiPathways: Capturing the full diversity of pathway knowledge. Nucleic Acids Research, 44(D1), D488–D494. doi: 10.1093/nar/gkv1024
Lamb, J., Crawford, E. D., Peck, D., Modell, J. W., Blat, I. C., Wrobel, M. J., … Golub, T. R. (2006). The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease. Science, 313(5795), 1929–1935. doi: 10.1126/science.1132939
Litichevskiy, L., Peckner, R., Abelin, J. G., Asiedu, J. K., Creech, A. L., Davis, J. F., … Jaffe, J. D. (2018). A library of phosphoproteomic and chromatin signatures for characterizing cellular responses to drug perturbations. Cell Systems, 6(4), 424–443.e427. doi: 10.1016/j.cels.2018.03.012
Niepel, M., Hafner, M., Duan, Q., Wang, Z., Paull, E. O., Chung, M., … Sorger, P. K. (2017). Common and cell-type specific responses to anti-cancer drugs revealed by high throughput transcript profiling. Nature Communications, 8(1), 1186. doi: 10.1038/s41467-017-01383-w
Niepel, M., Hafner, M., Mills, C. E., Subramanian, K., Williams, E. H., Chung, M., … Sorger, P. K. (2019). A multi-center study on the reproducibility of drug-response assays in mammalian cell lines. Cell Systems, 9(1), 35–48.e35. doi: 10.1016/j.cels.2019.06.005
Pilarczyk, M., Kouril, M., Shamsaei, B., Vasiliauskas, J., Niu, W., Mahi, N., … Medvedovic, M. (2020). Connecting omics signatures of diseases, drugs, and mechanisms of actions with iLINCS. bioRxiv, 826271. doi: 10.1101/826271
Rouillard, A. D., Gundersen, G. W., Fernandez, N. F., Wang, Z., Monteiro, C. D., McDermott, M. G., & Ma'ayan, A. (2016). The harmonizome: A collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database, 2016, baw100. doi: 10.1093/database/baw100
Sharma, V., Eckels, J., Schilling, B., Ludwig, C., Jaffe, J. D., MacCoss, M. J., & MacLean, B. (2018). Panorama Public: A Public Repository for Quantitative Data Sets Processed in Skyline. Molecular & Cellular Proteomics, 17(6), 1239–1244. doi: 10.1074/mcp.RA117.000543
Stathias, V., Turner, J., Koleti, A., Vidovic, D., Cooper, D., Fazel-Najafabadi, M., … Schürer, S. C. (2020). LINCS Data Portal 2.0: Next generation access point for perturbation-response signatures. Nucleic Acids Research, 48(D1), D431–D439. doi: 10.1093/nar/gkz1023
Subramanian, A., Narayan, R., Corsello, S. M., Peck, D. D., Natoli, T. E., Lu, X., … Golub, T. R. (2017). A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell, 171(6), 1437–1452.e1417. doi: 10.1016/j.cell.2017.10.049
The Gene Ontology Consortium. (2019). The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research, 47(D1), D330–D338. doi: 10.1093/nar/gky1055
Torre, D., Lachmann, A., & Ma'ayan, A. (2018). BioJupies: Automated generation of interactive notebooks for RNA-Seq data analysis in the cloud. Cell Systems, 7(5), 556–561.e553. doi: 10.1016/j.cels.2018.10.007
Wang, Z., Lachmann, A., Keenan, A. B., & Ma'ayan, A. (2018). L1000FWD: Fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics, 34(12), 2150–2152. doi: 10.1093/bioinformatics/bty060
Wang, Z., Monteiro, C. D., Jagodnik, K. M., Fernandez, N. F., Gundersen, G. W., Rouillard, A. D., … Ma'ayan, A. (2016). Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications, 7, 12846. doi: 10.1038/ncomms12846
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. doi: 10.1038/sdata.2016.18

Internet Resources

NIH LINCS program website

* <https://lincsproject.org/LINCS/>

The homepage for the LINCS Program. The overarching goals of the program are described here, as well as each of the DSGCs and the DCIC. Links to tools, publications, and data can also be found here.

Phase 1 L1000 data

* <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92742>

The LINCS Phase 1 L1000 dataset, released in 2016, contains the first ∼1.3 million gene expression profiles generated by the L1000 platform. This dataset is available on GEO via accession GSE92742, under the “Supplementary Files” section, and includes downloadable data for each of the five levels of the L1000 data; Level 5 signatures are computed using the moderated z-score. Additionally, a Readme file and 10 metadata files are provided.

LINCS 2020 L1000 data

* <https://clue.io/releases/data-dashboard>

The LINCS 2020 L1000 data provides over 3 million gene expression profiles. Additional perturbagens and cell lines beyond those from Phase 1 have been added, including hematopoietic cell lines and non-cancer-related cell lines.

L1000 Profiles of GTEx data

* <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92743>

As part of the process of validating and improving the L1000 assay, GTEx samples were profiled using both L1000 and RNA-seq assays. Both sets of data are available from GEO series GSE92743.

Notebooks providing L1000 data access

* <https://github.com/cmap/lincs-workshop-2020>

The LINCS Center for Transcriptomics has created notebooks to provide various forms of access and interaction with the L1000 data. A cmapBQ tutorial notebook is provided, as is a cmapBQ Toolkit Demo notebook. Additionally, for gene expression analysis, the Compound Dose Response notebook and the Gene Modulation notebook are available. For cell fitness analysis, notebooks are provided for Exploration of Prism Cell Viability Data and for Cell Growth Rate and Impact on Viability Profiles. Documentation is provided for each notebook.

DToxS Center website

* <https://martip03.u.hpc.mssm.edu/>

Information on the DToxS DSGC is provided at this site. All data, metadata, and signatures for DToxS data are also available with creation of an account. Metadata are available for cells, drugs, and assays. Twenty-eight standard operating procedures (SOPs) are also available in the categories of Cell SOPs, Assay SOPs, and Computational SOPs.

HMS LINCS database

* <https://lincs.hms.harvard.edu/db/>

Data and metadata from HMS LINCS transcriptomics studies are hosted here, including data for KINOMEscan and KiNativ. Metadata are provided for all datasets, as well as cells, kinases, and small molecules.

The Drug-Pathway Browser

* <https://lincs.hms.harvard.edu/explore/pathway/>

This tool provides an interactive network map of signal transduction pathways. Users can identify compounds that target a particular kinase or are associated with a phenotype of interest, or identify compounds having similar or synergistic effects via involvement in the same signal transduction cascade. Compounds, proteins, and cell lines are hyperlinked to their corresponding entries in the HMS LINCS Database.

The HMS LINCS Breast Cancer Browser

* <http://www.cancerbrowser.org/>

Both published and unpublished datasets related to the biology of breast cancer and associated drug response are hosted here. Datasets include the Basal Receptor (RTK) Profile data, the Basal Total Protein Mass Spectrometry data, the Basal Phosphoprotein Mass Spectrometry dataset, the Growth Factor-Induced pAKT/pERK Response data, and the Drug Dose-Response Growth Rate Profiling dataset. The website also provides a Cell Line view for exploring each breast cancer cell line in depth, as well as a Drugs view containing information on drug development, function, and targets.

CLUE.io

* <https://clue.io/>

The CLUE platform provides both a data portal for the L1000 data and a set of tools for working with the L1000 data. This platform requires the creation of a free account to log in and access its resources. Included are the CLUE Repurposing App, which permits users to explore drugs and tool compounds to identify drug repurposing opportunities to advance disease treatment; the CLUE Touchstone App, which permits users to explore connectivity among drug, loss-of-function, and gain-of-function signatures via interactive plots; the CLUE Morpheus App, which provides heat map analyses of data from the LINCS Center for Transcriptomics and also allows for uploading and analyzing user data; and the CLUE Cell App, which permits the exploration of various cell lines and their annotations.

L1000FWD

* <https://maayanlab.cloud/L1000FWD/>

The L1000FWD platform (Wang et al., 2018) provides an interactive scatter plot visualization of over 17,000 drug-induced transcriptomics signatures. The app also allows for viewing signatures by cell line or small molecule, as well as querying user-submitted signatures to find the top drugs to mimic or reverse a signature.

L1000CDS2

* <https://maayanlab.cloud/L1000CDS2/#/index>

The L1000CDS2 platform (Duan et al., 2016) takes user-submitted signatures or gene sets and returns consensus L1000 characteristic direction signatures that mimic or reverse the input signature.

iLINCS

* <http://www.ilincs.org/ilincs/>

iLINCS (Pilarczyk et al., 2020) is a cloud-based platform that allows access to LINCS data as well as various workflows for processing and analyzing transcriptomics and proteomics data.

LINCS Data Portal v1

* <http://lincsportal.ccs.miami.edu/dcic-portal/>

The LINCS Data Portal v1 (Koleti et al., 2018) provides a data repository for LINCS data releases prior to the 2020 data release. A total of 422 LINCS datasets are available for download on the website, and can be queried by metadata, such as the small molecules, cells, genes, proteins, and antibodies studied. Assay and DSGC information are also provided for each dataset.

LINCS Data Portal v2

* <http://lincsportal.ccs.miami.edu/signatures/home>

The LINCS Data Portal v2 (Stathias et al., 2020) updates the LINCS Data Portal v1 with a new user interface and enhanced metadata annotation. Querying by signature via the iLINCS platform (Pilarczyk et al., 2020) and by chemical structure is supported in addition to metadata queries.

SigCom LINCS

* <https://maayanlab.cloud/sigcom-lincs>

SigCom LINCS is the latest LINCS data portal. It contains the most recent release of LINCS data. In addition to metadata and signature search functionalities, detailed and interactive visualizations of the LINCS data and metadata are provided. SigCom LINCS also contains signatures from other sources, gene pages, FAIR assessments of LINCS data, a downloads page, and OpenAPI programmatic access. SigCom LINCS search functionality includes single gene search, and the automatic conversion of any search term to gene sets and signatures.

Citing Literature

Number of times cited according to CrossRef: 2

Zhuorui Xie, Clara Chen, Avi Ma’ayan, Dex-Benchmark: datasets and code to evaluate algorithms for transcriptomics data analysis, PeerJ, 10.7717/peerj.16351, 11 , (e16351), (2023).
Yasha Hasija, Bioinformatics workflow management systems, All About Bioinformatics, 10.1016/B978-0-443-15250-4.00006-X, (247-265), (2023).

Figure 1
The CLUE.io homepage.
Figure 2
Data Library choice from the tools drop-down menu that redirects to a page for LINCS L1000 data downloads.
Figure 3
A table of expandable LINCS projects and their associated datasets.
Figure 4
Page of supporting file download links for the CMAP LINCS 2020 dataset.
Figure 5
The CLUE Command application that allows querying for detailed information about compounds, genes, classes, connectivities, and other metadata in the L1000 data.
Figure 6
The BioJupies homepage.
Figure 7
Data submission page with choices of (A) submitting published data from GEO, (B) submitting published data from GTEx, (C) submitting a gene count matrix, or (D) submitting a FASTQ file.
Figure 8
Analysis page for selecting data analysis and visualization tools to be executed in the Jupyter Notebook, including (A) exploratory data analysis, (B) differential expression analysis, (C) enrichment analysis, and (D) small molecule querying sections.
Figure 9
Sample comparison page that allows for selection of control and perturbation samples for comparison.
Figure 10
Page for modifying options of the selected data analysis and visualization options.
Figure 11
Results page with options for opening the generated notebook and sharing it on social media.
Figure 12
Table of contents for the Jupyter notebook analysis, each of which can be clicked to navigate to the respective section.
Figure 13
Bulk RNA-seq Analysis Appyter homepage.
Figure 14
Bulk RNA-seq Analysis Appyter input form with section for uploading gene counts and metadata files displayed.
Figure 15
The “Normalization Methods” section of the input form with several options for normalizing the gene count matrix.
Figure 16
Input form section for selecting options to visualize differential gene expression computed by the Appyter notebook.
Figure 17
Input form section for selecting tools that compute and analyze gene expression signatures including gene set enrichment analysis and mimicker/reverser identification.
Figure 18
Executed Appyter notebook header with the table of contents and options for downloading the notebook, toggling code, and running the notebook locally highlighted.
Figure 19
PCA plot of genes with highest variance across samples. Each sample group is visualized using a different color.
Figure 20
Clustergrammer heatmap visualization of gene expression for each gene across all samples.
Figure 21
Library size analysis histogram for inspecting outlying samples based on total amount of mapped reads.
Figure 22
Volcano plot of significantly differentially expressed genes between sample groups, where red points represent up-regulated genes and blue points represent down-regulated genes.
Figure 23
Gene Ontology and Pathway enrichment analysis bar plots displaying the top enriched down-regulated and up-regulated terms based on the submitted differentially expressed gene set computed by the Appyter notebook.
Figure 24
L1000FWD fireworks visualization of signatures mimicking and reversing the differential gene expression signature generated from the input data.
Figure 25
L1000FWD homepage.
Figure 26
L1000FWD homepage query box populated with the cell line term “MCF7” and the available options for visualizing signatures profiled in the MCF7 cell line (left) and individual signature pages that were profiled in the MCF7 cell line (right).
Figure 27
Fireworks visualization of L1000 drug-induced gene expression signatures profiled in the MCF7 cell line.
Figure 28
Signature report page for a trichostatin-a signature profiled in the MCF7 cell line.
Figure 29
Fireworks visualization of L1000 drug-induced gene expression signatures profiled in all available cell lines.
Figure 30
Fireworks visualization where the signature points are shaped by p-value and colored by cell line.
Figure 31
Fireworks visualization with dexamethasone signatures highlighted.
Figure 32
Fireworks visualization with drug-induced signatures that reverse or mimic the input signature highlighted.
Figure 33
Drop-down menu of available cell lines to filter the fireworks visualization.
Figure 34
Search page for querying drug names to retrieve drug pages with identifying metadata for the drug and its signatures.
Figure 35
Signature report page for inputting small molecules, cell lines, and time points of interest.
Figure 36
Interactive visualization of the signatures that match the search criteria input in the signature report page.
Figure 37
L1000FWD downloads page.
Figure 38
HMS LINCS KINOMEscan data homepage.
Figure 39
Detailed metadata for the (s)-CR8 KINOMEscan dataset on the HMS LINCS Database. The dataset ID, 20342, is visible at the top of the table.
Figure 40
Metadata on all small molecules studied in the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database. Only the small molecule (s)-CR8 was studied in this dataset.
Figure 41
Metadata on all panel kinases profiled in the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database.
Figure 42
Metadata for each column of the data table provided in the “Results” tab of the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database.
Figure 43
The actual results of the KINOMEscan assay for (s)-CR8 provided as a data table in the (s)-CR8 KINOMEscan dataset (ID:20342) on the HMS LINCS Database. Each row represents a single kinase.
Figure 44
Input form of the KINOMEscan Data Visualization Appyter. Default options are already filled in.
Figure 45
Query results for the kinase ABL2 in the KINOMEscan Data Visualization Appyter. The tables provide the top-ranked small molecules that bind with ABL2 based on either % control or Kd value, while the bar charts show the distribution of % control and Kd values for all small molecules that bind ABL2.
Figure 46
Query results for the small molecule AC220 in the KINOMEscan Data Visualization Appyter. The table displays the top-ranked kinases bound by AC220 based on Kd, while the bar chart shows the distribution of Kd values for all kinases bound by AC220.
Figure 47
The results of querying the example kinase list provided on the KINOMEscan Data Visualization Appyter. The tables show the top five drugs that bind kinases in the input list by average % control and Kd.
Figure 48
Query results for the example gene list provided on the KINOMEscan Data Visualization Appyter. The tables show the top five drugs, based on average % control and Kd, that bind kinases coded by the input genes.
Figure 49
The LINCS Panorama data repository homepage.
Figure 50
The “Quick Links” section of the LINCS Panorama homepage.
Figure 51
The LINCS sub-menu open on the Panorama homepage. Both LINCS P100 and GCP datasets can be accessed here.
Figure 52
The table containing all LINCS P100 data in Panorama.
Figure 53
The table containing all LINCS GCP data in Panorama.
Figure 54
Example Level 1 Skyline data page for a LINCS P100 dataset on Panorama.
Figure 55
Example Level 4 heatmap created with Morpheus. Columns are drugs while rows correspond to genes.
Figure 56
Example Level 4 heatmap created with Morpheus. Columns are drugs while rows correspond to genes.
Figure 57
List of all Skyline files for targeted mass spectrometry datasets on Panorama. Metadata on each dataset is included as well as download links.
Figure 58
The Mass Spec search box on Panorama, which allows for searching specific proteins, peptides, or modifications among all mass spectrometry datasets.
Figure 59
Example of filtering by column on a Morpheus heatmap. The small molecule CC-401 is entered into the search bar, and the corresponding columns are highlighted.
Figure 60
Example of filtering by column on a Morpheus heatmap. The small molecule CC-401 is entered into the search bar, and the corresponding columns are highlighted.
Figure 61
Display options tab in the Morpheus heatmap options. Use the Annotations tab to change row or column labels, the Color Scheme tab to change the heatmap colors, and the Display tab to format the heatmap layout.
Figure 62
GR Browser homepage. By default, the Broad-HMS LINCS Joint Project is selected on the left, and the corresponding dose-response grids are displayed.
Figure 63
Hovering over “Hs-578T” in the Cell_Line box will highlight the dose-response curves corresponding to Hs-578T cells and the respective small molecule for each grid in the GR Browser.
Figure 64
Hovering over “Afatinib” in the Small_Molecule box will highlight the dose-response curves corresponding to afatinib treatment of the respective cells for each grid in the GR Browser.
Figure 65
An example boxplot from GR Browser displaying the distribution of GR50 values for the first nine small molecules in the dataset by alphabetical order.
Figure 66
The GR Browser Data Table, which provides GR metrics and metadata for each small molecule perturbation.
Figure 67
Example of filtering the Data Table in the GR Browser. AZD has been entered into the search box, while BT-20 has been entered into the cell_line column filter, and all entries corresponding to the cell line BT-20 that contain “AZD” are now displayed.
Figure 68
The list of filtered LINCS Joint Project datasets on the HMS LINCS Database.
Figure 69
The dataset page for LINCS Joint Project Dataset ID:20259 on the HMS LINCS Database. By default, the Details tab is shown, which contains metadata on the dataset and assay.
Figure 70
The Small Molecules Studied tab for one of the LINCS Joint Project datasets, ID:20259 on the HMS LINCS Database. Metadata for all small molecules studied in this dataset are shown.
Figure 71
The metadata page for neratinib on the HMS LINCS Database.
Figure 72
The Cell Lines Studied tab for one of the LINCS Joint Project datasets, ID:20259 on the HMS LINCS Database. Only one type of cell, BT-20, was used in this dataset.
Figure 73
The metadata page for the cell line BT-20 on the HMS LINCS Database.
Figure 74
The Data Columns tab for one of the LINCS Joint Project datasets, ID:20259 on the HMS LINCS Database. Each row provides metadata for a column in the results table of the dataset.
Figure 75
The results contained in LINCS Joint Project dataset ID:20259. Each row represents a drug combination treatment of BT-20 cells. Rows with blank entries are controls.
Figure 76
The LINCS Joint Project Breast Cancer Network Browser homepage. Data from the LINCS Joint Projects are visualized on the scatterplot. The shape, size, and color of points can be adjusted, and are labeled in the legend on the left. Each point represents a single gene signature.
Figure 77
LINCS Data Portal Version 1 homepage.
Figure 78
Statistics for the datasets, small molecules, cells, and genes associated with the L1000 assay.
Figure 79
Table of datasets that were generated using the L1000 mRNA profiling assay.
Figure 80
Table of datasets that were generated using the L1000 mRNA profiling assay.
Figure 81
Dataset-specific page for CRISPR Perturbagens with identifying metadata for the dataset.
Figure 82
Dataset-specific page for CRISPR Perturbagens with identifying metadata for the dataset.
Figure 83
List view of small molecules that were profiled in the L1000 assay and associated metadata.
Figure 84
List view of small molecules that were profiled in the L1000 assay and associated metadata.
Figure 85
Expanded view of small molecule metadata that includes the cell lines in which the molecule was profiled, as well as the datasets in which the small molecule was included.
Figure 86
Expanded view of small molecule metadata that includes the cell lines in which the molecule was profiled, as well as the datasets in which the small molecule was included.
Figure 87
List view of cell lines that were profiled in the L1000 assay along with informative metadata.
Figure 88
List view of cell lines that were profiled in the L1000 assay along with informative metadata.
Figure 89
Harmonizome search page for genes included in the L1000 datasets.
Figure 90
Harmonizome search page for genes included in the L1000 datasets.
Figure 91
Single gene landing page for MAP10 that includes identifying metadata for the gene, as well as functional associations for the gene.
Figure 92
LINCS Data Portal Version 2 homepage.
Figure 93
Metadata search bar with suggestions for perturbations, model systems, and signatures associated with “A375”.
Figure 94
Signature search page with up-regulated and down-regulated gene sets input into the respective search boxes.
Figure 95
Table of results with highly similar and dissimilar signatures to the input signature sorted by absolute similarity values. Each row includes metadata for the signature.
Figure 96
Table of assays used to generate the datasets available in the portal.
Figure 97
Table of small molecule perturbations and their metadata.
Figure 98
Table of genes knocked down by sgRNA to observe the effect on gene expression, and their metadata.
Figure 99
Table of model systems profiled in L1000 signatures.
Figure 100
Table of all available signatures and their metadata.
Figure 101
SigCom LINCS homepage.
Figure 102
Signature search page with input boxes populated with up-regulated and down-regulated genes.
Figure 103
Top signature results that mimic or reverse the input signature displayed by dataset.
Figure 104
Bar chart visualization and tables of L1000 Chemical Perturbation signatures mimicking or reversing the input signature.
Figure 105
Clustergram visualization of top signatures where input genes were up-regulated and down-regulated, respectively.
Figure 106
Metadata search page filtered by signatures that include the keyword “dexamethasone”.
Figure 107
Signatures containing the keyword “dexamethasone”, filtered to only signatures generated by the LINCS Transcriptomics DSGC using the menu on the right.
Figure 108
Drop-down menu of actions that can be performed on a signature of interest, including performing a signature search, downloading the signature, and performing gene set enrichment analysis.
Figure 109
Signature metadata page for the signature “CPC006_MCF7_24H_O03_dexamethasone_10uM”.
Figure 110
Significantly up-regulated genes for a signature of interest.
Figure 111
Metadata search page for datasets, filtered by datasets that were created using the L1000 platform.
Figure 112
Dataset metadata page for Gene Knockdowns from LINCS Transcriptomics that also includes signatures generated within the dataset.
Figure 113
Metadata search page for genes, filtered by the gene symbol “ACE2”.
Figure 114
Gene page for ACE2 that includes signatures in which ACE2 is significantly up-regulated or down-regulated.
Figure 115
Page of UMAPs generated for each dataset included in SigCom LINCS.
Figure 116
Static UMAP plot of Normalized L1000 signatures colored by perturbation type.
Figure 117
Interactive UMAP plot of Automatic Human GEO RNA-seq Signatures with a signature of interest moused over and highlighted.
Figure 118
iLINCS data portal homepage.
Figure 119
Search results page for datasets, signatures, compounds, and genes that match the query term “everolimus”.
Figure 120
Expanded signature results that display a table of signatures that include the term “everolimus”. Each of the table columns can be filtered with keywords of interest.
Figure 121
Signature-specific page with details about the signature and various options for analyzing the signature.
Figure 122
Volcano plot of differentially expressed genes within the signature. The top 100 genes are selected by default and the sliders allow for modifying the differential expression range and p-value cut-off.
Figure 123
SPIA Functional Pathway Analysis table of the differentially expressed genes from the signature displaying the top enriched KEGG pathways.
Figure 124
Table of genes included in the signature, their differential expression value, and p-value.
Figure 125
Table of top 100 differentially expressed genes in the signature that displays each differential expression value and p-value.
Figure 126
Tab containing various drop-down tables that contain signatures connected to the input signature based on Pearson correlation coefficient concordance.
Figure 127
Drop-down table of top LINCS Chemical Perturbagen signatures connected to the input signature ranked by Pearson correlation coefficient concordance.
Figure 128
Drop-down menu for selecting the number of top correlated signatures for consideration in group analysis.
Figure 129
Drop-down menu of grouped signature analysis options.
Figure 130
Table of signatures for consideration in group analysis that can be selected/deselected.
Figure 131
Downloadable table of selected signatures for group analysis. At the bottom of the page are pre-computed signature group analyses that can be viewed by clicking on the respective icon.
Figure 132
Morpheus signatures heatmap page with signatures clustered by perturbagens, cell lines, etc.
Figure 133
Drop-down table of top perturbation signatures connected to the input signature.
Figure 134
Table of LINCS gene knockdown signatures related to the query signature ranked by z-score. The table includes signature metadata like the target gene of the knockdown and associated pathways, in addition to the direction of the correlation with the query signature, p-value, and FDR.
Figure 135
Table of LINCS chemical perturbagen signatures related to the query signature ranked by z-score. The table includes perturbagen metadata, in addition to the direction of the correlation with the query signature, p-value, and FDR.
Figure 136
Datasets workflow page with over 15,000 datasets of pre-processed signatures available for analysis.
Figure 137
Filtered dataset page displaying TCGA datasets that also include the search term “breast”.
Figure 138
Page for exploring and analyzing a dataset of interest, in this case the “TCGA_BRCA_RPPA_2019” dataset from TCGA. The page includes options for creating a signature, multi-group analysis, and options for analyzing a list of genes, along with tabs for exploratory visualizations and data/metadata associated with the dataset.
Figure 139
Signature creation page for the “TCGA_BRCA_RPPA_2019” dataset where grouping variables for creating the signature can be selected from drop-down menus.
Figure 140
Signature details page for the generated signature from the “TCGA_BRCA_RPPA_2019” dataset.

Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A. A., Kim, S., … Garraway, L. A. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483(7391), 603–607. doi: 10.1038/nature11003
Charbonneau, A. L., Brady, A., Czajkowski, K., Aluvathingal, J., Canchi, S., Carter, R., … White, O. (2022). Making Common Fund data more findable: Catalyzing a Data Ecosystem. bioRxiv. doi: 10.1101/2021.11.05.467504
Clark, N. A., Hafner, M., Kouril, M., Williams, E. H., Muhlich, J. L., Pilarczyk, M., … Medvedovic, M. (2017). GRcalculator: An online tool for calculating and mining dose–response data. BMC Cancer, 17(1), 698. doi: 10.1186/s12885-017-3689-3
Clark, N. R., Hu, K. S., Feldmann, A. S., Kou, Y., Chen, E. Y., Duan, Q., & Ma'ayan, A. (2014). The characteristic direction: A geometrical approach to identify differentially expressed genes. BMC Bioinformatics, 15(1), 79. doi: 10.1186/1471-2105-15-79
Clarke, D. J. B., Jeon, M., Stein, D. J., Moiseyev, N., Kropiwnicki, E., Dai, C., … Ma'ayan, A. (2021). Appyters: Turning Jupyter Notebooks into data-driven web apps. Patterns, 2(3), 100213. doi: 10.1016/j.patter.2021.100213
Clarke, D. J. B., Kuleshov, M. V., Schilder, B. M., Torre, D., Duffy, M. E., Keenan, A. B., … Ma'ayan, A. (2018). eXpression2Kinases (X2K) Web: Linking expression signatures to upstream cell signaling networks. Nucleic Acids Research, 46(W1), W171–W179. doi: 10.1093/nar/gky458
Duan, Q., Reid, S. P., Clark, N. R., Wang, Z., Fernandez, N. F., Rouillard, A. D., … Ma'ayan, A. (2016). L1000CDS2: LINCS L1000 characteristic direction signatures search engine. NPJ Systems Biology and Applications, 2(1), 16015. doi: 10.1038/npjsba.2016.15
Edgar, R., Domrachev, M., & Lash, A. E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research, 30(1), 207–210. doi: 10.1093/nar/30.1.207
Enache, O. M., Lahr, D. L., Natoli, T. E., Litichevskiy, L., Wadden, D., Flynn, C., … Subramanian, A. (2019). The GCTx format and cmap{Py, R, M, J} packages: Resources for optimized storage and integrated traversal of annotated dense matrices. Bioinformatics, 35(8), 1427–1429. doi: 10.1093/bioinformatics/bty784
Evangelista, J. E., Clarke, D. J. B., Xie, Z., Lachmann, A., Jeon, M., Chen, K., … Ma'ayan, A. (2022). SigCom LINCS: Data and metadata search engine for a million gene expression signatures. Nucleic Acids Research, gkac328. doi: 10.1093/nar/gkac328
Fabian, M. A., Biggs, W. H. 3rd, Treiber, D. K., Atteridge, C. E., Azimioara, M. D., Benedetti, M. G., … Lockhart, D. J. (2005). A small molecule-kinase interaction map for clinical kinase inhibitors. Nature Biotechnology, 23(3), 329–336. doi: 10.1038/nbt1068
Fabregat, A., Jupe, S., Matthews, L., Sidiropoulos, K., Gillespie, M., Garapati, P., … D'Eustachio, P. (2018). The Reactome Pathway Knowledgebase. Nucleic Acids Research, 46(D1), D649–D655. doi: 10.1093/nar/gkx1132
Fernandez, N. F., Gundersen, G. W., Rahman, A., Grimes, M. L., Rikova, K., Hornbeck, P., & Ma'ayan, A. (2017). Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data, 4(1), 170151. doi: 10.1038/sdata.2017.151
GTEx Consortium. (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369(6509), 1318–1330. doi: 10.1126/science.aaz1776
Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1), 27–30. doi: 10.1093/nar/28.1.27
Koleti, A., Terryn, R., Stathias, V., Chung, C., Cooper, D. J., Turner, J. P., … Schürer, S. C. (2018). Data Portal for the Library of Integrated Network-based Cellular Signatures (LINCS) program: Integrated access to diverse large-scale cellular perturbation response data. Nucleic Acids Research, 46(D1), D558–D566. doi: 10.1093/nar/gkx1063
Kuleshov, M. V., Jones, M. R., Rouillard, A. D., Fernandez, N. F., Duan, Q., Wang, Z., … Ma'ayan, A. (2016). Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research, 44(W1), W90–W97. doi: 10.1093/nar/gkw377
Kutmon, M., Riutta, A., Nunes, N., Hanspers, K., Willighagen, E. L., Bohler, A., … Pico, A. R. (2016). WikiPathways: Capturing the full diversity of pathway knowledge. Nucleic Acids Research, 44(D1), D488–D494. doi: 10.1093/nar/gkv1024
Lamb, J., Crawford, E. D., Peck, D., Modell, J. W., Blat, I. C., Wrobel, M. J., … Golub, T. R. (2006). The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease. Science, 313(5795), 1929–1935. doi: 10.1126/science.1132939
Litichevskiy, L., Peckner, R., Abelin, J. G., Asiedu, J. K., Creech, A. L., Davis, J. F., … Jaffe, J. D. (2018). A library of phosphoproteomic and chromatin signatures for characterizing cellular responses to drug perturbations. Cell Systems, 6(4), 424–443.e427. doi: 10.1016/j.cels.2018.03.012
Niepel, M., Hafner, M., Duan, Q., Wang, Z., Paull, E. O., Chung, M., … Sorger, P. K. (2017). Common and cell-type specific responses to anti-cancer drugs revealed by high throughput transcript profiling. Nature Communications, 8(1), 1186. doi: 10.1038/s41467-017-01383-w
Niepel, M., Hafner, M., Mills, C. E., Subramanian, K., Williams, E. H., Chung, M., … Sorger, P. K. (2019). A multi-center study on the reproducibility of drug-response assays in mammalian cell lines. Cell Systems, 9(1), 35–48.e35. doi: 10.1016/j.cels.2019.06.005
Pilarczyk, M., Kouril, M., Shamsaei, B., Vasiliauskas, J., Niu, W., Mahi, N., … Medvedovic, M. (2020). Connecting omics signatures of diseases, drugs, and mechanisms of actions with iLINCS. bioRxiv, 826271. doi: 10.1101/826271
Rouillard, A. D., Gundersen, G. W., Fernandez, N. F., Wang, Z., Monteiro, C. D., McDermott, M. G., & Ma'ayan, A. (2016). The harmonizome: A collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database, 2016, baw100. doi: 10.1093/database/baw100
Sharma, V., Eckels, J., Schilling, B., Ludwig, C., Jaffe, J. D., MacCoss, M. J., & MacLean, B. (2018). Panorama Public: A Public Repository for Quantitative Data Sets Processed in Skyline. Molecular & Cellular Proteomics, 17(6), 1239–1244. doi: 10.1074/mcp.RA117.000543
Stathias, V., Turner, J., Koleti, A., Vidovic, D., Cooper, D., Fazel-Najafabadi, M., … Schürer, S. C. (2019). LINCS Data Portal 2.0: Next generation access point for perturbation-response signatures. Nucleic Acids Research, 48(D1), D431–D439. doi: 10.1093/nar/gkz1023
Stathias, V., Turner, J., Koleti, A., Vidovic, D., Cooper, D., Fazel-Najafabadi, M., … Schürer, S. C. (2020). LINCS Data Portal 2.0: Next generation access point for perturbation-response signatures. Nucleic Acids Research, 48(D1), D431–D439. doi: 10.1093/nar/gkz1023
Subramanian, A., Narayan, R., Corsello, S. M., Peck, D. D., Natoli, T. E., Lu, X., … Golub, T. R. (2017). A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell, 171(6), 1437–1452.e1417. doi: 10.1016/j.cell.2017.10.049
The Gene Ontology Consortium. (2019). The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research, 47(D1), D330–D338. doi: 10.1093/nar/gky1055
Torre, D., Lachmann, A., & Ma'ayan, A. (2018). BioJupies: Automated generation of interactive notebooks for RNA-Seq data analysis in the cloud. Cell Systems, 7(5), 556–561.e553. doi: 10.1016/j.cels.2018.10.007
Wang, Z., Lachmann, A., Keenan, A. B., & Ma'ayan, A. (2018). L1000FWD: Fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics, 34(12), 2150–2152. doi: 10.1093/bioinformatics/bty060
Wang, Z., Monteiro, C. D., Jagodnik, K. M., Fernandez, N. F., Gundersen, G. W., Rouillard, A. D., … Ma'ayan, A. (2016). Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications, 7, 12846. doi: 10.1038/ncomms12846
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. doi: 10.1038/sdata.2016.18