How to use the CRISPR Gene Scoring Tool

Download: CRISPR Screen Analysis Tool README.

Tool Overview

This analysis tool ranks genes in genetic perturbation screens. It takes a list of perturbations with associated numerical scores as input and computes a gene-level score using one of two statistical methods: the negative binomial distribution (sampling with replacement) or the hypergeometric distribution (sampling without replacement). These methods and their parameters are described in more detail in the sections below.

User Guide

The following section is intended as a tutorial for new users and includes sample inputs, outputs and running parameters for both kinds of analysis. For more detail on how the analysis works as well as the general formats of the inputs and outputs, see the related sections below.

After downloading the files listed above, follow the steps and choose the parameters given for each analysis to reproduce the matching output and confirm that you are using the tool properly. Note: the parameters listed are the defaults, or are standard for the examples provided; you may choose different parameters depending on the nature of your own experimental data.

Data and Annotation File Input

These input steps are common to both types of analysis.

Negative Binomial Distribution (STARS): Instructions and Parameters

Hypergeometric Distribution Tool: Instructions and Parameters

Statistical Methods

There are two statistical methods to choose from for your analysis. Their descriptions are as follows:

Negative Binomial Distribution (STARS)
The STARS score is calculated using the probability mass function of a binomial distribution. The calculation is performed for all perturbations that rank above a user-defined threshold, e.g., the top x% of perturbations in a ranked list. For each gene, the value of its least probable perturbation is assigned to the gene as the STARS score. Unless otherwise specified, STARS requires at least two perturbations to rank above the threshold for a gene to receive a score. Permutation testing is also performed on the list of perturbations used in the experiment to generate a null distribution, which allows p-values and false discovery rates (FDR) to be calculated for hit genes. STARS provides separate outputs for sgRNAs ranked in the ascending and descending directions.
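The scoring step described above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function names (binom_pmf, stars_like_score) are hypothetical, and the choice of binomial parameters is an assumption made for the example (trials = number of perturbations targeting the gene, success probability = overall rank divided by the total number of perturbations, successes = within-gene rank). Permutation testing, p-values and FDR are omitted, and the threshold is assumed to be below 100%.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def stars_like_score(gene_ranks, n_total, threshold=0.10, use_first=False):
    """Return (within_gene_rank, score) for one gene, or None if it is not scored.

    gene_ranks: 1-based overall ranks of every perturbation targeting the gene.
    n_total:    total number of perturbations in the ranked list.
    threshold:  fraction of the ranked list that counts as "above threshold".
    use_first:  if True, the top-ranked perturbation may also set the score
                (assumed meaning of the "first perturbation" option).
    """
    ranks = sorted(gene_ranks)
    n_gene = len(ranks)
    above = [r for r in ranks if r <= threshold * n_total]
    start = 1 if use_first else 2
    if len(above) < start:
        return None  # too few perturbations above the threshold
    best = None
    for i in range(start, len(above) + 1):
        # Assumed parameterisation: i successes out of n_gene trials,
        # each succeeding with probability rank / n_total.
        prob = binom_pmf(i, n_gene, above[i - 1] / n_total)
        if best is None or prob < best[1]:
            best = (i, prob)
    return best[0], -math.log10(best[1])


# Hypothetical gene with three perturbations in a list of 1000.
print(stars_like_score([1, 5, 500], n_total=1000))
```

Under these assumptions, a gene whose perturbations cluster near the top of the list gets a small binomial probability and therefore a large score.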
Hypergeometric Distribution
In this method, the rank of sgRNAs is used to calculate gene p-values using the probability mass function of a hypergeometric distribution. The list of sgRNAs can be ranked in either the ascending or the descending direction, and the resulting p-values differ between the two. The tool resolves this by calculating the average -log10(p-value) in each direction and picking the more significant one. The top n% of sgRNAs per gene can be used to calculate the average p-value with this method. The average log-fold change per gene is also reported and can be used to assess the magnitude of effect.
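The two-direction averaging described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the tool's code: the function names are hypothetical, and the mapping of ranks onto hypergeometric parameters is assumed (population = all sgRNAs, successes in the population = sgRNAs targeting the gene, draws = the overall rank of an sgRNA, observed successes = its within-gene rank). "Top n%" is assumed to mean the first ceil(n% x guides) sgRNAs of the gene in rank order.

```python
import math


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` items without replacement."""
    if k < max(0, draws - (population - successes)) or k > min(successes, draws):
        return 0.0
    return (math.comb(successes, k)
            * math.comb(population - successes, draws - k)
            / math.comb(population, draws))


def mean_neg_log10(sorted_ranks, n_total, top_fraction):
    """Average -log10(p) over the top fraction of one gene's sgRNAs."""
    n_gene = len(sorted_ranks)
    n_used = max(1, math.ceil(top_fraction * n_gene))
    values = [-math.log10(hypergeom_pmf(i, n_total, n_gene, rank))
              for i, rank in enumerate(sorted_ranks[:n_used], start=1)]
    return sum(values) / len(values)


def gene_score(gene_ranks, n_total, top_fraction=1.0):
    """Score one gene in both directions and keep the more significant one.

    gene_ranks: 1-based ranks (ascending direction) of the gene's sgRNAs.
    """
    ascending = sorted(gene_ranks)
    # Rank r in the ascending list is rank n_total + 1 - r in the descending list.
    descending = sorted(n_total + 1 - r for r in gene_ranks)
    up = mean_neg_log10(ascending, n_total, top_fraction)
    down = mean_neg_log10(descending, n_total, top_fraction)
    return ("ascending", up) if up >= down else ("descending", down)


# Hypothetical gene with two sgRNAs at the top of a 10-sgRNA list.
print(gene_score([1, 2], n_total=10))
```

Exact integer binomial coefficients keep the example dependency-free; for library-sized inputs a numerical routine such as scipy.stats.hypergeom would be a more practical choice.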

Running Parameters

Input Formats

Chip File
A .txt file with the first column listing the individual sgRNAs and the second column listing the gene identifiers of the sgRNA targets.
Data File
A .txt file with the first column listing the sgRNAs exactly as they appear in the first column of the chip file, and each subsequent column listing the numerical inputs for one condition.
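As a rough illustration of how the two inputs relate, the following Python sketch loads both files and groups the scores by gene. Several details are assumptions that the descriptions above do not specify: the files are taken to be tab-delimited, each is taken to start with a header row, and the function names (load_chip, load_data, group_by_gene) and sample identifiers are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_chip(handle, has_header=True):
    """Map each sgRNA (column 1) to its gene identifier (column 2)."""
    reader = csv.reader(handle, delimiter="\t")  # assumed tab-delimited
    if has_header:
        next(reader, None)  # skip the assumed header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def load_data(handle):
    """Return (condition names, {sgRNA: [score per condition]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # assumed header: sgRNA column, then one name per condition
    scores = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return header[1:], scores


def group_by_gene(chip, scores):
    """Collect (sgRNA, scores) pairs per gene; sgRNAs absent from the chip are skipped."""
    genes = defaultdict(list)
    for sgrna, values in scores.items():
        if sgrna in chip:
            genes[chip[sgrna]].append((sgrna, values))
    return dict(genes)


# Hypothetical in-memory files standing in for the two .txt inputs.
chip_text = "sgRNA\tGene\nACGT1\tGENE_A\nACGT2\tGENE_A\nTTTT1\tGENE_B\n"
data_text = "sgRNA\tcond1\tcond2\nACGT1\t-1.5\t0.2\nACGT2\t-2.0\t0.1\nTTTT1\t0.3\t1.1\n"

chip = load_chip(io.StringIO(chip_text))
conditions, scores = load_data(io.StringIO(data_text))
by_gene = group_by_gene(chip, scores)
print(conditions, by_gene["GENE_A"])
```

Keying both tables on the sgRNA column reflects the requirement that the data file list sgRNAs as they are specified in the chip file.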

Output File Details

Notes on target matching in CRISPR chip files and output files

In addition to the rows indicating target (gene) matches, a CRISPR chip file may also contain negative, or "non-target", information about a construct. There are two broad types of non-target indicators:

  1. Non-Target designations discovered via genome search
  2. Non-Target designations intrinsic to the construct's own sequence

Numeric Counter Suffixes on Non-Target Codes

You will notice that the non-target codes mentioned above do not appear in chip files in their "bare" form. Instead you will find e.g. "NO_SITE_192" or "INACTIVE_6T+_32". The reason for these appended "counter" digits is simply to ensure uniqueness so that e.g. the top hit in your screen doesn't end up being "NO_SITE". The actual numeric suffix value is not significant, nor is it stable over time. That is, today a barcode may be associated with "ONE_INTERGENIC_SITE_120" and tomorrow, after an updated run of the chip file generator, the same barcode may instead be given the code "ONE_INTERGENIC_SITE_119", due to a change somewhere higher up in the file.
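Because the counter digits carry no meaning, downstream scripts may want to recognise non-target rows by their base code alone. The sketch below shows one way to do that in Python. The function name non_target_base is hypothetical, and the set of base codes contains only the three examples quoted above; it is not a complete list.

```python
import re

# Base codes taken from the examples in the text; extend as needed.
EXAMPLE_BASE_CODES = {"NO_SITE", "INACTIVE_6T+", "ONE_INTERGENIC_SITE"}

# A trailing underscore followed by digits is treated as the counter suffix.
_COUNTER_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<counter>\d+)$")


def non_target_base(identifier, base_codes=EXAMPLE_BASE_CODES):
    """Return the bare non-target code, or None for an ordinary gene identifier."""
    match = _COUNTER_SUFFIX.match(identifier)
    if match and match.group("base") in base_codes:
        return match.group("base")
    return None


for name in ["NO_SITE_192", "INACTIVE_6T+_32", "ONE_INTERGENIC_SITE_120", "TP53"]:
    print(name, "->", non_target_base(name))
```

Checking the stripped base against a known set avoids misclassifying a real gene identifier that happens to end in an underscore and digits.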

Notes on Output File columns

This tool generates separate output files for every column in your input file. The column name will be included in the output file name.

Negative Binomial Distribution (STARS)
By default, only genes with at least two perturbations ranking above the threshold receive a STARS score and are reported in the output file. If the first perturbation is included in the STARS score calculation, all genes with at least one perturbation ranking above the specified threshold receive a STARS score and are reported. The output file contains 10 columns as follows:
  1. Gene identifier, from column 2 of the chip file
  2. Number of perturbations targeting the gene
  3. Ranks of perturbations targeting the gene
  4. Identity of perturbations
  5. Within-gene-rank of the least probable perturbation
  6. STARS score: -log10(value of least probable perturbation)
  7. Average score: Average of negative log of the values of all perturbations ranking above threshold
  8. P-values calculated using the null distribution specified
  9. False Discovery Rate (FDR) calculated using permutation testing
  10. q-value
Hypergeometric Distribution
All the genes in the library will be reported in the output file along with a .pdf of the volcano plot. The output file contains 10 columns as follows:
  1. Gene identifier, from column 2 of the chip file
  2. Average log-fold change of the top n% of guides per gene
  3. Average -log10(p-value) of the top n% of guides per gene
  4. Number of perturbations targeting the gene
  5. Identity of perturbations; perturbations listed according to individual rankings in ascending order
  6. Individual log-fold changes of the perturbations
  7. Ranks of the individual perturbations in the ascending direction
  8. -log10(p-values) of the individual perturbations in the ascending direction
  9. Ranks of the individual perturbations in the descending direction
  10. -log10(p-values) of the individual perturbations in the descending direction
