How the sgRNA Designer Works (all versions)

Overview

This tool ranks and picks candidate sgRNA sequences for the targets provided, attempting to maximize on-target activity while minimizing off-target activity. It uses the "Rule Set 2" scoring model from Doench, Fusi et al., Nature Biotechnology 2016 (developed in conjunction with the Azimuth project at Microsoft Research) to assess sgRNA on-target activity, and the CFD (Cutting Frequency Determination) score to evaluate off-target sites.

Note: November 4, 2016: We have now updated to Microsoft's latest on-target scoring model, Azimuth 2.0 (see link for detailed list of changes). The bug fixes in the new implementation do not affect the overall performance of the model, though the individual numeric scores do vary slightly.

Target Resolution

In this initial phase, we look up each input gene or transcript identifier and attempt to match it with a known entity in our database (current NCBI Gene and RefSeq catalogs). If this is successful, we then retrieve any sequence and genomic locus information we have available for this target. Since the quality of the source annotations varies, we may not always have perfect data on how a transcript maps to genomic loci, or even on its exonic structure. In the worst case, the tool infers a putative genomic sequence from the known RNA sequence of a transcript. If exon boundary locations are not known, there is a good chance that a very small number of candidate sgRNA sequences will be biologically nonsensical (i.e., they correspond to a sequence that spans an exon-exon boundary and is thus discontiguous in the genomic DNA).

sgRNA Candidate Sequence Generation

Once the target has been successfully resolved and sequence information gathered, the tool cycles through the sequence looking for appropriate PAM sites along both strands, generating an initial list of "candidate" sgRNA target sequences.
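The scan described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes an SpCas9-style NGG PAM and a 20-nt protospacer, neither of which is stated on this page, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_candidates(seq, guide_len=20):
    """List (strand, start, protospacer, pam) for every NGG PAM on both strands.

    Assumptions (not from the source): NGG PAM, 20-nt protospacer.
    'start' is the offset within the scanned strand, so for the '-' strand
    it is relative to the reverse complement, not the input sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the first base of the 3-nt PAM; the protospacer sits just 5' of it.
        for i in range(guide_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                hits.append((strand, i - guide_len, s[i - guide_len:i], pam))
    return hits
```

Scanning the reverse complement with the same loop keeps the PAM logic in one place, at the cost of reporting minus-strand offsets in reverse-complement coordinates.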

sgRNA Candidate Sequence Annotation and Ranking

The candidate sequences must be annotated and ranked in order to prioritize the picking process. First we calculate two independent dimensions: On-Target Rank and Off-Target Rank. The on-target and off-target ranks of each sgRNA are then combined at equal weight to provide a final rank for each sgRNA targeting a particular transcript.
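One way to read "combined at equal weight" is a simple rank sum, sketched below. The exact combination formula and the tie-breaking rule are not given on this page, so both are assumptions here, and `combined_rank` is a hypothetical name.

```python
def combined_rank(on_target_rank, off_target_rank):
    """Combine two rank dicts {guide_id: rank} (1 = best) at equal weight.

    Assumption: equal weight is modeled as the plain sum of the two ranks;
    ties are broken by guide id only to make the result deterministic.
    """
    total = {g: on_target_rank[g] + off_target_rank[g] for g in on_target_rank}
    ordered = sorted(total, key=lambda g: (total[g], g))
    return {g: position + 1 for position, g in enumerate(ordered)}
```

For example, a guide ranked 2nd on-target and 1st off-target (sum 3) would place ahead of one ranked 1st on-target and 3rd off-target (sum 4).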

On-Target Efficacy Scoring (Azimuth 2.0)

We use the Azimuth 2.0 model to calculate the on-target score for each candidate sgRNA target sequence, and use these scores to assign a per-transcript On-Target Ranking. Our implementation uses the version of Azimuth 2.0 that does not incorporate protein target site information, as that criterion is used later as a relaxable constraint during the "picking" phase.

For detailed information about Rule Set 2 scoring methods please refer to Doench, Fusi et al., Nature Biotechnology 2016.

Off-Target Analysis ("Threat Matrix")

We annotate each candidate sgRNA sequence by the number of potential off-target sites along two dimensions:

(1a) CRISPRko: Match Tiers ("Tiers I - IV" in the output file):
(1a) CRISPRa/i: Match Tiers ("Tiers I - III" in the output file):
(2) CFD scores ("Match Bins I - IV" in the output file):

Combining the Tier dimension with the CFD Match Bin dimension yields an off-target "Threat Matrix" (4 x 4 for CRISPRko, 3 x 4 for CRISPRa/i), presented as 16 or 12 columns in the output file. The counts in these columns are used to create an off-target rank-ordering (with column precedence in the order displayed in the file).
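Rank-ordering with column precedence can be illustrated as a lexicographic sort over the count columns. This sketch assumes that lower counts are better and that earlier columns strictly dominate later ones; the function name and the shortened four-column example are hypothetical.

```python
def off_target_rank(threat_counts):
    """Rank guides by their Threat Matrix counts (1 = best).

    threat_counts: {guide_id: tuple of counts in output-file column order}.
    Assumption: fewer potential off-target sites is better, and tuples are
    compared column by column, so the first differing column decides.
    """
    ordered = sorted(threat_counts, key=lambda g: threat_counts[g])
    return {g: position + 1 for position, g in enumerate(ordered)}
```

Under this reading, a guide with counts (0, 0, 5, 9) outranks one with (0, 1, 0, 0), because the second column differs first and later columns are never consulted.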

Off-Target Cutting Frequency Determination (CFD) Score Calculation

The Cutting Frequency Determination (CFD) score is calculated from the percent activity values in a matrix of penalties, which covers mismatches of each possible type at each position within the guide RNA sequence. The matrix, together with a full description of it, will be made available pending publication.

For example, if the interaction between the sgRNA and DNA has a single rG:dA ("RNA G aligning with DNA A") mismatch at position 6, then that interaction receives a score of 0.67. If there are two or more mismatches, then the individual mismatch values are multiplied together. For example, an rG:dA mismatch at position 7 coupled with an rC:dT mismatch at position 10 receives a CFD score of 0.57 x 0.87 = 0.50.
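The multiplication rule can be sketched in a few lines. The lookup table below contains only the three values quoted in the examples above, not the full matrix, and the names `PERCENT_ACTIVITY` and `cfd_score` are hypothetical. Treating a perfect match as 1.0 (the empty product) is also an assumption.

```python
from math import prod

# Only the values quoted in the text; the full penalty matrix is not reproduced here.
PERCENT_ACTIVITY = {
    ("rG:dA", 6): 0.67,
    ("rG:dA", 7): 0.57,
    ("rC:dT", 10): 0.87,
}


def cfd_score(mismatches):
    """Multiply the percent-activity values of all mismatches.

    mismatches: iterable of (mismatch_type, position) pairs.
    With no mismatches the product is 1.0 (assumed score for a perfect match).
    """
    return prod(PERCENT_ACTIVITY[m] for m in mismatches)


single = cfd_score([("rG:dA", 6)])
double = cfd_score([("rG:dA", 7), ("rC:dT", 10)])
print(single, round(double, 2))
```

The unrounded product for the two-mismatch case is 0.4959, which rounds to the 0.50 quoted above.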

DHS scoring (CRISPRa/i only)

For CRISPRa/i annotation, we also take into account whether the target sequence is within a known ENCODE-annotated DNase I Hypersensitive Site. This is represented as a score ranging from 0 to 1 (1 is highest). At the moment the score is effectively binary, as no values between 0 and 1 are assigned, though this may change in the future. This score is not used in ranking the sgRNA candidate sequences; rather, it is used as a filter during the cyclic picking algorithm described below.

sgRNA Picking

Once all candidate sgRNA sequences are fully annotated and ranked, the sgRNA Designer cycles through the list of candidates, picking sequences until the desired quota is reached. To pick sgRNAs for each transcript, we first choose the best-ranked sgRNA that satisfies two basic constraints: it targets a site within 5 – 65% of the protein-coding region of the target gene, and it has an on-target score ≥ 0.2. We then select additional sgRNAs per transcript (also satisfying the above constraints), further requiring that each picked sgRNA targets a site at least 5% away (from a protein-coding standpoint) from previously picked sgRNAs. This ensures diversity in target space, which is especially useful because exons present in the reference transcript may not be included in the particular cellular model to which the library is applied. To meet the requested quota for some target genes, we may need to perform multiple rounds of picking, with each round relaxing some constraint, such as the 5 – 65% protein-coding region, the minimum Rule Set 2 score, or the 5% spacing criterion.
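The cyclic picking process might look roughly like the sketch below. The strict thresholds (5 – 65% window, score ≥ 0.2, 5% spacing) come from the text; the order in which constraints are relaxed, the relaxed values, the field names, and the function name are all assumptions made for illustration. The DHS filter for CRISPRa/i is not modeled.

```python
def pick_guides(candidates, quota, rounds=None):
    """Pick up to `quota` guides, relaxing constraints round by round.

    candidates: list of dicts with 'id', 'rank' (1 = best),
    'cds_pct' (cut position as % of the protein-coding region), and 'score'.
    The relaxation schedule below is a hypothetical example.
    """
    if rounds is None:
        rounds = [
            {"window": (5, 65), "min_score": 0.2, "spacing": 5},   # strict constraints from the text
            {"window": (0, 100), "min_score": 0.2, "spacing": 5},  # relax the coding-region window
            {"window": (0, 100), "min_score": 0.0, "spacing": 5},  # relax the on-target score floor
            {"window": (0, 100), "min_score": 0.0, "spacing": 0},  # relax the spacing requirement
        ]
    picked = []
    for rule in rounds:
        low, high = rule["window"]
        for cand in sorted(candidates, key=lambda c: c["rank"]):
            if len(picked) >= quota:
                return picked
            if cand in picked:
                continue
            if not (low <= cand["cds_pct"] <= high) or cand["score"] < rule["min_score"]:
                continue
            # Keep each new pick at least `spacing` percent away from earlier picks.
            if any(abs(cand["cds_pct"] - p["cds_pct"]) < rule["spacing"] for p in picked):
                continue
            picked.append(cand)
    return picked
```

Each round rescans the full ranked list, so a well-ranked guide that failed a strict constraint gets another chance as soon as that constraint is loosened.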
