The TRC shRNA Design Process

Overview

We design shRNA constructs ("clones") algorithmically. The algorithm ranks potential 21mer targets within each human and mouse RefSeq transcript using a set of rules drawn from the siRNA literature, analysis of TRC library performance datasets, constraints on oligonucleotide synthesis and cloning, and other sources. In applying the algorithm, we aim to balance two competing goals: making hairpins that effectively knock down the target transcript and, as far as possible, designing hairpins that knock down only one gene and do not directly alter other genes (so-called 'off-target' effects). Each goal presents distinct challenges. The criteria for predicting effective knockdown with either siRNA or shRNA are not well understood and are still being developed and refined. Specificity is constrained by genome evolution: because many genes belong to extensive gene families, targeting a specific family member can be difficult. Furthermore, functionally distinct genes share many motifs with underlying nucleic acid sequence similarity. Our knowledge of transcript structure and variants is also still very incomplete. For all these reasons and more, we construct several shRNAs for each transcript, expecting a range of knockdown efficiencies across the set and at least a few that knock down effectively.
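To make the idea of ranking 21mer targets concrete, here is a minimal Python sketch of a sliding-window enumeration followed by a sort on score. It is an illustration under stated assumptions, not the TRC pipeline: the function names, the `top_n` parameter and the convention that a score of 0 means "excluded" are all hypothetical, and the scoring function is supplied by the caller.

```python
from typing import Callable, List, Tuple

TARGET_LENGTH = 21  # the text describes 21mer targets


def enumerate_candidates(transcript: str, length: int = TARGET_LENGTH) -> List[Tuple[int, str]]:
    """Return (0-based start, subsequence) for every window of `length` bases."""
    seq = transcript.upper()
    return [(i, seq[i:i + length]) for i in range(len(seq) - length + 1)]


def rank_candidates(transcript: str,
                    score: Callable[[str], float],
                    top_n: int = 5) -> List[Tuple[float, int, str]]:
    """Score every window and return the best top_n as (score, start, sequence)."""
    scored = [(score(s), start, s) for start, s in enumerate_candidates(transcript)]
    # Assumption: a score of 0 marks an excluded candidate, so drop those.
    kept = [c for c in scored if c[0] > 0]
    # Highest score first; ties are broken by position along the transcript.
    kept.sort(key=lambda c: (-c[0], c[1]))
    return kept[:top_n]
```

A caller would pass in whatever scoring function implements the rule set of interest, for example `rank_candidates(transcript_sequence, my_score)`.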

Users of this database should be aware that, to keep annotation consistent and reliable, the RNAi Consortium decided early on to use NCBI's RefSeq collection of transcripts as the definitive source of the primary target sequences for shRNA design.

As a general rule, we construct shRNAs targeting just one RefSeq transcript for each NCBI gene. Because different transcripts from the same gene share high sequence identity, the majority of the shRNAs target all known transcript variants.

A brief narrative of the candidate selection process

Current Rule Set

Rule Set 9

Rule Description
1 aaStart9 Exclude any candidate beginning with AA (score = 0)
2 fourRow9 Exclude any candidate containing a run of four of the same base in a row (score = 0)
3 gcScore9 Exclude candidates with extreme GC percentage (GC <= 25% or > 60%); promote candidates with GC between 25-55% (score = 3); if GC > 55% and <= 60% then score = 1 (neutral)
4 nonGATC9 Exclude any candidate containing ambiguous bases (e.g. N) (score = 0)
5 restrictionSite9 Exclude any candidate containing certain restriction sites: ...GGTACC..., ...GAATTC..., ...CTCGAG..., ...CATATG..., ...ACTAGT..., ...GGTAC, ...GAATT, GTACC..., TACC..., CTAGT...
6 sevenGC9 Exclude any candidate with a run of 7 C/G bases (score = 0)
7 stemLoopStem Penalize candidates that can form an internal stem-loop (score = 0.1) (minimum stem length = 5, minimum loop size = 4)
8 threePrimeClamp6 Give precedence to candidates with weaker base-pairing at positions 15-20 (priority on pos. 17-19); score = 5 if all 6 positions are A or T, decreasing to 0.1 if all 6 are G/C. Score drops off steeply as the number of A/T bases decreases.
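
As an illustration only, the exclusion rules of Rule Set 9 could be checked along the following lines. This is a minimal Python sketch, not the TRC implementation. The function and constant names are hypothetical, and the reading of the "..." notation for restriction sites (site anywhere in the candidate, site at the 3' end, site at the 5' end) is an assumption. stemLoopStem and threePrimeClamp6 are left out because they adjust the score rather than exclude a candidate.

```python
import re

# Assumed reading of the "..." notation in restrictionSite9.
INTERNAL_SITES = ("GGTACC", "GAATTC", "CTCGAG", "CATATG", "ACTAGT")  # ...SITE...
SUFFIX_SITES = ("GGTAC", "GAATT")                                    # ...SITE
PREFIX_SITES = ("GTACC", "TACC", "CTAGT")                            # SITE...


def gc_score9(seq):
    """gcScore9: 0 for GC <= 25% or > 60%, 3 up to 55%, 1 above 55% up to 60%."""
    seq = seq.upper()
    gc = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    if gc <= 25 or gc > 60:
        return 0.0
    return 3.0 if gc <= 55 else 1.0


def is_excluded9(seq):
    """True if any score = 0 rule of Rule Set 9 applies to the candidate."""
    seq = seq.upper()
    return (
        seq.startswith("AA")                            # aaStart9
        or re.search(r"(.)\1{3}", seq) is not None      # fourRow9
        or gc_score9(seq) == 0.0                        # gcScore9
        or re.search(r"[^GATC]", seq) is not None       # nonGATC9
        or any(site in seq for site in INTERNAL_SITES)  # restrictionSite9
        or seq.endswith(SUFFIX_SITES)
        or seq.startswith(PREFIX_SITES)
        or re.search(r"[GC]{7}", seq) is not None       # sevenGC9
    )
```

For a 21mer the GC percentage is always a multiple of 1/21, so it never lands exactly on 25%, 55% or 60%, and the choice of inclusive or exclusive boundaries does not change the result at that length.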

Previous Rule Sets

Rule Set 8

Rule Description
1 aaStart Penalize candidates beginning with AA (score = .000000000000001)
2 fourRow Penalize candidates containing four of the same base in a row (score = 0.01)
3 gcScore8 Penalize candidates with extreme GC percentage (GC <= 25% or > 60%; score = 0.01); promote candidates with GC between 25-55% (score = 3); if GC > 55% and <= 60% then score = 1 (neutral)
4 nonGATC Penalize candidates containing an ambiguous base (e.g. N) (score = 0.000000000000001)
5 restrictionSite8 Penalize any candidate containing certain restriction sites: ...GGTACC..., ...GAATTC..., ...CTCGAG..., ...CATATG..., ...ACTAGT..., ...GGTAC, ...GAATT (score = 0.0001)
6 sevenGC Penalize candidates containing a run of 7 C or G (score = 0.01)
7 stemLoopStem Penalize candidates that can form an internal stem-loop (score = 0.1) (minimum stem length = 5, minimum loop size = 4)
8 threePrimeClamp6 Give precedence to candidates with weaker base-pairing at positions 15-20 (priority on pos. 17-19); score = 5 if all 6 positions are A or T, decreasing to 0.1 if all 6 are G/C. Score drops off steeply as the number of A/T bases decreases.
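
The sketch below shows one way the soft penalties of Rule Set 8 might combine into a single score. It rests on a loud assumption: the per-rule scores are multiplied together. The rule table does not say how scores are combined; multiplication is only suggested by the description of 1 as "neutral". All function names are hypothetical, the stem-loop check considers perfect Watson-Crick pairing only, and restrictionSite8 and threePrimeClamp6 are omitted for brevity.

```python
import re
from math import prod

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def has_stem_loop(seq, min_stem=5, min_loop=4):
    """True if two min_stem-base arms can pair with at least min_loop bases between them."""
    for i in range(len(seq) - 2 * min_stem - min_loop + 1):
        arm = reverse_complement(seq[i:i + min_stem])
        # The partner arm must start after the first arm plus the minimum loop.
        if seq.find(arm, i + min_stem + min_loop) != -1:
            return True
    return False


def gc_score8(seq):
    """gcScore8: 0.01 for GC <= 25% or > 60%, 3 up to 55%, 1 above 55% up to 60%."""
    gc = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    if gc <= 25 or gc > 60:
        return 0.01
    return 3.0 if gc <= 55 else 1.0


def rule_scores8(seq):
    """Per-rule scores for a subset of Rule Set 8; 1.0 means the rule does not fire."""
    seq = seq.upper()
    return {
        "aaStart": 0.000000000000001 if seq.startswith("AA") else 1.0,
        "fourRow": 0.01 if re.search(r"(.)\1{3}", seq) else 1.0,
        "gcScore8": gc_score8(seq),
        "nonGATC": 0.000000000000001 if re.search(r"[^GATC]", seq) else 1.0,
        "sevenGC": 0.01 if re.search(r"[GC]{7}", seq) else 1.0,
        "stemLoopStem": 0.1 if has_stem_loop(seq) else 1.0,
    }


def combined_score8(seq):
    # Assumption: the overall score is the product of the per-rule scores.
    return prod(rule_scores8(seq).values())
```

Under this assumption a single severe penalty such as aaStart pushes a candidate to the bottom of the ranking without removing it, which is consistent with the "penalize" wording of Rule Set 8 as opposed to the "exclude" wording of Rule Set 9.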

Rule Set 7

Rule Description
1 aaStart Penalize candidates beginning with AA (score = .000000000000001)
2 fivePrimeClamp Give precedence to candidates with stronger base-pairing at the 5' end of the candidate; score = 0.01 if the first two positions are GG; 0.0001 if the first two are TT; 2.5 if the first four are (G|C){4}; 2.4 if the first three are (G|C){3}; 2.2 if it begins (CC|CG|GC)(A|T)(G|C); 2 if it begins (CC|CG|GC); 2 if it begins (GC); 1.25 if it begins (G|C); 1 if it begins (A|T)(G|C); 0.5 if it begins (A|T){2}
3 fourRow Penalize candidates containing four of the same base in a row (score = 0.01)
4 gcScore Penalize extremes of GC percentage: score = 0.01 if GC < 30% or > 70%; 3 if GC is between 30-50%; 1 if GC > 60% and < 70%
5 internalAT Reward moderately AT-rich regions at positions 7 through 10: score = 2.2 if all four are A|T; 2 if 3 of 4 are A|T; 1.5 if 2 of 4 are A|T; 0.7 if 1 of 4 is A|T; 0.5 if none of the four is A|T
6 internalATFlanking Reward A|T at the flanking positions 6 and 11: score = 1.2 if both are A|T; 1 if exactly one is A|T; 0.85 if neither is A|T
7 internalLoop Penalize candidates that can form an AAABBB loop (score = 0.7)
8 nonGATC Penalize candidates containing an ambiguous base (e.g. N) (score = 0.000000000000001)
9 restrictionSite GCCGGC, CCCGGG, CTCGAG, ...GCCGG
10 sevenGC Penalize candidates containing a run of 7 C or G (score = 0.01)
11 threePrimeClamp6 Give precedence to candidates with weaker base-pairing at positions 15-20 (priority on pos. 17-19); score = 5 if all 6 positions are A or T, decreasing to 0.1 if all 6 are G/C. Score drops off steeply as the number of A/T bases decreases.
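
The internalAT and internalATFlanking rules of Rule Set 7 amount to small lookup tables keyed on the number of A/T bases. The following Python sketch illustrates this. It assumes that positions are counted from 1 at the 5' end of the 21mer candidate, which the rule table does not state explicitly, and all names are hypothetical.

```python
# Score by number of A/T bases at positions 7-10 (internalAT).
INTERNAL_AT_SCORES = {4: 2.2, 3: 2.0, 2: 1.5, 1: 0.7, 0: 0.5}
# Score by number of A/T bases at positions 6 and 11 (internalATFlanking).
FLANKING_AT_SCORES = {2: 1.2, 1: 1.0, 0: 0.85}


def count_at(bases):
    """Number of A or T bases in a string."""
    return sum(base in "AT" for base in bases.upper())


def internal_at(seq):
    # Positions 7-10 (1-based, assumed) are indices 6..9 in Python.
    return INTERNAL_AT_SCORES[count_at(seq[6:10])]


def internal_at_flanking(seq):
    # Positions 6 and 11 (1-based, assumed) are indices 5 and 10 in Python.
    return FLANKING_AT_SCORES[count_at(seq[5] + seq[10])]
```

Both functions expect a candidate of at least 11 bases; a shorter input raises an IndexError in `internal_at_flanking`.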

Rule Set 4

Rule Description
1 aaStart Penalize candidates beginning with AA (score = .000000000000001)
2 fivePrimeClamp Give precedence to candidates with stronger base-pairing at the 5' end of the candidate; score = 0.01 if the first two positions are GG; 0.0001 if the first two are TT; 2.5 if the first four are (G|C){4}; 2.4 if the first three are (G|C){3}; 2.2 if it begins (CC|CG|GC)(A|T)(G|C); 2 if it begins (CC|CG|GC); 2 if it begins (GC); 1.25 if it begins (G|C); 1 if it begins (A|T)(G|C); 0.5 if it begins (A|T){2}
3 fourRow Penalize candidates containing four of the same base in a row (score = 0.01)
4 gcScore Penalize extremes of GC percentage: score = 0.01 if GC < 30% or > 70%; 3 if GC is between 30-50%; 1 if GC > 60% and < 70%
5 internalAT Reward moderately AT-rich regions at positions 7 through 10: score = 2.2 if all four are A|T; 2 if 3 of 4 are A|T; 1.5 if 2 of 4 are A|T; 0.7 if 1 of 4 is A|T; 0.5 if none of the four is A|T
6 internalATFlanking Reward A|T at the flanking positions 6 and 11: score = 1.2 if both are A|T; 1 if exactly one is A|T; 0.85 if neither is A|T
7 internalLoop Penalize candidates that can form an AAABBB loop (score = 0.7)
8 nonGATC Penalize candidates containing an ambiguous base (e.g. N) (score = 0.000000000000001)
9 sevenGC Penalize candidates containing a run of 7 C or G (score = 0.01)
10 threePrimeClamp Give precedence to candidates with weaker base-pairing at the 3' end of the candidate; score = 5 if the last three positions are A|T; 4.5 if the last two are A|T, the third from the end is G|C and the fourth from the end is A|T; 4 if the last two are A|T; 2 if the last base is A|T; 0.2 if the last two positions are G|C; 0.5 if the last base is G|C; 0.8 if the last base is G|C and the previous two are A|T
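
The threePrimeClamp rule of Rule Set 4 lists overlapping patterns, so any implementation has to pick a precedence. The Python sketch below assumes that the most specific matching pattern wins, so the 0.8 case is tested before the more general 0.5 case. That ordering, the default of 1.0 when the last base is not A, C, G or T, and all names are assumptions for illustration; the table itself does not specify them.

```python
import re

# (pattern anchored at the 3' end, score), most specific pattern first.
THREE_PRIME_PATTERNS = [
    (r"[AT]{3}$", 5.0),          # last three are A/T
    (r"[AT][GC][AT]{2}$", 4.5),  # A/T, G/C, then last two A/T
    (r"[AT]{2}$", 4.0),          # last two are A/T
    (r"[AT]$", 2.0),             # last base is A/T
    (r"[AT]{2}[GC]$", 0.8),      # last is G/C, previous two are A/T
    (r"[GC]{2}$", 0.2),          # last two are G/C
    (r"[GC]$", 0.5),             # last base is G/C
]


def three_prime_clamp(seq):
    """Return the score of the first (most specific) pattern matching the 3' end."""
    seq = seq.upper()
    for pattern, score in THREE_PRIME_PATTERNS:
        if re.search(pattern, seq):
            return score
    return 1.0  # assumption: neutral score if the last base is ambiguous
```

With a first-match cascade in the order the rule is written, the 0.8 case could never be reached, because any candidate ending in G/C would already have matched the 0.5 or 0.2 case.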