DeepSpCas9 Activity Prediction

Deep learning–based SpCas9 guide RNA editing efficiency prediction

30-nt Input Format
  • Upstream: 4 nt (pos 1–4)
  • Protospacer: 20 nt (pos 5–24)
  • PAM (NGG): 3 nt (pos 25–27)
  • Downstream: 3 nt (pos 28–30)
Total: exactly 30 nucleotides · A/T/G/C only
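
The layout above maps directly onto string slices. The following is a minimal sketch, not the tool's own code: the names `TargetRegions` and `split_target` are hypothetical, and the demo sequence is made up for illustration (it is not one of the page's examples).

```python
from typing import NamedTuple


class TargetRegions(NamedTuple):
    upstream: str     # pos 1-4
    protospacer: str  # pos 5-24
    pam: str          # pos 25-27
    downstream: str   # pos 28-30


def split_target(seq: str) -> TargetRegions:
    """Split a 30-nt target into the four regions of the input format."""
    seq = seq.strip()
    if len(seq) != 30:
        raise ValueError(f"expected exactly 30 nt, got {len(seq)}")
    # 1-based inclusive positions become 0-based half-open slices.
    return TargetRegions(seq[0:4], seq[4:24], seq[24:27], seq[27:30])


# Made-up sequence: 4 nt + 20 nt + AGG + 3 nt.
regions = split_target("ACGTGATTACAGATTACAGATTACAGGTCA")
print(regions)
```

Named fields keep the position arithmetic in one place, so later checks (for example on the PAM) can refer to `regions.pam` instead of repeating slice indices.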
Example Sequences
ACCCCCTCCACCCCGCCTCCGGGACTGCGA
GTCGCCCTCGAACTTCACCTCGGCGCGGGG
ATAGAATACTCAAGCTATGCATCAAGCTTG
Activity Classification
  • High: ≥ 60%
  • Medium: 30–60%
  • Low: < 30%

Predicted indel frequency (%): the expected percentage of alleles carrying a SpCas9-induced insertion or deletion at the target site.
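
The three activity classes are thresholds on the predicted indel frequency. A minimal sketch of that mapping follows; the function name `classify_activity` is hypothetical, and the boundaries follow the listed cutoffs (60% counts as High because High is "≥ 60%", and 30% counts as Medium because Low is "< 30%").

```python
def classify_activity(indel_percent: float) -> str:
    """Map a predicted indel frequency (%) to an activity class."""
    if indel_percent >= 60.0:
        return "High"
    if indel_percent >= 30.0:
        return "Medium"
    return "Low"


for score in (72.5, 60.0, 30.0, 12.1):
    print(score, classify_activity(score))
```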

About DeepSpCas9

DeepSpCas9 predicts SpCas9 editing efficiency using a deep convolutional neural network (CNN) trained on 12,832 target sequences measured in human cells.

Architecture
  • 3-branch CNN (inception-style): 100 / 70 / 40 filters at 3 / 5 / 7 nt widths
  • Avg Pool → Flatten → Concat: all 3 branches merged into fully connected layers
  • FC(80) → FC(60) → Linear output: dropout 0.3 at each layer, regression target = indel %
Training data
  • 12,832 SpCas9 target sequences in human cells
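
To make the layer sizes concrete, the arithmetic below traces how many features each branch would contribute to the concatenated vector. This is a sketch under assumptions the page does not state: convolutions without padding at stride 1, and average pooling with window 2 and stride 2. Only the filter counts, kernel widths, and FC sizes come from the description above; a different padding or pooling choice would change the numbers.

```python
SEQ_LEN = 30  # input length in nt
BRANCHES = [(100, 3), (70, 5), (40, 7)]  # (filters, kernel width) per branch
POOL = 2  # ASSUMPTION: pooling window and stride are not given on the page


def branch_features(seq_len: int, filters: int, width: int, pool: int) -> int:
    """Flattened feature count of one conv -> avg-pool branch."""
    conv_len = seq_len - width + 1  # ASSUMPTION: no padding, stride 1
    pooled_len = conv_len // pool   # non-overlapping pooling windows
    return pooled_len * filters


per_branch = [branch_features(SEQ_LEN, f, w, POOL) for f, w in BRANCHES]
concat_size = sum(per_branch)

# Concat -> FC(80) -> FC(60) -> single linear output (indel %).
layer_sizes = [concat_size, 80, 60, 1]
print(per_branch, concat_size, layer_sizes)
```

Under these assumptions the widest kernel sees the fewest positions (24 after convolution, 12 after pooling), which is why the 7-nt branch contributes the smallest share of the concatenated vector.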

Input Rules
  • Exactly 30 nucleotides per sequence
  • Only A / T / G / C — no ambiguous bases
  • PAM region (pos 25–27) should be NGG
  • One per line, or standard FASTA format
  • Up to 100 sequences per run
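
The rules above can be checked before submitting a batch. The sketch below is illustrative only and is not the tool's own validator: `parse_sequences` and `validate` are hypothetical names, upper-casing the input is an assumption, and a non-NGG PAM is reported as a warning rather than an error because the rule says "should be".

```python
import re

MAX_SEQUENCES = 100
VALID_30MER = re.compile(r"[ATGC]{30}")


def parse_sequences(text: str) -> list[str]:
    """Read sequences given one per line or as FASTA records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not any(ln.startswith(">") for ln in lines):
        return [ln.upper() for ln in lines]  # ASSUMPTION: case-insensitive
    records, current = [], None
    for ln in lines:
        if ln.startswith(">"):
            if current is not None:
                records.append("".join(current))
            current = []  # start a new record at each header
        elif current is not None:
            current.append(ln.upper())  # wrapped lines are joined
        # Lines before the first header are ignored in FASTA mode.
    if current is not None:
        records.append("".join(current))
    return records


def validate(seqs: list[str]) -> list[str]:
    """Raise on hard rule violations; return PAM warnings."""
    if len(seqs) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences per run")
    warnings = []
    for i, seq in enumerate(seqs, start=1):
        if not VALID_30MER.fullmatch(seq):
            raise ValueError(f"sequence {i}: need exactly 30 nt of A/T/G/C")
        if seq[25:27] != "GG":  # pos 26-27 of the NGG PAM at pos 25-27
            warnings.append(f"sequence {i}: PAM {seq[24:27]} is not NGG")
    return warnings


# Made-up FASTA record whose sequence wraps over two lines.
fasta = ">demo\nACGTGATTACAGATTAC\nAGATTACAGGTCA\n"
seqs = parse_sequences(fasta)
print(seqs, validate(seqs))
```

Separating hard failures (length, alphabet, batch size) from the soft PAM check keeps a batch usable when a target is deliberately submitted without a canonical PAM.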