SSMBS Computing Server
A tool to locate up to five user-defined motifs, in a specified order, in amino acid and nucleotide sequences.

Motif pattern syntax

Guidelines to create regular expressions for motifs.

  1. The standard IUPAC one-letter codes for the amino acids are to be used.
  2. The standard A, G, C, T, U one-letter codes for nucleotides are to be used.
  3. The symbol `x' is used for a position where any amino acid/nucleotide is accepted.
  4. Ambiguities are indicated by listing the acceptable amino acids/nucleotides for a given position, between square brackets `[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
  5. Exclusions are indicated by prefixing '^' to a list of amino acids/nucleotides inside the square brackets. For example: [^AM] stands for any amino acid except Ala and Met.
  6. The elements of a pattern follow one after the other, without any separator between them.
  7. Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range enclosed in '{}'.
  8. Hydrophobic residues can be represented by the single letter 'O' in the motif.
  9. Polar residues can be represented by the single letter 'J' in the motif.
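
The guidelines above map closely onto ordinary regular expressions. The following is a minimal sketch, not the server's own code, of how a motif might be translated into a Python `re` pattern. The function name motif_to_regex and the residue sets chosen for 'O' and 'J' are assumptions made for illustration; the server's actual hydrophobic and polar sets are not listed here.

```python
import re

# Assumed residue classes; the server's real definitions are not stated here.
HYDROPHOBIC = "AVLIMFWC"   # hypothetical set for 'O'
POLAR = "STNQYHKRDE"       # hypothetical set for 'J'


def motif_to_regex(motif: str) -> str:
    """Translate a motif written in the syntax above into a regex string."""
    out = []
    in_brackets = False
    for ch in motif:
        if ch == "[":
            in_brackets = True
            out.append(ch)
        elif ch == "]":
            in_brackets = False
            out.append(ch)
        elif ch == "x" and not in_brackets:
            # 'x' accepts any amino acid/nucleotide at this position.
            out.append(".")
        elif ch == "O":
            # Inside brackets the set is merged into the enclosing list.
            out.append(HYDROPHOBIC if in_brackets else "[" + HYDROPHOBIC + "]")
        elif ch == "J":
            out.append(POLAR if in_brackets else "[" + POLAR + "]")
        else:
            # Residue letters, '^', '{m,n}' and '( )' already mean the
            # same thing in a regular expression, so they pass through.
            out.append(ch)
    return "".join(out)


pattern = motif_to_regex("Ax[ST]{2}x{0,1}V")
print(pattern)                                  # A.[ST]{2}.{0,1}V
print(bool(re.fullmatch(pattern, "AGSTV")))     # True
```

Only 'x', 'O' and 'J' need rewriting in this sketch, because the remaining symbols carry the same meaning in both notations.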

Examples:

x{3} corresponds to xxx

x{2,4} corresponds to xx or xxx or xxxx

A{3} corresponds to AAA while (AB){2,4} corresponds to ABAB or ABABAB or ABABABAB.

[AC]xVx{4}[^ED]
This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-[any but Glu or Asp]

Ax[ST]{2}x{0,1}V
This pattern is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
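
To illustrate what locating several motifs "in a particular order" could look like, here is a rough sketch using Python's `re` module. It is not the server's algorithm. The function name find_in_order and the test sequence are made up for this example, and the sketch assumes that each motif must begin at or after the end of the previous match, with gaps of any length allowed in between. The two patterns are the examples above with 'x' written as '.'.

```python
import re


def find_in_order(sequence, patterns):
    """Return (start, end, text) for each pattern, matched left to right.

    Assumption: every motif must start at or after the end of the
    previous match. This is a simple greedy scan; it returns None as
    soon as one motif cannot be placed.
    """
    hits = []
    pos = 0
    for pat in patterns:
        m = re.compile(pat).search(sequence, pos)
        if m is None:
            return None
        hits.append((m.start(), m.end(), m.group()))
        pos = m.end()
    return hits


# Hypothetical sequence containing both example motifs in order.
seq = "MKAGVLLLLKPPAQSTV"
motifs = ["[AC].V.{4}[^ED]", "A.[ST]{2}.{0,1}V"]
print(find_in_order(seq, motifs))
# [(2, 10, 'AGVLLLLK'), (12, 17, 'AQSTV')]
```

Positions are 0-based with an exclusive end, following Python's convention. Because the scan is greedy, it may miss an arrangement that would need a later match of an earlier motif.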