2CENDOPTASE      2C endopeptidase (C24) cysteine protease family signature
 Type of fingerprint: COMPOUND with 4  elements
Links:
   PRINTS; PR00705 PAPAIN; PR00704 CALPAIN; PR00966 NIAPOTYPTASE
   PRINTS; PR00703 ADVENDOPTASE; PR00797 STREPTOPAIN; PR00707 UBCTHYDRLASE
   PRINTS; PR00776 HEMOGLOBNASE; PR00706 PYROGLUPTASE; PR00864 PREPILPTASE
   PRINTS; PR00917 SRSVCYSPTASE
   INTERPRO; IPR000317

 Creation date 29-JUN-1998; UPDATE 06-JUN-1999

   1. RAWLINGS, N.D. AND BARRETT, A.J.
   Families of cysteine peptidases.
   METHODS ENZYMOL. 244 461-486 (1994).

   2. BARRETT, A.J. AND RAWLINGS, N.D.
   Families and clans of cysteine peptidases
   PERSPECTIVES DRUG DISCOVERY DESIGN 6 1-11 (1996).

   3. RAWLINGS, N.D. AND BARRETT, A.J.
   Family C24 - Clan PA - 3C endopeptidase 
   http://www.bi.bbsrc.ac.uk/merops/famcards/c24.htm

   4. FEDERHEN, S., HOTTON, C., LEIPE, D. AND SOUSSOV, V.
   Calicivirus - NCBI Taxonomy Browser
   http://www3.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wgetorg?id=11975&lvl=3

   5. WIRBLICH, C., THIEL, H. AND MEYERS, G.
   Genetic map of the calicivirus rabbit hemorrhagic disease virus as deduced
   from in vitro translation studies.
   J.VIROL. 70(11) 7974-7983 (1996).

   Cysteine protease activity is dependent on a catalytic dyad of cysteine and
   histidine, the order and spacing of these residues varying in the known
   families. Nearly half of all cysteine proteases are found exclusively
   in viruses [1]. Cysteine protease families have been grouped into five
   clans (designated CA, CB, CC, CD and CE) on the basis of structural and
   functional similarity. Families C1, C2 and C10, which belong to the CA clan,
   have a Cys/His catalytic dyad, and are loosely termed papain-like. Families
   in the CB clan have a His/Cys dyad, and contain RNA-virus enzymes that are
   distantly related to chymotrypsin. Enzymes in clan CC are also from RNA
   viruses, but have a papain-like Cys/His active site. The remaining two
   clans, CD and CE, contain only one family each [2]. Some families have not
   yet been assigned to a clan.
  
   Two additional clans (PA and PB) have been identified, each containing a
   mixture of serine, cysteine and threonine proteases. Members of clan PA
   have a catalytically active serine or cysteine nucleophile as part of the
   ordered triad His, Asp, Ser (or Cys). Members of clan PB have a serine,
   cysteine or threonine active residue at the N-terminus of the mature
   protease [3].
  
   Caliciviruses are positive-stranded ssRNA viruses that cause gastroenteritis
   [4]. The calicivirus genome contains two open reading frames, ORF1 and ORF2.
   ORF1 encodes a non-structural polypeptide, which has RNA helicase, cysteine
   protease and RNA polymerase activities. The regions of the polyprotein in
   which these activities lie are similar to proteins produced by the
   picornaviruses. ORF2 encodes a structural protein [5]. Two different groups
   of caliciviruses can be distinguished on the basis of sequence similarity,
   namely those classified as small round structured viruses (SRSVs) and those
   classed as non-SRSVs.
  
   Calicivirus proteases from the non-SRSV group, which are members of the PA
   protease clan, constitute family C24 of the cysteine proteases (proteases 
   from SRSVs belong to the C37 family). As mentioned above, the protease 
   activity resides within a polyprotein. The enzyme cleaves the polyprotein
   at sites N-terminal to itself, liberating the polyprotein helicase.
  
   2CENDOPTASE is a 4-element fingerprint that provides a signature for the 
   cysteine protease (C24) of non-SRSV caliciviruses. The fingerprint was 
   derived from an initial alignment of 4 sequences: the motifs were drawn 
   from conserved regions spanning the full length of the polyprotein protease,
   focusing on those regions that characterise members of the C24 family but
   distinguish them from the C37 proteases: motif 1 includes the active site
   histidine residue, and motif 3 contains the catalytic cysteine. Two
   iterations on OWL30.2 were required to reach convergence, at which point
   a true set comprising 14 sequences was identified. 
  
   An update on SPTR37_9f identified a true set of 12 sequences.
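
   The statement that motif 3 contains the catalytic cysteine can be checked
   against the motif 3 sequences listed under INITIAL MOTIF SETS in this entry.
   The following is a minimal Python sketch, not part of PRINTS itself: the
   helper name consensus and the use of "x" for variable columns are
   illustrative assumptions.

```python
# Motif 3 sequences copied from the INITIAL MOTIF SETS of this entry.
MOTIF3 = {
    "POLN_FCVF9": "THPGDCGLPYID",
    "POLN_FCVC6": "THPGDCGLPYID",
    "POLN_RHDV": "TTHGDCGLPLYD",
    "POLN_MANCV": "TKKGDCGLPYFN",
}


def consensus(seqs, wildcard="x"):
    """Return the residue of each fully conserved column, else the wildcard.

    Hypothetical helper: it only reports strict column identity and does not
    reproduce the scoring used by the PRINTS scanning software.
    """
    seqs = list(seqs)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned motif sequences must have equal length")
    return "".join(
        col[0] if len(set(col)) == 1 else wildcard
        for col in zip(*seqs)
    )


pattern = consensus(MOTIF3.values())
print(pattern)  # TxxGDCGLPxxx
```

   The invariant GDCGLP block places a cysteine at column 6 of the motif. That
   this residue is the catalytic cysteine is taken from the description above,
   not derived by the code.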

  SUMMARY INFORMATION
     12 codes involving  4 elements
      0 codes involving  3 elements
      0 codes involving  2 elements

   COMPOSITE FINGERPRINT INDEX
  
    4|  12   12   12   12  
    3|   0    0    0    0  
    2|   0    0    0    0  
   --+---------------------
     |   1    2    3    4  
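
   The index above can be read as follows: the entry in row k, column m counts
   the sequences that match exactly k elements and whose match includes motif
   m. The following minimal Python sketch rebuilds the table under that
   reading, which is an assumption about the layout. The function name
   composite_index and the matches mapping are hypothetical, and every true
   positive is assumed to match all four elements, as the summary states.

```python
# Codes copied from the "True positives" list of this entry.
TRUE_POSITIVES = [
    "Q89273", "Q86119", "Q86117", "POLN_RHDV",
    "Q86114", "POLN_FCVF9", "Q96725", "Q66913",
    "POLN_FCVC6", "Q66914", "O92368", "POLN_MANCV",
]


def composite_index(matches, n_motifs):
    """Build {k: [count per motif]} for k = n_motifs down to 2.

    matches maps a sequence code to the set of motif numbers it matched
    (hypothetical input structure, 1-based motif numbers).
    """
    table = {k: [0] * n_motifs for k in range(n_motifs, 1, -1)}
    for motifs in matches.values():
        k = len(motifs)
        if k in table:
            for m in motifs:
                table[k][m - 1] += 1
    return table


# Assumption: all 12 codes involve all 4 elements (see SUMMARY INFORMATION).
matches = {code: {1, 2, 3, 4} for code in TRUE_POSITIVES}
index = composite_index(matches, 4)

for k, row in index.items():
    print(f"{k}| " + " ".join(f"{n:4d}" for n in row))
```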

True positives..
 Q89273         Q86119         Q86117         POLN_RHDV      
 Q86114         POLN_FCVF9     Q96725         Q66913         
 POLN_FCVC6     Q66914         O92368         POLN_MANCV     


  PROTEIN TITLES
   Q89273           POLYPROTEIN - RABBIT HEMORRHAGIC DISEASE VIRUS (RHDV).
   Q86119           POLYPROTEIN - RABBIT HEMORRHAGIC DISEASE VIRUS (RHDV).
   Q86117           (SD) - RABBIT HEMORRHAGIC DISEASE VIRUS (RHDV).
   POLN_RHDV        NON-STRUCTURAL POLYPROTEIN [CONTAINS: RNA-DIRECTED RNA POLYM
   Q86114           POLYPROTEIN - RABBIT HEMORRHAGIC DISEASE VIRUS (RHDV).
   POLN_FCVF9       NON-STRUCTURAL POLYPROTEIN [CONTAINS: RNA-DIRECTED RNA POLYM
   Q96725           RNA - EUROPEAN BROWN HARE SYNDROME VIRUS.
   Q66913           NON-STRUCTURAL PROTEINS - FELINE CALICIVIRUS.
   POLN_FCVC6       NON-STRUCTURAL POLYPROTEIN [CONTAINS: RNA-DIRECTED RNA POLYM
   Q66914           POLYPROTEIN - FELINE CALICIVIRUS.
   O92368           NON-STRUCTURAL POLYPROTEIN - VESV-LIKE CALICIVIRUS.
   POLN_MANCV       GENOME POLYPROTEIN [CONTAINS: RNA-DIRECTED RNA POLYMERASE (E

  SCAN HISTORY
   OWL30_2    2  50   NSINGLE
   SPTR37_9f  2  13   NSINGLE

  INITIAL MOTIF SETS

   2CENDOPTASE1  Length of motif = 18  Motif number = 1
   2C endopeptidase calicivirus protease motif I - 1
                       PCODE          ST   INT
   GYCIHMGHGVYASVAHVV  POLN_FCVF9   1095  1095
   GYCVHMGHGVYASVAHVV  POLN_FCVC6   1097  1097
   GWMIHIGNGLYISNTHTA  POLN_RHDV    1120  1120
   GYGVHIGNGNVITVTHVA  POLN_MANCV    997   997

   2CENDOPTASE2  Length of motif = 17  Motif number = 2
   2C endopeptidase calicivirus protease motif II - 1
                      PCODE          ST   INT
   APFFSGKPTRDPWGSPV  POLN_FCVF9   1145    32
   APFFSGRPTRDPWGSPV  POLN_FCVC6   1147    32
   AQIAEGTPVCDWKKSPI  POLN_RHDV    1165    27
   GPFSQLPHMQIGSGSPV  POLN_MANCV   1039    24

   2CENDOPTASE3  Length of motif = 12  Motif number = 3
   2C endopeptidase calicivirus protease motif III - 1
                 PCODE          ST   INT
   THPGDCGLPYID  POLN_FCVF9   1188    26
   THPGDCGLPYID  POLN_FCVC6   1190    26
   TTHGDCGLPLYD  POLN_RHDV    1207    25
   TKKGDCGLPYFN  POLN_MANCV   1092    36

   2CENDOPTASE4  Length of motif = 11  Motif number = 4
   2C endopeptidase calicivirus protease motif IV - 1
                PCODE          ST   INT
   DNGRVTGLHTG  POLN_FCVF9   1200     0
   DNGRVTGLHTG  POLN_FCVC6   1202     0
   SSGKIVAIHTG  POLN_RHDV    1219     0
   SNRQLVALHAG  POLN_MANCV   1104     0

  FINAL MOTIF SETS

   2CENDOPTASE1  Length of motif = 18  Motif number = 1
   2C endopeptidase calicivirus protease motif I - 2
                       PCODE          ST   INT
   GWMIHIGNGLYISNTHTA  POLN_RHDV    1120  1120
   GWMIHIGNGLYISNTHTA  Q86117       1120  1120
   GWMIHIGNGLYISNTHTA  Q86119       1120  1120
   GWMIHIGNGLYISNTHTA  Q89273       1120  1120
   GRMIHIGNGLYISNTHTA  Q86114       1120  1120
   GYCIHMGHGVYASVAHVV  POLN_FCVF9   1095  1095
   GWMIHIGNGMYLSNTHTA  Q96725       1113  1113
   GYCVHMGHGVYASVAHVV  Q66913       1095  1095
   GYCVHMGHGVYASVAHVV  POLN_FCVC6   1097  1097
   GYCVHMGHGVYATVAHVA  Q66914       1095  1095
   GYAIHIGHGVYISLKHVV  O92368       1208  1208
   GYGVHIGNGNVITVTHVA  POLN_MANCV    997   997

   2CENDOPTASE2  Length of motif = 17  Motif number = 2
   2C endopeptidase calicivirus protease motif II - 2
                      PCODE          ST   INT
   AQIAEGTPVCDWKKSPI  POLN_RHDV    1165    27
   AQIAEGTPVCDWKKSPI  Q86117       1165    27
   AQIAEGTPVCDWKKSPI  Q86119       1165    27
   AQIAEGTPVCDWKKSPI  Q89273       1165    27
   AQIAEGTPVCDWKKSPI  Q86114       1165    27
   APFFSGKPTRDPWGSPV  POLN_FCVF9   1145    32
   AQIAEGTPVRDWKRASI  Q96725       1158    27
   APFFSGKPTRDPWGSPV  Q66913       1145    32
   APFFSGRPTRDPWGSPV  POLN_FCVC6   1147    32
   APFFPGKPTRDPWGSPV  Q66914       1145    32
   VPVGTSKPIKDPWGNPV  O92368       1258    32
   GPFSQLPHMQIGSGSPV  POLN_MANCV   1039    24

   2CENDOPTASE3  Length of motif = 12  Motif number = 3
   2C endopeptidase calicivirus protease motif III - 2
                 PCODE          ST   INT
   TTHGDCGLPLYD  POLN_RHDV    1207    25
   TTHGDCGLPLYD  Q86117       1207    25
   TTHGDCGLPLYD  Q86119       1207    25
   TTHGDCGLPLYD  Q89273       1207    25
   TTHGDCGLPLYD  Q86114       1207    25
   THPGDCGLPYID  POLN_FCVF9   1188    26
   TTHGDCGLPLFD  Q96725       1200    25
   THPGDCGLPYID  Q66913       1188    26
   THPGDCGLPYID  POLN_FCVC6   1190    26
   THPGDCGLPYID  Q66914       1188    26
   TRQGDCGLPYVD  O92368       1301    26
   TKKGDCGLPYFN  POLN_MANCV   1092    36

   2CENDOPTASE4  Length of motif = 11  Motif number = 4
   2C endopeptidase calicivirus protease motif IV - 2
                PCODE          ST   INT
   SSGKIVAIHTG  POLN_RHDV    1219     0
   SSGKIVAIHTG  Q86117       1219     0
   SSGKIVAIHTG  Q86119       1219     0
   SSGKIVAIHTG  Q89273       1219     0
   SSGKIVAIHTG  Q86114       1219     0
   DNGRVTGLHTG  POLN_FCVF9   1200     0
   EAGKVVAIHTG  Q96725       1212     0
   DNGRVTGLHTG  Q66913       1200     0
   DNGRVTGLHTG  POLN_FCVC6   1202     0
   DNGRVTGLHTG  Q66914       1200     0
   DHGVVVGLHAG  O92368       1313     0
   SNRQLVALHAG  POLN_MANCV   1104     0
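
   The ST and INT values in the motif sets appear to be related: for the
   entries listed, INT equals the start of a motif minus the end of the
   preceding motif (its start plus its length), and for motif 1 it equals ST.
   The following minimal Python sketch checks that reading against a few of
   the final-set entries. The relation is an assumption inferred from the
   listed numbers, and expected_intervals is a hypothetical helper name.

```python
# Motif lengths as stated in this entry (motifs 1 to 4).
MOTIF_LENGTHS = [18, 17, 12, 11]

# ST values per sequence, copied from the FINAL MOTIF SETS (motifs 1 to 4).
STARTS = {
    "POLN_FCVF9": [1095, 1145, 1188, 1200],
    "POLN_RHDV": [1120, 1165, 1207, 1219],
    "POLN_MANCV": [997, 1039, 1092, 1104],
    "O92368": [1208, 1258, 1301, 1313],
}

# INT values per sequence, copied from the same motif sets.
REPORTED_INT = {
    "POLN_FCVF9": [1095, 32, 26, 0],
    "POLN_RHDV": [1120, 27, 25, 0],
    "POLN_MANCV": [997, 24, 36, 0],
    "O92368": [1208, 32, 26, 0],
}


def expected_intervals(starts, lengths):
    """Gap between each motif start and the end of the previous motif.

    Assumption: the first interval is measured from position 0, so it
    equals the start of motif 1.
    """
    gaps = []
    prev_end = 0
    for start, length in zip(starts, lengths):
        gaps.append(start - prev_end)
        prev_end = start + length
    return gaps


for code, starts in STARTS.items():
    computed = expected_intervals(starts, MOTIF_LENGTHS)
    status = "ok" if computed == REPORTED_INT[code] else "MISMATCH"
    print(f"{code:<11} {computed} {status}")
```

   An INT of 0 for motif 4 means that, under this reading, motif 4 begins
   immediately after the last residue of motif 3.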
