SPRITZ: help and references



The Server: description

Spritz is a web server for the prediction of intrinsically disordered regions in protein sequences. The server predicts ordered/disordered residues using two specialised binary classifiers, both implemented as probabilistic soft-margin support vector machines (C-SVM). The SVM-LD (LD: long disorder) classifier is trained on a subset of non-redundant sequences known to contain only long disordered protein fragments (>=30 AA). The SVM-SD (SD: short disorder) classifier is instead trained on a subset of non-redundant sequences with only short disordered fragments.
Spritz is an interface for sending one query at a time. Spritz_multi is an interface for sending multiple queries (up to 32 Kbytes in total) in FASTA format.
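
Before submitting a batch to Spritz_multi, it can help to check locally how many FASTA records the text contains and whether it stays under the size limit. The sketch below is only an illustration and not the server's own code: the function names are hypothetical, and reading "32 Kbytes" as 32 * 1024 bytes of UTF-8 text is an assumption.

```python
MAX_BYTES = 32 * 1024  # assumed interpretation of the 32 Kbytes limit


def split_fasta(text):
    """Split multi-FASTA text into (description, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fits_spritz_multi(text):
    """Check the total size of the submission against the assumed limit."""
    return len(text.encode("utf-8")) <= MAX_BYTES


batch = ">query1\nMKVLLA\nGGS\n>query2\nVELVEGDEGR\n"
for name, seq in split_fasta(batch):
    print(name, len(seq))
print("within limit:", fits_spritz_multi(batch))
```

Sequence lines that appear before the first description line are dropped by this sketch; whether the server does the same is not documented here.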

Input formats

Email

Your email address, where the prediction will be delivered.
NOTE: Check that you typed your address correctly. Many queries never receive an answer because the address was mistyped.

Query name

An optional name for your query. We strongly suggest that you use one. The order in which you send your queries may not correspond to the order in which you receive the answers.
When using Spritz_multi no query name is requested.

Input sequence(s)

The sequence of amino acids:

  • You can submit bare sequences or sequences in FASTA format. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line must begin with a greater-than (">") symbol in the first column. Spritz will ignore the description line, while Spritz_multi will use it as the query name.
  • Spritz will handle only single sequences: if you send multiple sequences through its interface, they will be concatenated and treated as a single one. If you want to submit multiple sequences, please use Spritz_multi.
  • Spaces, newlines and tabs will be ignored, so feel free to have them in your query.
  • Characters not corresponding to any amino acid will be treated as X.
  • Only the 1-letter amino acid code is understood. Please do not send nucleotide sequences: if you do, A will be treated as Alanine, C as Cysteine, etc.
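
The rules above can be approximated locally to preview what a single Spritz query will look like after clean-up. This is a minimal sketch, not the server's implementation: the function name is hypothetical, the set of accepted letters is assumed to be the 20 standard amino acids, and folding lowercase letters to uppercase is also an assumption.

```python
# Assumed alphabet: the 20 standard amino acids in 1-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def normalize_query(text):
    """Drop description lines and whitespace, map unknown characters to X."""
    residues = []
    for line in text.splitlines():
        # A description line has ">" in the first column and is ignored.
        if line.startswith(">"):
            continue
        for ch in line:
            if ch.isspace():
                continue
            ch = ch.upper()  # assumption: case is not significant
            residues.append(ch if ch in AMINO_ACIDS else "X")
    return "".join(residues)


print(normalize_query(">my protein\nMKV LL\tA\nGG1S*\n"))
```

Because description lines are simply skipped, several FASTA records passed to this function end up concatenated into one sequence, which mirrors the behaviour described for the single-query interface.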

Predictor type

You have two options:

  • Short disorder: Spritz will predict disorder for each residue in the chain using the classifier trained exclusively on short disorder regions (SVM-SD).
  • Long disorder: Spritz will instead use the classifier trained on long disorder regions (SVM-LD).

False positive rate threshold

You can set the false positive rate (FPR) threshold used by Spritz to map the probabilistic output into the final predicted residue class (ordered/disordered). We suggest choosing the FPR with the help of Table I (SVM-SD), Table II (SVM-LD) and Fig. I. In practice, the FPR can be chosen according to the fraction of truly disordered amino acids that you expect to be recovered (sensitivity) or the fraction of predicted disordered residues estimated to be correct (specificity).

Figure I: ROC curve of the two independent classifiers (SVM-LD and SVM-SD) used by Spritz, as extracted from the DR category targets in CASP6. The independent k-fold cross validation shows similar trends.
FPR    Specificity    Sensitivity
1%     0.64           0.24
2%     0.55           0.32
3%     0.47           0.37
4%     0.42           0.41
5%     0.38           0.47
6%     0.33           0.49
7%     0.33           0.5
8%     0.33           0.5
9%     0.33           0.5
10%    0.33           0.5
Table I: Precision (specificity) vs. Recall (sensitivity) as a function of the user specified FPR (false positive threshold). Estimates extracted from k-fold cross validation of the SVM-SD classifier.
FPR    Specificity    Sensitivity
1%     0.39           0.09
2%     0.40           0.18
3%     0.43           0.34
4%     0.40           0.38
5%     0.38           0.44
6%     0.37           0.5
7%     0.35           0.53
8%     0.33           0.55
9%     0.31           0.57
10%    0.30           0.58
Table II: Precision (specificity) vs. Recall (sensitivity) as a function of the user specified FPR (false positive threshold). Estimates extracted from k-fold cross validation of the SVM-LD classifier.
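
One way to use the two tables is to pick the lowest FPR that still reaches the sensitivity you need, then read off the corresponding specificity. The sketch below transcribes the values of Table I and Table II; the helper name and the selection rule are illustrative assumptions, not part of Spritz.

```python
# Rows are (FPR in %, specificity/precision, sensitivity/recall),
# transcribed from Table I (SVM-SD) and Table II (SVM-LD).
TABLE_SD = [
    (1, 0.64, 0.24), (2, 0.55, 0.32), (3, 0.47, 0.37), (4, 0.42, 0.41),
    (5, 0.38, 0.47), (6, 0.33, 0.49), (7, 0.33, 0.5), (8, 0.33, 0.5),
    (9, 0.33, 0.5), (10, 0.33, 0.5),
]
TABLE_LD = [
    (1, 0.39, 0.09), (2, 0.40, 0.18), (3, 0.43, 0.34), (4, 0.40, 0.38),
    (5, 0.38, 0.44), (6, 0.37, 0.5), (7, 0.35, 0.53), (8, 0.33, 0.55),
    (9, 0.31, 0.57), (10, 0.30, 0.58),
]


def smallest_fpr_for_sensitivity(table, min_sensitivity):
    """Return the first (lowest-FPR) row reaching the requested sensitivity."""
    for fpr, specificity, sensitivity in table:
        if sensitivity >= min_sensitivity:
            return fpr, specificity, sensitivity
    return None  # no tabulated FPR reaches this sensitivity


print(smallest_fpr_for_sensitivity(TABLE_SD, 0.45))
print(smallest_fpr_for_sensitivity(TABLE_LD, 0.50))
```

The tables only cover FPR values from 1% to 10%, so a request that none of the rows satisfies returns None rather than extrapolating.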

Output format

Replies are sent by email and come as text, attached to the message. You might have to "view attachments inline" in your web browser to see these replies. If you submit multiple sequences through Spritz_multi, you will receive a separate email for each sequence. Here is an example prediction:

VELVEGDEGRMCVNTEWGAFGDSGELDEFLLEYDRLVDESSANPGQQLYEKLIGGKYMGE
CCEECCCCCCEEEECCCCCCCCCCCCCCCCCCHHHHHCCCCCCCCHHHHHHHHCCCCCCC
DDOOODDDOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODDDDDOOOOOOOOOOOODDDD

The lines have the following meaning:

  • Line 1: The 1-letter code of your protein primary sequence.
  • Line 2: Secondary structure prediction by Porter:
    • H = helix : DSSP's H (alpha helix) + G (3-10 helix) + I (pi-helix) classes.
    • E = strand : DSSP's E (extended strand) + B (beta-bridge) classes.
    • C = the rest : DSSP's T (turn) + S (bend) + . (the rest).
    This line is always present.
  • Line 3: Disorder predictions by Spritz. Each letter gives the predicted disorder class of the corresponding residue:
    • D = Disordered residue;
    • O = Ordered residue.
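
Since the three lines are aligned residue by residue, a reply can be post-processed with a few lines of code, for example to list the disordered regions as residue ranges. The sketch below assumes the attachment contains exactly the three lines described above and nothing else; the function names are hypothetical.

```python
from itertools import groupby


def parse_prediction(text):
    """Return (sequence, secondary_structure, disorder) from a 3-line reply."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError("expected exactly three non-empty lines")
    sequence, secondary, disorder = lines
    if not (len(sequence) == len(secondary) == len(disorder)):
        raise ValueError("the three lines must have the same length")
    return sequence, secondary, disorder


def disordered_regions(disorder):
    """Return 1-based inclusive (start, end) ranges for each run of 'D'."""
    regions = []
    pos = 1
    for label, run in groupby(disorder):
        length = len(list(run))
        if label == "D":
            regions.append((pos, pos + length - 1))
        pos += length
    return regions


reply = "MKVLLAGGSE\nCCEEEECCCC\nDDOOODDDOO\n"
seq, ss, dis = parse_prediction(reply)
for start, end in disordered_regions(dis):
    print(start, end, seq[start - 1:end])
```

The short reply used here is made up for illustration; a real attachment may need extra handling if it carries headers or wraps long sequences.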


References

A. Vullo, O. Bortolami, G. Pollastri, S. Tosatto. "Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines". Nucleic Acids Research, 34:W164-W168, 2006.

G. Pollastri, A. McLysaght. "Porter: a new, accurate server for protein secondary structure prediction". Bioinformatics, 21(8):1719-1720, 2005.





Alessandro Vullo, Gianluca Pollastri,
Gianluca Pollastri's group
School of Computer Science and Informatics
University College Dublin
Silvio Tosatto,
BioComputing GRUP,
University of Padua