The Protein Data Bank holds a large number of protein structures, but most of them are redundant or very closely related in structure. For this reason, we seek out representative sets to use when developing recognition methods. Proteins can have the same fold even if they share only 15% of their amino acids. So we need to find other indicators that help determine whether two amino acid sequences, even very dissimilar ones, have the same protein backbone structure. Using these protein features, other than primary sequence, to align two proteins is usually called protein threading. I'm working on a protein threading program.
Right now I'm spending some time looking up different protein test sets. There are many different systems set up for testing the performance of a protein structure prediction system, including CASP and LiveBench. But these primarily test how the system performs on a new protein, assuming an exact match does not already exist in the database. This is a good way to evaluate the performance of a system in the real world, but while developing and training a new system, you need test sets that give your program enough variety to train it adequately while still holding back a subset of nonredundant information that can be used to evaluate the end result.
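One way to keep the evaluation subset nonredundant with respect to the training data is to split by cluster rather than by individual protein. The following is only a minimal sketch, assuming each protein already carries a cluster label from some redundancy filter; the function name `split_by_cluster`, the `holdout_fraction` parameter, and the protein and cluster IDs are all made up for illustration.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, holdout_fraction=0.2, seed=0):
    """Split protein IDs into (train, evaluation) without splitting a cluster.

    cluster_of maps protein ID -> cluster ID. The cluster labels are an
    assumption here: they would come from whatever redundancy filter
    you trust, and are not computed by this function.
    """
    # Group proteins by their cluster label.
    clusters = defaultdict(list)
    for protein_id, cluster_id in cluster_of.items():
        clusters[cluster_id].append(protein_id)

    # Shuffle whole clusters with a fixed seed so the split is repeatable.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    # Hold out at least one whole cluster for evaluation.
    n_holdout = max(1, round(len(cluster_ids) * holdout_fraction))
    holdout = set(cluster_ids[:n_holdout])

    train = sorted(p for c in cluster_ids if c not in holdout for p in clusters[c])
    evaluation = sorted(p for c in holdout for p in clusters[c])
    return train, evaluation


# Hypothetical protein IDs and cluster labels, for illustration only.
example = {
    "protA": "c1", "protB": "c1", "protC": "c2",
    "protD": "c3", "protE": "c3", "protF": "c4", "protG": "c5",
}
train, evaluation = split_by_cluster(example)
```

Because whole clusters are held out, no evaluation protein has a close relative in the training set, at least as far as the clustering can tell.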
There are two main types of test sets that come up when dealing with protein threading: the alignment test and the fold recognition test. When you align two proteins with a protein threading technique, you don't always know whether the structure actually represents a relative of the target. You may be able to align the two sequences, but they may not actually have the same structure. If this is the case, you should be able to recognize it. So by doing alignments against a set of non-redundant proteins, you should be able to pick out the protein that most closely matches your query sequence.
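The fold recognition test can be scored roughly like this: align each query against every template in the non-redundant set, take the best-scoring template, and check whether it has the same fold as the query. This is a minimal sketch, not my actual program. It assumes higher scores mean better matches and that fold labels are known for both queries and templates; the names `top_hit` and `fold_recognition_accuracy`, and all of the IDs and scores, are made up for illustration.

```python
def top_hit(scores):
    """Return the template ID with the highest alignment score.

    Assumes higher is better; flip the comparison for energy-like
    scores where lower is better.
    """
    return max(scores, key=scores.get)


def fold_recognition_accuracy(all_scores, fold_of):
    """Fraction of queries whose best-scoring template shares their fold.

    all_scores maps query ID -> {template ID: score}.
    fold_of maps every query and template ID to a fold label.
    """
    correct = 0
    for query, scores in all_scores.items():
        if fold_of[top_hit(scores)] == fold_of[query]:
            correct += 1
    return correct / len(all_scores)


# Hypothetical fold labels and alignment scores, for illustration only.
fold_of = {"q1": "foldA", "q2": "foldB",
           "t1": "foldA", "t2": "foldB", "t3": "foldC"}
all_scores = {
    "q1": {"t1": 42.0, "t2": 17.5, "t3": 20.1},  # best hit t1, same fold
    "q2": {"t1": 30.2, "t2": 25.0, "t3": 12.3},  # best hit t1, wrong fold
}
accuracy = fold_recognition_accuracy(all_scores, fold_of)  # 0.5 here
```

The alignment test is a different question: it asks how well the aligned positions match a reference alignment, which this sketch does not cover.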
compbio_dude
Protein structure prediction