Protein Characterization Package for Machine Learning

Machine Learning (ML) techniques have proven useful for a variety of protein structure prediction tasks. PCP-ML contains a number of functions that are commonly used when performing ML tasks with proteins.

Have a question? Maybe it's answered on our FAQs page.


The tarball below contains the source code, documentation and examples for PCP-ML.


PCP-ML has three principal components: Parsers, Characterizers and Encoders, and Writers. The parsers extract commonly used data from the output of programs such as PSIPred and DSSP. Characterizers and Encoders convert this data into forms that are meaningful to ML methods. There are also a number of characterizers that provide numerical representations of hydrophobicity, contact potentials, etc. The writers format and output the generated features so that they are compatible with ML programs (e.g., SVMlight).
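As a rough illustration of the characterizer and writer roles described above, the following sketch maps residues to hydropathy values and formats a feature vector as one SVMlight input line. It does not use the PCP-ML API: the function names `hydrophobicity_profile` and `to_svmlight_line` are hypothetical, and the choice of the Kyte-Doolittle scale is an assumption rather than something the package is known to use.

```python
# Kyte-Doolittle hydropathy scale. Using this particular scale is an
# assumption made for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence):
    """Hypothetical characterizer: one hydropathy value per residue.

    Residues missing from the scale (e.g. 'X') are assigned 0.0.
    """
    return [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]


def to_svmlight_line(label, features):
    """Hypothetical writer: format one example in SVMlight's sparse format.

    The format is '<label> <index>:<value> ...' with 1-based feature
    indices in increasing order; zero-valued features are omitted.
    """
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


if __name__ == "__main__":
    profile = hydrophobicity_profile("MKV")
    print(to_svmlight_line(1, profile))
```

Omitting zero-valued features keeps the output sparse, which matters once wide encodings such as one-hot windows are written out.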

Parsers | Characterizers and Encoders | Feature Writers/Generators and Utilities


Here we provide some examples of how you could use PCP-ML to quickly generate training files for a simple secondary structure predictor. The feature generator is implemented in both C++ and Python.
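To make the idea concrete, here is a minimal sketch of the kind of feature generation a simple secondary structure predictor typically relies on: a sliding window of one-hot encoded residues around each position, paired with a label taken from the known structure. This is not the PCP-ML feature generator. All names (`one_hot`, `window_features`, `training_examples`) are hypothetical, and reducing the problem to helix versus non-helix is an assumption made to keep the labels binary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(aa):
    """Return a 20-element 0/1 vector; unknown residues are all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(aa)
    if idx >= 0:
        vec[idx] = 1
    return vec


def window_features(sequence, center, half_width):
    """Concatenate one-hot vectors for the window around `center`.

    Positions that fall off either end of the sequence are zero-padded.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(one_hot(sequence[pos]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


def training_examples(sequence, structure, half_width=2, positive_class="H"):
    """Build (label, features) pairs, one per residue.

    `structure` is a per-residue secondary structure string of the same
    length as `sequence`. Labels are +1 for `positive_class`, else -1.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    examples = []
    for i, ss in enumerate(structure):
        label = 1 if ss == positive_class else -1
        examples.append((label, window_features(sequence, i, half_width)))
    return examples


if __name__ == "__main__":
    for label, feats in training_examples("ACD", "HHE", half_width=1):
        print(label, sum(feats), len(feats))
```

Each example has `(2 * half_width + 1) * 20` features, so the window size directly controls the width of the training file.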

Alternatively, you could create a stand-alone classifier in which the features, instead of being written out to a text file, are passed directly to an existing ML code base.
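The stand-alone approach might look roughly like the sketch below, where feature vectors are streamed in memory to a learner rather than written to disk. Everything here is illustrative: `feature_stream`, `train_perceptron`, and `predict` are hypothetical names, and the perceptron is only a stand-in for whatever existing ML code base you would actually plug in.

```python
def feature_stream(examples):
    """Yield (label, feature_vector) pairs one at a time, in memory."""
    for label, features in examples:
        yield label, list(features)


def train_perceptron(stream_factory, n_features, epochs=10):
    """Train a basic perceptron on labels in {+1, -1}.

    `stream_factory` is a zero-argument callable returning a fresh
    iterator of (label, features) pairs, so the data can be replayed
    once per epoch without ever being stored in a file.
    """
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for label, x in stream_factory():
            score = sum(w * v for w, v in zip(weights, x)) + bias
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Standard perceptron update on a misclassified example.
                weights = [w + label * v for w, v in zip(weights, x)]
                bias += label
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 for a single feature vector."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else -1


if __name__ == "__main__":
    data = [(1, [1, 0]), (-1, [0, 1])]
    w, b = train_perceptron(lambda: feature_stream(data), n_features=2)
    print(predict(w, b, [1, 0]), predict(w, b, [0, 1]))
```

Passing a factory rather than a single iterator lets the learner make several passes over the data, which a one-shot generator would not allow.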


Code-level documentation can be generated using Doxygen. If you have Doxygen installed on your system, run "make documentation" to generate it from your source tree.

It is available here as well.

MLiD Lab

Copyright 2014