PCP-ML

Protein Characterization Package for Machine Learning

Machine Learning (ML) techniques have proven useful for a variety of protein structure prediction tasks. PCP-ML contains a number of functions that are commonly used when performing ML tasks with proteins.

Have a question? Maybe it's answered on our FAQs page.

Code

The tarball below contains the source code, documentation and examples for PCP-ML.

Functionality

PCP-ML has three principal components: Parsers, Characterizers and Encoders, and Writers. The parsers extract commonly used data from the output of programs such as PSIPred and DSSP. Characterizers and Encoders convert this data into forms that are meaningful to ML methods. A number of characterizers also provide numerical representations of hydrophobicity, contact potentials, etc. The writers format and output the generated features so that they are compatible with ML programs (e.g., SVMlight).

Parsers               Characterizers and Encoders   Feature Writers/Generators and Utilities
ParseFastaSequences   AtchelyFactors                PrintFeatures
ParseSSPro            InterfaceContactPotentials    WriteFeatures
ParsePSIPred          BetaContactPotentials         ReadFile
ParseAsciiPSSM        SSComposition
ParseAnchoredMSA      SAComposition
ParseDSSPOutput       AAComposition
                      Hydrophobicity
                      CalculateEntropy
                      CalculateR
                      CalculateCosine
                      ScaledOrderMean
                      HotEncodeAA
                      HotEncodeSS
                      HotEncodeSA
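
To make the encoder and characterizer ideas concrete, here is a minimal Python sketch of what a one-hot amino acid encoding and an entropy calculation might look like. It is not PCP-ML's code: the function names, the alphabetical residue ordering, and the choice of base-2 logarithms are all assumptions made for illustration, and the package's own HotEncodeAA and CalculateEntropy may behave differently.

```python
import math

# The 20 standard amino acids in alphabetical one-letter order.
# This ordering is an assumption; PCP-ML may use a different one.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode_aa(residue):
    """Return a 20-element 0/1 list with a single 1 at the residue's index.

    Unknown residues (for example 'X') map to an all-zero vector.
    """
    vector = [0] * len(AMINO_ACIDS)
    index = AA_INDEX.get(residue.upper())
    if index is not None:
        vector[index] = 1
    return vector


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(one_hot_encode_aa("C"))
    # Entropy of a hypothetical profile column split evenly over two residues.
    print(shannon_entropy([0.5, 0.5]))
```

A one-hot vector keeps the residue identity categorical, so the learner does not read an ordering into the amino acid alphabet, while the entropy of a profile column gives a single number summarizing how conserved that position is.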

Examples

Here we provide some examples of how to use PCP-ML to quickly generate training files for a simple secondary structure predictor. The feature generator is implemented in both C++ and Python.
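
As a rough idea of what such a feature generator does, the sketch below one-hot encodes a sliding window of residues around each position and writes one line per residue in SVMlight's sparse "label index:value" format. It does not use PCP-ML itself: the function names, the window size, the helix-versus-rest labelling, and the toy sequence are hypothetical choices for illustration only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue ordering
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(sequence, center, half_width=2):
    """One-hot encode the residues in a window centred on `center`.

    Positions that fall off either end of the sequence, and unknown
    residues, contribute an all-zero block of 20 values.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        block = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            index = AA_INDEX.get(sequence[pos])
            if index is not None:
                block[index] = 1
        features.extend(block)
    return features


def svmlight_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    Feature indices are 1-based and only non-zero values are written.
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:+d}"] + pairs)


def training_lines(sequence, structure, positive_class="H", half_width=2):
    """Yield one SVMlight line per residue for a one-vs-rest problem."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    for i, ss in enumerate(structure):
        label = 1 if ss == positive_class else -1
        yield svmlight_line(label, window_features(sequence, i, half_width))


if __name__ == "__main__":
    # Toy sequence and secondary structure string, made up for this sketch.
    for line in training_lines("ACDK", "HHCC", half_width=1):
        print(line)
```

Because SVMlight trains binary classifiers, a three-state predictor (helix, strand, coil) would need one such file per class, or a multi-class learner in its place.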

Alternatively, you could create a stand-alone classifier that, instead of writing the features out to a text file, passes them directly to an existing ML code base.
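
The in-memory route might look like the sketch below, where feature vectors are handed straight to a learner as Python lists and never touch disk. The nearest-centroid classifier is only a stand-in for whatever ML code base you already have, and the function names and toy feature vectors are hypothetical.

```python
def nearest_centroid_fit(examples):
    """Average the feature vectors of each label.

    `examples` is an iterable of (features, label) pairs; the result maps
    each label to its centroid.
    """
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def nearest_centroid_predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: distance(centroids[label]))


if __name__ == "__main__":
    # Toy (features, label) pairs standing in for generated features.
    training = [
        ([1.0, 0.0], "H"),
        ([0.8, 0.2], "H"),
        ([0.0, 1.0], "C"),
    ]
    centroids = nearest_centroid_fit(training)
    print(nearest_centroid_predict(centroids, [0.9, 0.1]))
```

Keeping the features in memory avoids the formatting and parsing round trip, which matters most when the window is wide and each residue yields hundreds of values.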

Documentation

Code-level documentation can be generated using Doxygen. If you have Doxygen installed on your system, run "make documentation" to generate it from your source tree.

It is available here as well.

MLiD Lab

Copyright 2014