CustomizingCHASM

From Chasm Software Wiki

Revision as of 18:37, 3 March 2011 by WikiSysop (Talk | contribs)

Customizing CHASM

You may be able to improve CHASM's performance by applying the feature selection protocol used in Original CHASM.

Step 1. Assemble your feature selection set by running:

BuildClassifier -m MutationTable -o ClassifierName -p

See CHASM_Tutorial for details.

This will generate a new directory

$CHASMDIR/BuiltClassifiers/ClassifierName

which will now contain three files

drivers.tmps
passengers.tmps
AllFeatures.list
  • We recommend that you split drivers.tmps into two randomly partitioned files of driver mutations. Name one of these files drivers_fs.tmps and the other drivers.tmps (replacing the original file of that name). Use drivers_fs.tmps for feature selection and the new drivers.tmps to train your classifier at a later step. Keeping the two sets separate helps avoid overfitting the classifier.
  • Rename (DO NOT COPY) passengers.tmps to passengers_fs.tmps, but do not split it into two parts. A new passengers.tmps will be generated when you train your classifier.
  • You should now have two new files called passengers_fs.tmps and drivers_fs.tmps.
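
The file preparation in the bullets above can be sketched in Python. This is a minimal illustration, not part of CHASM: the function names are hypothetical, the 50/50 split ratio is an assumption (the page only asks for two random partitions), and the code assumes each .tmps file holds one mutation per line with no header.

```python
import random
from pathlib import Path


def split_lines(lines, seed=0):
    """Randomly partition lines into two halves (feature selection, training)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2  # assumption: an even 50/50 split
    return shuffled[:half], shuffled[half:]


def prepare_feature_selection_files(classifier_dir, seed=0):
    """Split drivers.tmps and rename passengers.tmps inside classifier_dir."""
    classifier_dir = Path(classifier_dir)
    drivers = classifier_dir / "drivers.tmps"

    # Assumption: one mutation per line, no header line.
    lines = drivers.read_text().splitlines()
    fs_lines, train_lines = split_lines(lines, seed)

    # One half goes to feature selection, the other replaces drivers.tmps.
    (classifier_dir / "drivers_fs.tmps").write_text("\n".join(fs_lines) + "\n")
    drivers.write_text("\n".join(train_lines) + "\n")

    # Rename, do not copy: passengers.tmps must not exist afterwards.
    (classifier_dir / "passengers.tmps").rename(classifier_dir / "passengers_fs.tmps")
```

For example, `prepare_feature_selection_files("BuiltClassifiers/ClassifierName")` would be run from inside $CHASMDIR. Fixing the seed makes the partition reproducible, which is useful if feature selection needs to be repeated.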

Step 2. Compute all features available in SNVBox for the mutations in passengers_fs.tmps and drivers_fs.tmps as described in SNVBox_Tutorial#Preparing the requisite files (ignore Step 1) and SNVBox_Tutorial#Retrieving Features.

  • The output of this step is two files in ARFF format, one for the driver mutations and one for the passenger mutations.
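
Feature selection tools generally expect a single labelled file, so the two ARFF files have to be combined and given a class column. The sketch below shows one way this might be done. It assumes both files are dense (not sparse) ARFF, declare identical attributes in the same order, and do not already contain a class attribute. The function name, the relation name and the class labels `driver` and `passenger` are illustrative choices, not names defined by SNVBox or CHASM.

```python
def merge_arff(driver_text, passenger_text, relation="chasm_fs"):
    """Concatenate two dense ARFF files and append a nominal class attribute."""

    def parse(text):
        attributes, rows, in_data = [], [], False
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith("%"):
                continue  # skip blank lines and ARFF comments
            if in_data:
                rows.append(stripped)
            elif stripped.lower() == "@data":
                in_data = True
            elif stripped.lower().startswith("@attribute"):
                attributes.append(stripped)
        return attributes, rows

    d_attrs, d_rows = parse(driver_text)
    p_attrs, p_rows = parse(passenger_text)
    if d_attrs != p_attrs:
        raise ValueError("driver and passenger files declare different attributes")

    out = ["@relation " + relation]
    out += d_attrs
    # The class column is added as the last attribute.
    out.append("@attribute class {driver,passenger}")
    out.append("@data")
    out += [row + ",driver" for row in d_rows]
    out += [row + ",passenger" for row in p_rows]
    return "\n".join(out) + "\n"
```

Placing the class attribute last is a convenience, since many tools treat the final attribute as the class by default.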


Step 3. Use a package such as WEKA to select the most informative features. Original CHASM used feature selection based on mutual information with class labels and a feature threshold of 0.001 bits. WEKA and many other packages accept ARFF format, but you will have to concatenate the driver and passenger ARFF files together and add a class identification column.
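
To make the selection criterion concrete, here is a minimal Python sketch of mutual information (in bits) between one feature and the class labels, followed by thresholding at 0.001 bits. It is an illustration of the general technique, not the Original CHASM code. The function names are hypothetical, the strict `>` comparison is an assumption, and the features are assumed to be discrete already; continuous features would need to be discretized first, which packages such as WEKA typically handle for you.

```python
import math
from collections import Counter


def mutual_information_bits(values, labels):
    """I(X;Y) in bits for a discrete feature X and class labels Y."""
    n = len(values)
    joint = Counter(zip(values, labels))
    px = Counter(values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def select_features(columns, labels, threshold=0.001):
    """Return names of features whose mutual information exceeds threshold."""
    return [
        name
        for name, values in columns.items()
        if mutual_information_bits(values, labels) > threshold
    ]
```

A feature that perfectly separates two balanced classes carries 1 bit of information, while a feature that is independent of the class carries 0 bits and is dropped by the threshold.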

Step 4. Train your classifier as described in CHASM_Tutorial#Training the Classifier, using only the features selected in Step 3. YOU MUST USE THE SAME MutationTable AND ClassifierName AS IN STEP 1.


Custom Context Tables

For information on constructing custom context tables in CHASM read this.
