ccems-package {ccems}                R Documentation

Combinatorially Complex Equilibrium Model Selection

Description

This package performs model selection for equilibriums in general and for quasi-equilibriums of enzyme complexes in particular. Estimates of the dissociation constants K that best describe a dataset are found by systematically scanning through all possibilities of each K being infinite and/or plausibly equal to other K values. The automatically generated space of models is then fitted to the data. Automation enables searches of spaces too large to be specified by hand, e.g. spaces generated by combinatorially complex equilibriums.

Details

Package: ccems
Type: Package
Version: 1.0
Date: 2009-1-2
Depends: odesolve,snow
Suggests: nws
License: GPL-2
LazyLoad: yes
LazyData: yes
URL: http://epbi-radivot.cwru.edu/ccems
Built: R 2.8.1; ; 2009-01-13 17:24:33; windows

Index:

RNR                     Ribonucleotide Reductase Data
TK1                     Thymidine Kinase 1 Data
ems                     Equilibrium Model Selection
fitModel                Fit Model
mkGrids                 Make Grid Model Space
mkKd2Kj                 Make Kd2Kj Mappings
mkModel                 Make Specific Model
mkSpurs                 Make Spur Model Space
mkg                     Make Generic Model
simulateData            Simulate Data

This package automatically generates and fits biochemical equilibrium models using as outputs either average protein mass data or enzyme reaction rate data. It is currently limited to systems where one central hub protein mediates all of the interactions and where the total concentrations of the reactants are known to a good approximation, e.g. as in systems that were reconstituted from purified reactants. It is limited further in that multiple sites for the same ligand must be filled in a predetermined sequence.

Equilibriums can be specified by any acyclic spanning subgraph of their nodes, where edges are dissociation constants. Here, hub protein oligomerization is viewed as a curtain rod from which threads of ligand-bound states/complexes hang: each notch down a thread corresponds to one additional ligand bound to the hub j-mer, where j increases as one moves to the right on the curtain rod. At the top of each thread is a head node that sits on the rod. The head nodes must be specified, as some j values may be absent and some ligand sites (other than the thread-defining site) may be assumed to be saturated in some j-mers. The last node in each thread will be referred to as a tail node. If a ligand has more than one binding site, the tail of the thread of one site (other than the last one filled) is the head of the thread of the site filled next. Thus, head nodes must be stated only for the first site filled.

The example given below, where t is dTTP and R is the large subunit of ribonucleotide reductase, is not combinatorially complex, as there is only one ligand binding site (the s-site) and the hub protein forms at most a dimer. Thus, the thread topology of the acyclic graph used (to explore K equality hypotheses) has only two head nodes and two threads, one for the monomer and one for the dimer. The head node of the monomer thread is the free hub protein R1t0 and the head node of the dimer thread is the ligand-free dimer R2t0. Threads contain the names of only their non-head nodes, since their heads have already been specified. This structure is assigned to topology, which is then passed to the function mkg to produce a generic model object g. Together with the data, this generic model object is then passed to the function ems (equilibrium model selection), which generates the model space, fits it to the data, and returns the topN (typically 10 or 20) best (lowest AIC) models.
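
As a minimal sketch of how this structure can be inspected with base R alone (no ccems calls are made; the names n_nodes, n_threads, and n_K are illustrative and not part of the package), the following rebuilds the topology list used in the Examples section and counts its nodes and threads. The last count assumes the graph is a connected spanning tree, which has one fewer edge than it has nodes, so five nodes would imply four dissociation constants.

```r
# Topology of the dTTP/R1 example: two head nodes, one site (s), two threads
topology <- list(
        heads=c("R1t0","R2t0"),
        sites=list(
                s=list(
                        m=c("R1t1"),        # monomer thread (non-head nodes only)
                        d=c("R2t1","R2t2")  # dimer thread
                )
        )
)
n_nodes   <- length(unlist(topology))            # heads plus all thread nodes
n_threads <- sum(sapply(topology$sites,length))  # threads summed over sites
n_K       <- n_nodes - 1  # assumption: connected spanning tree, edges = nodes - 1
print(c(nodes=n_nodes,threads=n_threads,K=n_K))
```

Applied to the 30-node topology of the larger example in the Examples section, the same count gives 29, which appears to match the 29 used in the chunkParams comment there.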

The user must have working directory write privileges so that the subdirectories models and results can be created to hold model C code (generated by mkg) and html output (generated by ems), respectively.
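
A quick way to check this requirement before starting a long run is sketched below in base R. It uses a temporary directory as a stand-in for the working directory so that it can be run anywhere; replacing wd with getwd() would test the real one. The package is described above as creating the subdirectories itself, so the dir.create calls here only illustrate the permission requirement.

```r
wd <- tempdir()                                  # stand-in; use getwd() in practice
can_write <- unname(file.access(wd,mode=2)==0)   # mode 2 tests write permission
if (!can_write) stop("no write permission in ",wd)
for (sub in c("models","results"))               # directory names taken from the text above
        dir.create(file.path(wd,sub),showWarnings=FALSE)
print(file.exists(file.path(wd,c("models","results"))))
```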

The intended use of this package is on a Linux cluster. Its development and use to date have been on a ROCKS cluster.

Note

This work was supported by the National Cancer Institute (K25CA104791).

Author(s)

Tom Radivoyevitch (txr24@case.edu)

References

Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced R1 dimerization. BMC Systems Biology 2, 15.

Radivoyevitch, T. Automated model generation and selection methods for combinatorially complex biochemical equilibriums. (to be submitted to Biology Direct).

See Also

ems, mkg

Examples

library(ccems)
## this example corresponds to the reference above: dTTP induced R1 dimerization
topology <- list(  
        heads=c("R1t0","R2t0"),  
        sites=list(       
                s=list(                     # s-site    thread #
                        m=c("R1t1"),        # monomer      1
                        d=c("R2t1","R2t2")  # dimer        2
                )
        )
) 
g <- mkg(topology,TCC=TRUE) 
data(RNR)
d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
d2 <- subset(RNR,year==2006,select=c(R,t,m,year)) 
dd <- rbind(d1,d2)
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. to form "RT"
rownames(dd) <- 1:dim(dd)[1] # lose big number row names of parent dataframe

## Note: This block is for a ROCKS cluster
cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
chnkPs <- list(size=100,n=1,maxnPs=2,extend2maxP=TRUE)
## Not run: 
top10=ems(dd,g,cpusPerHost=cpusPerHost,chunkParams=chnkPs, ptype="SOCK") 
## End(Not run)
# The next example gives the cluster a really big (~12 hour) job
library(ccems)
topology <- list(
        heads=c("R1X0","R2X2","R4X4","R6X6"), # s-sites are already filled only in (j>1)-mer head nodes 
        sites=list(                    
                a=list(                                                              # a-site       thread #
                        m=c("R1X1"),                                                 # monomer          1
                        d=c("R2X3","R2X4"),                                          # dimer            2
                        t=c("R4X5","R4X6","R4X7","R4X8"),                            # tetramer         3
                        h=c("R6X7","R6X8","R6X9","R6X10", "R6X11", "R6X12")          # hexamer          4
                ),
                h=list( ## tails of a-site threads are heads of h-site threads       # h-site
                        m=c("R1X2"),                                                 # monomer          5
                        d=c("R2X5", "R2X6"),                                         # dimer            6
                        t=c("R4X9", "R4X10","R4X11", "R4X12"),                       # tetramer         7
                        h=c("R6X13", "R6X14", "R6X15","R6X16", "R6X17", "R6X18")     # hexamer          8
                )
        )
)
g=mkg(topology,TCC=TRUE) 
dd=subset(RNR,(year==2002)&(fg==1)&(X>0),select=c(R,X,m,year))
names(dd)[1:2]=paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. c("RT","XT")

cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
chnkPs <- list(size=1000,n=1,maxnPs=3,extend2maxP=TRUE) # 29 choose 3(2) is 3654(406), so 3654 + 406 + 29 + 1 = 4090 spurs 
## Not run: 

top10=ems(dd,g,cpusPerHost=cpusPerHost,chunkParams=chnkPs, ptype="SOCK") 

# The following are the last few lines of the output. The first line shows that the two parameter models are the best
# (shown are best AICs with increasing numbers of parameters). The next shows that it took 820 minutes on 16 cpus. 
# And the block that follows shows that the top 10 models are all two parameter spur graph models. The html file 
# RXglobSOCK.htm in the results directory contains this information and more (e.g. parameter estimates and CI).
# Of the total number of models fitted reported in the html file, 4133, the difference 4133 - 4090 = 43 is the number of grid 
# models fitted. Grid models are always fitted as one batch before spur model fitting begins. 

# [1]  41.14828  23.95284 -31.11051 -27.29232
# Time difference of 819.9431 mins
#
#  ... making HTML file ... 
#   1 Model 252; nbp= 2; id=IIIIIIIJIIIJIIIIIIIIIIIIIIIII; AIC=-31.1105
#   2 Model 187; nbp= 2; id=IIIIJIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.9837
#   3 Model 186; nbp= 2; id=IIIIJIIIIIIJIIIIIIIIIIIIIIIII; AIC=-30.7098
#   4 Model 163; nbp= 2; id=IIIJIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.4086
#   5 Model 232; nbp= 2; id=IIIIIIJIIIIIJIIIIIIIIIIIIIIII; AIC=-30.1868

## End(Not run)
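
The spur model count quoted in the chunkParams comment of the second example can be reproduced with base R. This sketch assumes, as that comment implies, that the count is the sum of the ways to choose 0 to maxnPs parameters out of 29; the names nK, counts, nSpurs, and nGrids are illustrative only.

```r
nK <- 29                          # number of K parameters in the second example
maxnPs <- 3                       # as in chnkPs of the second example
counts <- choose(nK,0:maxnPs)     # models with 0, 1, 2 and 3 parameters
nSpurs <- sum(counts)
nGrids <- 4133 - nSpurs           # total fitted (from the html report) minus spurs
print(counts)
print(c(spurs=nSpurs,grids=nGrids))
```

Raising maxnPs by one would add choose(29,4) = 23751 further models under the same assumption, which gives a sense of how quickly the space grows.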

[Package ccems version 1.0 Index]