| ccems-package {ccems} | R Documentation |
This package performs model selection for equilibriums in general and for quasi-equilibriums of enzyme complexes in particular. Estimates of the dissociation constants K that best describe a dataset are found by systematically scanning through all possibilities of each K being infinite and/or plausibly equal to other Ks. The automatically generated model space is then fitted to the data. Automation enables searches of spaces too large to be specified by hand, e.g. spaces generated by combinatorially complex equilibriums.
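The scan over "K is infinite" hypotheses can be pictured as enumerating every subset of at most maxnPs dissociation constants that are kept finite and fitted, with all other Ks set to infinity (so the corresponding complexes are absent). The base-R sketch below does this for nK constants. spurIds is a hypothetical helper, not a ccems function, and its I/J labels only mimic the model ids printed in the second example's output, assuming that J marks a fitted K and I an infinite one. K-equality (grid) hypotheses are not enumerated here.

```r
# Hypothetical helper (not part of ccems): enumerate "spur" hypotheses in which
# at most maxnPs of the nK dissociation constants are finite (fitted) and the
# rest are infinite. Labels are assumed: "I" = infinite K, "J" = fitted K.
spurIds <- function(nK, maxnPs) {
  maxnPs <- min(maxnPs, nK)
  ids <- strrep("I", nK)                  # 0-parameter model: every K infinite
  for (p in seq_len(maxnPs)) {
    combos <- combn(nK, p)                # one column per choice of p finite Ks
    ids <- c(ids, apply(combos, 2, function(idx) {
      chars <- rep("I", nK)
      chars[idx] <- "J"
      paste(chars, collapse = "")
    }))
  }
  ids
}

ids <- spurIds(29, 3)
length(ids)                                     # 4090, i.e. sum(choose(29, 0:3))
head(ids[nchar(gsub("I", "", ids)) == 2], 3)    # a few two-parameter ids
```

Each id has as many J characters as fitted parameters, which is how the nbp values and ids in the sample output line up.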
| Package: | ccems |
| Type: | Package |
| Version: | 1.0 |
| Date: | 2009-1-2 |
| Depends: | odesolve,snow |
| Suggests: | nws |
| License: | GPL-2 |
| LazyLoad: | yes |
| LazyData: | yes |
| URL: | http://epbi-radivot.cwru.edu/ccems |
| Built: | R 2.8.1; ; 2009-01-13 17:24:33; windows |
Index:
| RNR | Ribonucleotide Reductase Data |
| TK1 | Thymidine Kinase 1 Data |
| ems | Equilibrium Model Selection |
| fitModel | Fit Model |
| mkGrids | Make Grid Model Space |
| mkKd2Kj | Make Kd2Kj Mappings |
| mkModel | Make Specific Model |
| mkSpurs | Make Spur Model Space |
| mkg | Make Generic Model |
| simulateData | Simulate Data |
This package automatically generates biochemical equilibrium models and fits them to either average protein mass data or enzyme reaction rate data, which serve as the model outputs. It is currently limited to systems in which one central hub protein mediates all of the interactions and the total concentrations of the reactants are known to good accuracy, e.g. systems reconstituted from purified reactants. It is further limited in that multiple sites for the same ligand must be filled in a predetermined sequence.
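Knowing the total concentrations is what allows a model to predict the concentration of every complex from its Ks: the hub and ligand mass balances can be solved for the free concentrations. The sketch below assumes each complex Rjtk is described by a complete dissociation constant J such that [Rjtk] = r^j t^k / J, where r and t are the free hub and free ligand concentrations. The helper name equilibrate, its arguments and the J values in the usage lines are hypothetical illustrations, not ccems code or fitted estimates. It solves the ligand balance for t at each trial r, then the hub balance for r, using nested calls to base R's uniroot.

```r
# Hypothetical sketch (not ccems code): given complete dissociation constants J,
# with [Rjtk] = r^j * t^k / J for free hub r and free ligand t, solve the hub
# and ligand mass balances for r and t by nested one-dimensional root finding.
equilibrate <- function(species, RT, tT) {
  stopifnot(RT > 0, tT > 0)
  conc <- function(r, t) r^species$j * t^species$k / species$J
  # free ligand t with t + sum(k * [Rjtk]) = tT, for a given free hub r
  tFree <- function(r) {
    uniroot(function(t) t + sum(species$k * conc(r, t)) - tT,
            lower = 0, upper = tT, tol = 1e-10)$root
  }
  # free hub r with r + sum(j * [Rjtk]) = RT, re-solving t at each trial r
  r <- uniroot(function(r) r + sum(species$j * conc(r, tFree(r))) - RT,
               lower = 0, upper = RT, tol = 1e-10)$root
  t <- tFree(r)
  list(free = c(R = r, t = t),
       complexes = setNames(conc(r, t), species$name))
}

# Complexes of the dTTP/R1 example (free R1t0 and free t excluded);
# the J values are arbitrary illustrations, not estimates
sp <- data.frame(name = c("R1t1", "R2t0", "R2t1", "R2t2"),
                 j = c(1, 2, 2, 2), k = c(1, 0, 1, 2),
                 J = c(10, 50, 500, 5000), stringsAsFactors = FALSE)
eq <- equilibrate(sp, RT = 7.6, tT = 100)
eq$free
eq$complexes
```

Both balances bracket their roots on [0, total], which is why plain uniroot suffices here. In ccems itself, model code is generated in C by mkg; this sketch does not reproduce it.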
An equilibrium can be specified by any acyclic spanning subgraph of its nodes, where edges are dissociation constants. Here, hub protein oligomerization is viewed as a curtain rod from which threads of ligand-bound states (complexes) hang: each notch down a thread corresponds to one additional ligand bound to the hub j-mer, where j increases as one moves to the right along the curtain rod. At the top of each thread is a head node that sits on the rod. The head nodes must be specified, because some j values may be absent and some ligand sites (other than the thread-defining site) may be assumed to be saturated in some j-mers. The last node in each thread is referred to as a tail node. If a ligand has more than one binding site, the tail of the thread of one site (other than the last site filled) is the head of the thread of the site filled next. Thus, head nodes need to be stated only for the first site filled.
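The thread rules above can be made concrete by listing the within-thread edges that a topology list, in the format used in the examples below, implies. This is a hypothetical base-R sketch, not a ccems function. It assumes that the threads of each site are listed in the same order as the head nodes they hang from, and it does not attempt to reproduce how ccems links the head nodes to one another along the rod.

```r
# Hypothetical sketch (not ccems code): list the parent -> child edges inside
# each thread of a topology list. Assumes the threads of every site are given
# in the same order as the head nodes they hang from.
threadEdges <- function(topology) {
  heads <- topology$heads
  edges <- data.frame(from = character(0), to = character(0),
                      site = character(0), stringsAsFactors = FALSE)
  for (s in names(topology$sites)) {
    threads <- topology$sites[[s]]
    stopifnot(length(threads) == length(heads))
    tails <- character(length(threads))
    for (i in seq_along(threads)) {
      nodes <- c(heads[i], threads[[i]])   # head, then one notch per bound ligand
      edges <- rbind(edges, data.frame(from = head(nodes, -1), to = nodes[-1],
                                       site = s, stringsAsFactors = FALSE))
      tails[i] <- nodes[length(nodes)]
    }
    heads <- tails   # tails of this site's threads head the next site's threads
  }
  edges
}

# Monomer and dimer part of the two-site (a-site, then h-site) example below
topo <- list(heads = c("R1X0", "R2X2"),
             sites = list(a = list(m = "R1X1", d = c("R2X3", "R2X4")),
                          h = list(m = "R1X2", d = c("R2X5", "R2X6"))))
threadEdges(topo)
```

The result has six edges, and the h-site edges start at R1X1 and R2X4, the tails of the a-site threads, as the multi-site rule requires.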
The first example below, in which t is dTTP and R is the large subunit of ribonucleotide reductase, is not combinatorially complex: there is only one ligand binding site (the s-site) and the hub protein forms at most a dimer. Thus, the thread topology of the acyclic graph used to explore K equality hypotheses has only two head nodes and two threads, one for the monomer and one for the dimer. The head node of the monomer thread is the free hub protein R1t0, and the head node of the dimer thread is the ligand-free dimer R2t0. Threads contain the names of only their non-head nodes, since their heads have already been specified.
This structure is assigned to topology, which is then passed to the function mkg to produce a generic model object g. Together with the data, g is then passed to the function ems (equilibrium model selection), which generates the model space, fits it to the data and returns the topN (typically 10 or 20) best (lowest AIC) models. The user must have write privileges in the working directory so that the subdirectories models and results can be created to hold the model C code (generated by mkg) and the HTML output (generated by ems), respectively.
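If the acyclic spanning subgraph is connected (a spanning tree), its number of edges, and hence of dissociation constants, is one less than its number of nodes, i.e. the head nodes plus the non-head thread entries. A hypothetical base-R sketch of this count follows, together with the size of the spur model space under the assumption stated in the second example's chunkParams comment (all choices of at most maxnPs finite Ks). nKs and nSpurs are illustrative names, not ccems functions.

```r
# Hypothetical helpers (not ccems code).
# For a spanning tree, number of Ks = number of nodes - 1.
nKs <- function(topology) {
  nNodes <- length(topology$heads) + length(unlist(topology$sites))
  nNodes - 1
}
# Spur models with at most maxnPs finite Ks (assumed definition, see text)
nSpurs <- function(nK, maxnPs) sum(choose(nK, 0:min(maxnPs, nK)))

# Topology of the first example
topo1 <- list(heads = c("R1t0", "R2t0"),
              sites = list(s = list(m = "R1t1", d = c("R2t1", "R2t2"))))
nKs(topo1)              # 4
nSpurs(nKs(topo1), 2)   # 11
nSpurs(29, 3)           # 4090, matching the comment in the second example
```

Applied to the second example's topology (4 heads and 26 thread entries), nKs gives 29, the number of Ks used in that example's chunkParams comment.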
This package is intended for use on a Linux cluster; to date it has been developed and used on a ROCKS cluster.
This work was supported by the National Cancer Institute (K25CA104791).
Tom Radivoyevitch (txr24@case.edu)
Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced R1 dimerization. BMC Systems Biology 2, 15.
Radivoyevitch, T. (in preparation) Automated model generation and selection methods for combinatorially complex biochemical equilibriums. To be submitted to Biology Direct.
library(ccems)
## this example corresponds to the reference above: dTTP induced R1 dimerization
topology <- list(
  heads = c("R1t0", "R2t0"),
  sites = list(
    s = list(                  # s-site   thread #
      m = c("R1t1"),           # monomer  1
      d = c("R2t1", "R2t2")    # dimer    2
    )
  )
)
g <- mkg(topology,TCC=TRUE)
data(RNR)
d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
d2 <- subset(RNR,year==2006,select=c(R,t,m,year))
dd <- rbind(d1,d2)
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. to form "RT"
rownames(dd) <- seq_len(nrow(dd))  # drop the large row names inherited from the parent data frame
## Note: This block is for a ROCKS cluster
cpusPerHost <- c("localhost" = 4, "compute-0-0" = 4, "compute-0-1" = 4, "compute-0-2" = 4)
chnkPs <- list(size = 100, n = 1, maxnPs = 2, extend2maxP = TRUE)
## Not run:
top10 <- ems(dd, g, cpusPerHost = cpusPerHost, chunkParams = chnkPs, ptype = "SOCK")
## End(Not run)
# The next example gives the cluster a large (~12 hour) job
library(ccems)
topology <- list(
  heads = c("R1X0", "R2X2", "R4X4", "R6X6"),  # s-sites are already filled only in (j>1)-mer head nodes
  sites = list(
    a = list(                                              # a-site   thread #
      m = c("R1X1"),                                       # monomer  1
      d = c("R2X3", "R2X4"),                               # dimer    2
      t = c("R4X5", "R4X6", "R4X7", "R4X8"),               # tetramer 3
      h = c("R6X7", "R6X8", "R6X9", "R6X10", "R6X11", "R6X12")  # hexamer 4
    ),
    h = list(  ## tails of a-site threads are heads of h-site threads   # h-site
      m = c("R1X2"),                                       # monomer  5
      d = c("R2X5", "R2X6"),                               # dimer    6
      t = c("R4X9", "R4X10", "R4X11", "R4X12"),            # tetramer 7
      h = c("R6X13", "R6X14", "R6X15", "R6X16", "R6X17", "R6X18")  # hexamer 8
    )
  )
)
g <- mkg(topology, TCC = TRUE)
dd <- subset(RNR, (year == 2002) & (fg == 1) & (X > 0), select = c(R, X, m, year))
names(dd)[1:2] <- paste(strsplit(g$id, split = "")[[1]], "T", sep = "")  # e.g. c("RT","XT")
cpusPerHost <- c("localhost" = 4, "compute-0-0" = 4, "compute-0-1" = 4, "compute-0-2" = 4)
# choose(29, 3) = 3654 and choose(29, 2) = 406, so 3654 + 406 + 29 + 1 = 4090 spur models
chnkPs <- list(size = 1000, n = 1, maxnPs = 3, extend2maxP = TRUE)
## Not run:
top10 <- ems(dd, g, cpusPerHost = cpusPerHost, chunkParams = chnkPs, ptype = "SOCK")
# The following are the last few lines of the output. The first line gives the best AIC for
# each number of parameters (0, 1, 2, 3) and shows that the two-parameter models are best.
# The next shows that the run took about 820 minutes on 16 CPUs, and the block that follows
# shows that the top 10 models are all two-parameter spur graph models. The HTML file
# RXglobSOCK.htm in the results directory contains this information and more (e.g. parameter
# estimates and confidence intervals). The HTML file reports 4133 models fitted in total; the
# difference 4133 - 4090 = 43 is the number of grid models fitted. Grid models are always
# fitted as one batch before spur model fitting begins.
#   [1] 41.14828 23.95284 -31.11051 -27.29232
#   Time difference of 819.9431 mins
#   ... making HTML file ...
#   1 Model 252; nbp= 2; id=IIIIIIIJIIIJIIIIIIIIIIIIIIIII; AIC=-31.1105
#   2 Model 187; nbp= 2; id=IIIIJIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.9837
#   3 Model 186; nbp= 2; id=IIIIJIIIIIIJIIIIIIIIIIIIIIIII; AIC=-30.7098
#   4 Model 163; nbp= 2; id=IIIJIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.4086
#   5 Model 232; nbp= 2; id=IIIIIIJIIIIIJIIIIIIIIIIIIIIII; AIC=-30.1868
## End(Not run)