ccems-package             package:ccems             R Documentation

_C_o_m_b_i_n_a_t_o_r_i_a_l_l_y _C_o_m_p_l_e_x _E_q_u_i_l_i_b_r_i_u_m _M_o_d_e_l _S_e_l_e_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This package performs model selection for equilibriums in general
     and for quasi-equilibriums of enzyme complexes in particular.
     Estimates of the dissociation constants K that best describe a
     dataset are found by systematically scanning through all
     possibilities of each K being infinite and/or plausibly equal to
     other K. The automatically generated space of models is then
     fitted to the data. Automation enables searches of spaces too
     large to be specified by hand, e.g. spaces generated by
     combinatorially complex equilibriums.
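
     As a rough illustration of why such spaces outgrow hand
     specification, the base R sketch below counts and enumerates
     models in which each K is either infinite or a free finite
     parameter. The helpers 'countModels' and 'enumerateModels' are
     hypothetical and not part of ccems, K equality hypotheses are
     ignored, and the I/J lettering (I for infinite, J for finite) is
     an assumption made for this sketch.

```r
# Hypothetical helpers (not part of ccems).
# Count models with at most maxP finite (estimated) K out of nK candidates;
# all remaining K are taken to be infinite.
countModels <- function(nK, maxP) sum(choose(nK, 0:maxP))

# Enumerate the same models explicitly as strings:
# "J" marks a finite K, "I" an infinite K (lettering assumed for this sketch).
enumerateModels <- function(nK, maxP) {
  ids <- strrep("I", nK)                      # model with every K infinite
  for (p in seq_len(min(maxP, nK))) {
    finiteSets <- combn(nK, p, simplify = FALSE)
    ids <- c(ids, vapply(finiteSets, function(s) {
      id <- rep("I", nK)
      id[s] <- "J"
      paste(id, collapse = "")
    }, character(1)))
  }
  ids
}

countModels(3, 2)        # 1 + 3 + 3 = 7 models
enumerateModels(3, 2)    # the same 7 models, listed
countModels(29, 3)       # 1 + 29 + 406 + 3654 = 4090 models
```

     With 29 candidate K and at most three estimated parameters, the
     count already exceeds four thousand models.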

_D_e_t_a_i_l_s:


       Package:   ccems
       Type:      Package
       Version:   1.0
       Date:      2009-1-2
       Depends:   odesolve,snow
       Suggests:  nws
       License:   GPL-2
       LazyLoad:  yes
       LazyData:  yes
       URL:       http://epbi-radivot.cwru.edu/ccems
       Built:     R 2.8.1; ; 2009-01-13 17:24:33; windows

     Index:


     RNR                     Ribonucleotide Reductase Data
     TK1                     Thymidine Kinase 1 Data
     ems                     Equilibrium Model Selection
     fitModel                Fit Model
     mkGrids                 Make Grid Model Space
     mkKd2Kj                 Make Kd2Kj Mappings
     mkModel                 Make Specific Model
     mkSpurs                 Make Spur Model Space
     mkg                     Make Generic Model
     simulateData            Simulate Data

     This package automatically generates and fits biochemical
     equilibrium models using as outputs either average protein mass
     data or enzyme reaction rate data. It is currently limited to
     systems where one central hub protein mediates all of the
     interactions and where the total concentrations of the reactants
     are known to good accuracy, e.g. as in systems that were
     reconstituted from purified reactants. It is limited further in
     that multiple sites for the same ligand must be filled in a
     predetermined sequence.

     An equilibrium can be specified by any acyclic spanning subgraph
     of its nodes, where edges are dissociation constants. Here, hub
     protein oligomerization is viewed as a curtain rod from which
     threads of ligand-bound states/complexes hang: each notch down a
     thread corresponds to one additional ligand bound to the hub
     j-mer, where j increases as one moves to the right on the curtain
     rod. At the top of each thread is a head node that sits on the
     rod. The head nodes must be specified, as some j values may be
     absent and some ligand sites (other than the thread-defining
     site) may be assumed to be saturated in some j-mers. The last node
     in each thread will be referred to as a tail node. If a ligand
     has more than one binding site, the tail of the thread of one
     site (other than the last one filled) is the head of the thread
     of the site filled next. Thus, head nodes must be stated only for
     the first site filled.
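
     The node names used in the examples (e.g. R2t1 for a hub dimer
     with one ligand bound) follow this picture and can be unpacked
     with a regular expression. The base R sketch below assumes
     single-letter hub and ligand symbols, each followed by an
     integer; 'parseNode' is a hypothetical helper and not part of
     ccems.

```r
# Hypothetical helper (not part of ccems): split a node name such as "R2t1"
# into hub symbol, oligomer size j, ligand symbol, and number of ligands bound.
parseNode <- function(node) {
  m <- regmatches(node, regexec("^([A-Za-z])([0-9]+)([A-Za-z])([0-9]+)$", node))
  ok <- lengths(m) == 5
  if (!all(ok)) stop("unrecognized node name: ", paste(node[!ok], collapse = ", "))
  parts <- do.call(rbind, m)
  data.frame(node = parts[, 1], hub = parts[, 2], j = as.integer(parts[, 3]),
             ligand = parts[, 4], bound = as.integer(parts[, 5]),
             stringsAsFactors = FALSE)
}

# A dimer thread: head node first, then one notch down per ligand bound.
thread <- parseNode(c("R2t0", "R2t1", "R2t2"))
thread

# Within a thread j is constant and the bound count rises by one per notch.
stopifnot(all(thread$j == 2), all(diff(thread$bound) == 1))
```

     A check of this kind can catch a mistyped node name before a
     topology is handed to model generation.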

     The example given below, where t is dTTP and R is the large
     subunit of ribonucleotide reductase, is not combinatorially
     complex, as there is only one ligand binding site (the s-site)
     and the hub protein forms at most a dimer. Thus, the thread
     topology of the acyclic graph used (to explore K equality
     hypotheses) has only two head nodes and two threads, one for the
     monomer and one for the dimer. The head node of the monomer
     thread is the free hub protein R1t0 and the head node of the
     dimer thread is the ligand-free dimer R2t0. Threads contain the
     names of only their non-head nodes since their heads have already
     been specified. This structure is assigned to 'topology', which is
     then passed to the function 'mkg' to produce a generic model
     object 'g'. Together with the data, this generic model object is
     then passed to the function 'ems' (equilibrium model selection),
     which generates the model space, fits it to the data, and returns
     the 'topN' (typically 10 or 20) best (lowest AIC) models.
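
     The final ranking step can be sketched in base R. The example
     below is only a stand-in: it fits polynomial regressions with
     'lm' rather than equilibrium models, and it uses the standard
     'AIC' function from the stats package, which may differ in detail
     from the AIC computed by 'ems'. The data and candidate models are
     invented for illustration.

```r
# Toy stand-in for a fitted model space: polynomial fits of increasing degree.
# ccems fits equilibrium models instead; only the ranking step is illustrated.
set.seed(1)
x <- seq(0.1, 2, length.out = 30)
y <- 1 + 2 * x + rnorm(length(x), sd = 0.1)   # simulated, truly linear data

fits <- list(
  constant  = lm(y ~ 1),
  linear    = lm(y ~ x),
  quadratic = lm(y ~ poly(x, 2)),
  cubic     = lm(y ~ poly(x, 3))
)

aic  <- vapply(fits, AIC, numeric(1))   # lower AIC = better fit/complexity trade-off
topN <- 2                               # kept small here; the text suggests 10 or 20
best <- head(sort(aic), topN)           # the topN lowest-AIC models, best first
best
```

     Sorting once and keeping the head of the list is enough here
     because every candidate is scored on the same data.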

     The user must have write privileges in the working directory so
     that the subdirectories 'models' and 'results' can be created to
     hold model C code (generated by 'mkg') and HTML output (generated
     by 'ems'), respectively.
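
     Write access can be checked up front with base R. The sketch
     below uses a temporary directory as a stand-in for the working
     directory and creates the two subdirectories by hand purely for
     illustration; the package functions are described above as
     creating them on their own.

```r
# Check that a directory is writable, then create the two subdirectories.
# tempdir() stands in for the working directory (getwd()) in this sketch.
wd <- tempdir()
if (file.access(wd, mode = 2) != 0) stop("no write permission in ", wd)

for (sub in c("models", "results")) {
  dir.create(file.path(wd, sub), showWarnings = FALSE)  # no-op if already present
}
dir.exists(file.path(wd, c("models", "results")))
```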

     The intended use of this package is on a Linux cluster. Its
     development and use to date have been on a ROCKS cluster.

_N_o_t_e:

     This work was supported by the National Cancer Institute
     (K25CA104791).

_A_u_t_h_o_r(_s):

     Tom Radivoyevitch (txr24@case.edu)

_R_e_f_e_r_e_n_c_e_s:

     Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced
     R1 dimerization. _BMC Systems Biology_ *2*, 15. 

     Radivoyevitch, T.  Automated model generation and selection
     methods for combinatorially complex biochemical equilibriums. (to
     be submitted to _Biology Direct_).

_S_e_e _A_l_s_o:

     'ems',  'mkg'

_E_x_a_m_p_l_e_s:

     library(ccems)
     ## this example corresponds to the reference above: dTTP induced R1 dimerization
     topology <- list(  
             heads=c("R1t0","R2t0"),  
             sites=list(       
                     s=list(                     # s-site    thread #
                             m=c("R1t1"),        # monomer      1
                             d=c("R2t1","R2t2")  # dimer        2
                     )
             )
     ) 
     g <- mkg(topology,TCC=TRUE) 
     data(RNR)
     d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
     d2 <- subset(RNR,year==2006,select=c(R,t,m,year)) 
     dd <- rbind(d1,d2)
     names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. to form "RT"
     rownames(dd) <- seq_len(nrow(dd)) # lose big number row names of parent dataframe

     ## Note: This block is for a ROCKS cluster
     cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
     chnkPs <- list(size=100,n=1,maxnPs=2,extend2maxP=TRUE)
     ## Not run: 
     top10=ems(dd,g,cpusPerHost=cpusPerHost,chunkParams=chnkPs, ptype="SOCK") 
     ## End(Not run)
     # The next example gives the cluster a really big (~12 hour) job
     library(ccems)
     topology <- list(
             heads=c("R1X0","R2X2","R4X4","R6X6"), # s-sites are already filled only in (j>1)-mer head nodes 
             sites=list(                    
                     a=list(                                                              # a-site       thread #
                             m=c("R1X1"),                                                 # monomer          1
                             d=c("R2X3","R2X4"),                                          # dimer            2
                             t=c("R4X5","R4X6","R4X7","R4X8"),                            # tetramer         3
                             h=c("R6X7","R6X8","R6X9","R6X10", "R6X11", "R6X12")          # hexamer          4
                     ),
                     h=list( ## tails of a-site threads are heads of h-site threads       # h-site
                             m=c("R1X2"),                                                 # monomer          5
                             d=c("R2X5", "R2X6"),                                         # dimer            6
                             t=c("R4X9", "R4X10","R4X11", "R4X12"),                       # tetramer         7
                             h=c("R6X13", "R6X14", "R6X15","R6X16", "R6X17", "R6X18")     # hexamer          8
                     )
             )
     )
     g=mkg(topology,TCC=TRUE) 
     dd=subset(RNR,(year==2002)&(fg==1)&(X>0),select=c(R,X,m,year))
     names(dd)[1:2]=paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. c("RT","XT")

     cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
     chnkPs <- list(size=1000,n=1,maxnPs=3,extend2maxP=TRUE) # 29 choose 3(2) is 3654(406), so 3654 + 406 + 29 + 1 = 4090 spurs 
     ## Not run: 

     top10=ems(dd,g,cpusPerHost=cpusPerHost,chunkParams=chnkPs, ptype="SOCK") 

     # The following are the last few lines of the output. The first line shows that the two-parameter models are the best
     # (shown are the best AICs with increasing numbers of parameters). The next shows that it took 820 minutes on 16 cpus.
     # The block that follows shows that the top 10 models are all two-parameter spur graph models. The html file
     # RXglobSOCK.htm in the results directory contains this information and more (e.g. parameter estimates and CI).
     # Of the total number of models fitted reported in the html file, 4133, the difference 4133 - 4090 = 43 is the number of grid
     # models fitted. Grid models are always fitted as one batch before spur model fitting begins.

     [1]  41.14828  23.95284 -31.11051 -27.29232
     Time difference of 819.9431 mins

      ... making HTML file ... 
       1 Model 252; nbp= 2; id=IIIIIIIJIIIJIIIIIIIIIIIIIIIII; AIC=-31.1105
       2 Model 187; nbp= 2; id=IIIIJIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.9837
       3 Model 186; nbp= 2; id=IIIIJIIIIIIJIIIIIIIIIIIIIIIII; AIC=-30.7098
       4 Model 163; nbp= 2; id=IIIJIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.4086
       5 Model 232; nbp= 2; id=IIIIIIJIIIIIJIIIIIIIIIIIIIIII; AIC=-30.1868

     ## End(Not run)

