ccems-package             package:ccems             R Documentation

_C_o_m_b_i_n_a_t_o_r_i_a_l_l_y _C_o_m_p_l_e_x _E_q_u_i_l_i_b_r_i_u_m _M_o_d_e_l _S_e_l_e_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This package performs model selection for equilibriums in general
     and quasi-equilibriums of enzyme complexes in particular.
     Estimates of the dissociation constants K that best describe a
     dataset are found by systematically scanning through all
     hypotheses in which a K is infinite and/or plausibly equal to
     another K.  The automatically generated space of models is then
     fitted to data.  Automation enables searches of spaces too large
     to be specified by hand, e.g. spaces generated by combinatorially
     complex equilibriums.
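
     As a rough illustration of the "K is infinite" part of the scan
     (not the package's internal code), the base R sketch below
     enumerates every way of leaving at most two of four dissociation
     constants finite.  The names K1 to K4, the variable 'maxFinite'
     and the labels "fit" and "Inf" are assumptions made for this
     illustration; hypotheses that set one K equal to another are not
     covered.

```r
# Hypothetical constants; not taken from any ccems model.
Ks <- c("K1", "K2", "K3", "K4")
maxFinite <- 2  # at most this many K are finite (i.e. estimated)

# All subsets of Ks with 0..maxFinite members; each subset is one hypothesis.
subsets <- list(character(0))
for (k in seq_len(maxFinite)) {
  subsets <- c(subsets, combn(Ks, k, simplify = FALSE))
}

# One row per hypothesis: "fit" where K is estimated, "Inf" where it is infinite.
hypotheses <- t(sapply(subsets, function(s) ifelse(Ks %in% s, "fit", "Inf")))
colnames(hypotheses) <- Ks
print(hypotheses)
nrow(hypotheses)  # choose(4,0) + choose(4,1) + choose(4,2) hypotheses
```

     Even this toy space grows quickly with the number of constants,
     which is why the package generates the space automatically.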

_D_e_t_a_i_l_s:


       Package:   ccems
       Type:      Package
       Depends:   odesolve,snow
       Suggests:  nws
       License:   GPL-2
       LazyLoad:  yes
       LazyData:  yes
       URL:       http://epbi-radivot.cwru.edu/ccems

     Index:


     RNR                     Ribonucleotide Reductase Data
     TK1                     Thymidine Kinase 1 Data
     ems                     Equilibrium Model Selection
     fitModel                Fit Model
     mkGrids                 Make Grid Model Space
     mkKd2Kj                 Make Kd2Kj Mappings
     mkModel                 Make Specific Model
     mkSpurs                 Make Spur Model Space
     mkg                     Make Generic Model
     simulateData            Simulate Data

     This package automatically generates and fits biochemical
     equilibrium models using as outputs either average protein mass
     data or enzyme reaction rate data.  It is currently limited to
     systems where one central hub protein mediates all of the
     interactions and the total concentrations of the reactants are
     accurately known, e.g. as in systems that were reconstituted from
     purified reactants.  It is limited further in that multiple sites
     for the same ligand must be filled in a predetermined sequence.

     An equilibrium can be specified by any acyclic spanning subgraph
     of its nodes, where edges are dissociation constants.  Here, hub
     protein oligomerization is viewed as a curtain rod from which
     threads of ligand-bound states/complexes hang: each notch down a
     thread corresponds to one additional ligand bound to the hub
     j-mer, where j increases as one moves to the right on the curtain
     rod.  At the top of each thread is a head node that sits on the
     rod.  The head nodes must be specified, as some j values may be
     absent and some ligand sites (other than the thread-defining
     site) may be assumed to be saturated in some j-mers.  The last
     node in each thread will be referred to as a tail node.  If a
     ligand has more than one binding site, the tail of the thread of
     one site (other than the last one filled) is the head of the
     thread of the site filled next.  Thus, head nodes must be stated
     only for the first site filled.
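
     The curtain rod picture can be made concrete with a small base R
     sketch.  The list follows the 'topology' format used in this
     page's examples (one ligand t, one site s, monomer and dimer
     threads); pairing heads with threads by position and the helper
     variables 'threads', 'chain' and 'tails' are assumptions made
     only for display, not part of the package.

```r
# Topology of a hub protein R with one ligand t and one site (the s-site).
topology <- list(
  heads = c("R1t0", "R2t0"),          # head nodes sitting on the curtain rod
  sites = list(
    s = list(
      m = c("R1t1"),                  # monomer thread (non-head nodes only)
      d = c("R2t1", "R2t2")           # dimer thread (non-head nodes only)
    )
  )
)

# Print each thread from its head node down to its tail node.
# Assumption: the i-th head belongs to the i-th thread.
threads <- topology$sites$s
for (i in seq_along(threads)) {
  chain <- c(topology$heads[i], threads[[i]])
  cat(names(threads)[i], ": ", paste(chain, collapse = " -> "), "\n", sep = "")
}

# The tail node is the last node of each thread.
tails <- sapply(threads, function(th) th[length(th)])
```

     Each arrow in the printed chains stands for one additional
     ligand bound, i.e. one notch down the thread.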

     In the examples below, E is the concentration of thymidine kinase
     1 (TK1) tetramers, S is thymidine, t is dTTP, X is ATP and R is
     the large subunit of ribonucleotide reductase (RNR).  The examples
     are ordered by CPU consumption: the first takes ~0.5 minutes on 1
     core, the second ~1.5 minutes on 2 cores, and the third ~2 days
     on 16 cores.

     The first example fits activity data to a single-thread model.  It
     is the fastest example because [E] is small enough that total [S]
     approximates free [S], which allows rational polynomials to be
     used for the system model.

     In the second example there is only one ligand binding site (the
     s-site) and the hub protein forms at most a dimer.  Thus, the
     thread topology of the acyclic graph used (to explore K equality
     hypotheses) has only two head nodes and two threads, one for the
     monomer and one for the dimer.  The head node of the monomer
     thread is the free hub protein R1t0 and the head node of the
     dimer thread is the ligand-free dimer R2t0.  Threads contain the
     names of only their non-head nodes since their heads have already
     been specified.  This structure is assigned to 'topology', which
     is then passed to the function 'mkg' to produce a generic model
     object 'g'.  Together with the data, this generic model object is
     then passed to the function 'ems' (equilibrium model selection),
     which generates the model space, fits it to the data, and returns
     the 'topN' (typically 5, 10 or 20) best (lowest AIC) models.

     The third example is more complicated than the second because ATP
     has multiple R1 binding sites and because R also tetramerizes and
     hexamerizes with increases in [ATP].  This problem motivated the
     development of this R package.  It is an example of a problem
     whose solution is enabled by this software because its model
     space is too large to specify by hand.  A Linux cluster is needed
     to execute this example.
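
     The size of the third example's spur model space can be checked
     with base R.  The count below reproduces the arithmetic given in
     the comments of the cluster example (29 choose 0 to 3); reading
     29 as the number of candidate dissociation constants is an
     interpretation, and the variable names are chosen for this
     sketch only.

```r
nK <- 29          # figure used in the cluster example's comments
maxTotalPs <- 3   # largest number of parameters allowed in that example

# Number of ways to pick 0, 1, 2 or 3 of the 29 candidates.
counts <- choose(nK, 0:maxTotalPs)
names(counts) <- paste0(0:maxTotalPs, "-parameter")
print(counts)

# Total before models lacking a hexamer complex are removed and grids are added.
sum(counts)
```

     The total of 4090 is the starting point only; the cluster
     example's comments give the final number of models after
     filtering and after adding grids.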

     The user must have working directory write privileges so that the
     subdirectories 'models' and 'results' can be created to hold 
     model C code (generated  by 'mkg') and html output (generated by
     'ems'), respectively.
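
     Before a long run it may be worth confirming that the working
     directory is writable.  The following is a minimal base R sketch,
     not something the package is known to do itself; the variable
     names 'wd' and 'writable' are chosen for illustration.

```r
# Check write permission on the working directory (mode = 2 tests writing;
# file.access() returns 0 on success and -1 on failure).
wd <- getwd()
writable <- file.access(wd, mode = 2) == 0
if (!writable) {
  warning("No write permission in ", wd,
          "; consider setwd() to a writable directory first.")
}
```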

_N_o_t_e:

     This work was supported by the National Cancer Institute
     (K25CA104791).

_A_u_t_h_o_r(_s):

     Tom Radivoyevitch (txr24@case.edu)

_R_e_f_e_r_e_n_c_e_s:

     Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced
     R1 dimerization. _BMC Systems Biology_ *2*, 15. 

     Radivoyevitch, T.  Automated model generation and analysis methods
     for combinatorially  complex biochemical equilibriums.
     (submitted).

_S_e_e _A_l_s_o:

     'ems',  'mkg'

_E_x_a_m_p_l_e_s:

     ## LAPTOP EXAMPLE: Top 3 models with at most three parameters for the
     ##                 Berenstein et al. JBC 2000 TK1 data
     library(ccems)
     topology <- list(  
         heads=c("E1S0"), #one E is a tetramer
         sites=list(                    
             c=list(    # c-site = catalytic site
                 t=c("E1S1","E1S2","E1S3","E1S4")   
             )
         )
     )
     g <- mkg(topology,hubChar="E",activity=TRUE,TCC=FALSE)
     dd=subset(TK1,(year==2000),select=c(E,S,v)) # Berenstein et al
     names(dd)[1:2]= c("ET","ST")
     tops=ems(dd,g,maxTotalPs=3,kIC=30000) 
     plot(dd$ST,dd$v,type="p",pch=1, xlab="[dT] (uM)", ylab="v",
               main="Top 3 TK1 Models with 3 parameters or fewer")
     lgx=log(dd$ST)
     upr=range(lgx)[2]
     lwr=range(lgx)[1]
     del=(upr-lwr)/50
     fineX=exp(seq(lwr,upr,by=del))
     newPnts <- data.frame(ET = rep(dd$ET[1],length(fineX)), ST = fineX)
     for (i in 1:3) {
       df <- simulateData(tops[[i]],predict=newPnts,typeYP="v")$predict  
       lines(df$ST,df$EY,type="l",lty=i) 
     }

     ## DESKTOP EXAMPLE: This example automatically creates (and fits) the model  
     ## space of the BMC SB 2008 dTTP induced R1 dimerization reference above.
     library(ccems)
     topology <- list(  
         heads=c("R1t0","R2t0"),  
         sites=list(       
             s=list(                     # s-site    thread #
                 m=c("R1t1"),        # monomer      1
                 d=c("R2t1","R2t2")  # dimer        2
             )
         )
     ) 

     g <- mkg(topology,TCC=TRUE) 
     data(RNR)
     d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
     d2 <- subset(RNR,year==2006,select=c(R,t,m,year)) 
     dd <- rbind(d1,d2)
     names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="")#e.g. to form "RT"
     rownames(dd) <- 1:dim(dd)[1] # lose big number row names of parent dataframe
     ## top10=ems(dd,g,cpusPerHost=c("localhost"=2),maxTotalPs=2,ptype="SOCK") 

     ## CLUSTER EXAMPLE: This ATP induced R1 hexamerization example runs 1.8 days
     ##                  on a 16 core (4 quad proc machines) ROCKS Linux cluster. 

     library(ccems)
     topology <- list(
         heads=c("R1X0","R2X2","R4X4","R6X6"), 
         sites=list(                # s-sites are already filled only in (j>1)-mers 
             a=list(  #a-site                                                    thread
                 m=c("R1X1"),                                            # monomer   1
                 d=c("R2X3","R2X4"),                                     # dimer     2
                 t=c("R4X5","R4X6","R4X7","R4X8"),                       # tetramer  3
                 h=c("R6X7","R6X8","R6X9","R6X10", "R6X11", "R6X12")     # hexamer   4
             ), # tails of a-site threads are heads of h-site threads
             h=list(   # h-site
                 m=c("R1X2"),                                            # monomer   5
                 d=c("R2X5", "R2X6"),                                    # dimer     6
                 t=c("R4X9", "R4X10","R4X11", "R4X12"),                  # tetramer  7
                 h=c("R6X13", "R6X14", "R6X15","R6X16", "R6X17", "R6X18")# hexamer   8
             )
         )
     )
     g=mkg(topology,TCC=TRUE) 
     dd=subset(RNR,(year==2002)&(fg==1)&(X>0),select=c(R,X,m,year))
     names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="")#i.e. c("RT","XT")

     ## choose(29,3) is 3654 and choose(29,2) is 406, so 3654 + 406 + 29 + 1 = 4090
     ## spurs, but after subtracting those without at least one hexamer complex, and
     ## after adding grids, the total number of models is 3410. Of these, 3406
     ## converged; see below.
     ## Not run: 
     cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
     top10=ems(dd,g,cpusPerHost=cpusPerHost, maxTotalPs=3, ptype="SOCK",IC=100) 

     # The following are the last few lines of the output. The first line shows that a 
     # one-parameter model is best (shown are the best AICs of models with 0, 1, 2 or 3
     # parameters). The next shows that it took 1.8 days on 16 cpus to fit 3406 models. 
     # And the block that follows shows that the top 5 models are all spur graph models.
     # The html file RXglobSOCK.htm in the results directory contains this information 
     # and more (e.g. parameter estimates and CI). 
     #
     # [1] 1000000.00000     -33.16309     -31.73658     -29.99075
     #
     # Time difference of 2623.881 mins
     # Fitted = 3406, out of a total of  3410 
     #
     # ... making HTML file ... 
     #  1 Model  20; nbp= 1; id=IIIIIIIIIIIJIIIIIIIIIIIIIIIII; AIC=-33.1631
     #  2 Model 108; nbp= 2; id=IIIIIJIIIIIJIIIIIIIIIIIIIIIII; AIC=-31.7366
     #  3 Model  21; nbp= 1; id=IIIIIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.5144
     #  4 Model 109; nbp= 2; id=IIIIIJIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.4678
     #  5 Model 145; nbp= 2; id=IIIIIIIIJIIIJIIIIIIIIIIIIIIII; AIC=-31.4431
     ## End(Not run)

