ems                  package:ccems                  R Documentation

_E_q_u_i_l_i_b_r_i_u_m _M_o_d_e_l _S_e_l_e_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This is the main automation function of this package. It generates
     a space of combinatorially complex equilibrium models and fits
     them to data.

_U_s_a_g_e:

     ems(d, g, cpusPerHost=c("localhost" = 1), ptype="",
         spurChunkSize=1000, nSpurChunks=1,
         maxTotalPs=5, minTotalPs=NULL, extend2maxP=TRUE,
         smart=FALSE, doTights=FALSE, doGrids=TRUE,
         doSpurs=TRUE, topN=10, showConstr=FALSE,
         atLeastOne=TRUE, atLeastOneOfEach=FALSE,
         KIC=1, kIC=1, fullGrid=FALSE,
         transform=c("boxCox","relResid","none","sqrt","log"), lam=0.5,
         m1=-90, p=-1, forceM1=FALSE, forceP=FALSE)

_A_r_g_u_m_e_n_t_s:

       d: The data as a dataframe.

       g: The list output of 'mkg'. 

cpusPerHost: A named integer vector whose names are host names and whose
          values are the corresponding numbers of CPUs.

   ptype: Parallelization type: '""' for a single CPU; '"SOCK"' or
          '"NWS"' (networkspaces) for the corresponding 'snow' options.

spurChunkSize: The 'batchSize' of spur model chunks; see 'mkSpurs'.

nSpurChunks: The number of spur model chunks requested (this may
          increase internally if 'extend2maxP=TRUE' or
          'smart=TRUE').

maxTotalPs: The maximum number of parameters of models that will be
          fitted (internally, larger models may be generated but not
          fitted).

minTotalPs: The minimum number of parameters of models in the model
          space. If 'NULL', no minimum is imposed.

extend2maxP: Set to 'TRUE' if 'nSpurChunks' should be extended (if
          needed) to reach 'maxTotalPs'.

   smart: Set to 'TRUE' to stop when the models with 'lastCompleted'
          parameters (see 'mkSpurs') have a higher AIC than the
          'lastCompleted-1' parameter models.

doTights: Set to 'TRUE' if spur models with infinitely tight binding
          single edges (with K=0) are wanted in the model space.

 doGrids: Leave 'TRUE' (the default) if grid models are wanted, set to
          'FALSE' if not (e.g. if only spur models are wanted). 

 doSpurs: Leave 'TRUE' if the spur model space is wanted, set to
          'FALSE' if not (e.g. if only grid models are wanted). 

    topN: The number of best models of the current batch of models that
          will be carried over to compete with the next batch; such
          carryovers are needed to allow fits of model spaces that are
          too large to reside in memory at one time. This number is
          also the number of best models summarized in HTML in the
          'results' folder after fitting each batch.

showConstr: Set to 'TRUE' if constrained (fixed and tracking)
          parameters are to be included in the HTML report in
          'results'.

atLeastOne: Leave 'TRUE' if only models with at least one complex of
          maximal size are to be considered. Set 'FALSE' if there is no
          prior knowledge that the largest oligomer must be in the
          model.

atLeastOneOfEach: Set 'TRUE' if only models with at least one complex 
          of each oligomer size are to be considered. This is useful
          when the data are multivariate proportions (i.e. mass
          distribution data) and each j-mer is clearly present. 

     KIC: The initial value of all optimized K parameters. The default
          is 'KIC=1' (in uM).

     kIC: The initial value of all optimized k parameters. The default
          is 'kIC=1' (in 1/seconds per occupied active site).

fullGrid: Set 'TRUE' if a full binary K model is wanted; otherwise
          grids that are equivalent to spurs are eliminated from the
          model space.

transform: The transformation applied to both data and model before
          residuals are formed; this is used to stabilize enzyme
          activity variances. Options are '"boxCox"' for a Box-Cox
          transformation (with 'lam' below used as lambda),
          '"relResid"' to divide the residuals by the data, '"none"'
          for no transformation, and '"sqrt"' and '"log"' for square
          root and natural log transformations, respectively.

     lam: The lambda parameter of the Box-Cox transformation, if used. 

      m1: The hub protein's monomer mass in kDa. Negative numbers imply
          fixed values and positive numbers imply starting values to be
          fitted to the data. The default 'm1=-90' thus holds the mass
          fixed at 90 kDa, the mass of the big (R1) subunit of
          ribonucleotide reductase (RNR). This only matters if the data
          are mass data.

       p: Probability that the hub can oligomerize, i.e. is not damaged.
          Set to a positive value if additional rows are to be added to
          the output dataframe to include models with 'p' freely
          estimated; set negative to hold 'p' fixed. The magnitude is
          the initial or fixed value.

 forceM1: Set 'TRUE' to force all models to estimate M1, i.e. to not
          generate models with M1 fixed. 

  forceP: Set 'TRUE' to force all models to estimate p, i.e. to not
          generate models with fixed p. 
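
     The sign convention shared by 'm1' and 'p' can be illustrated with
     a minimal sketch. The helper 'decodeSigned' is hypothetical and is
     not part of 'ccems'; it only shows how a signed argument could be
     read as a magnitude plus a fixed/fitted flag.

```r
# Hypothetical helper (not part of ccems): decode the sign convention
# described for 'm1' and 'p', where a negative number marks a fixed
# value and a positive number marks a starting value to be fitted.
decodeSigned <- function(x) {
  list(value = abs(x), fixed = x < 0)
}

m1 <- decodeSigned(-90)   # as in the default m1=-90: held fixed at 90
p  <- decodeSigned(0.9)   # made-up value: p estimated, starting at 0.9

str(m1)
str(p)
```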

_D_e_t_a_i_l_s:

     This is the highest level function in 'ccems'. The other functions
     serve this function, though they may also be used to fit
     individual  models manually.
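
     The 'transform' options listed under Arguments can be sketched as
     follows. This is an illustration under stated assumptions, not the
     'ccems' implementation: the function name 'transformedResiduals'
     is hypothetical, the Box-Cox form (x^lam - 1)/lam is the standard
     textbook one, and the data are made up.

```r
# Hypothetical helper, not the ccems code: residuals between data 'y'
# and model predictions 'yhat' under each 'transform' option.
transformedResiduals <- function(y, yhat,
                                 transform = c("boxCox", "relResid",
                                               "none", "sqrt", "log"),
                                 lam = 0.5) {
  transform <- match.arg(transform)
  # Standard Box-Cox form; assumes lam != 0 and positive inputs.
  boxCox <- function(x) (x^lam - 1) / lam
  switch(transform,
         boxCox   = boxCox(y) - boxCox(yhat),
         relResid = (y - yhat) / y,
         none     = y - yhat,
         sqrt     = sqrt(y) - sqrt(yhat),
         log      = log(y) - log(yhat))
}

y    <- c(1, 4, 9)   # made-up data
yhat <- c(1, 5, 8)   # made-up model predictions
transformedResiduals(y, yhat, "none")
transformedResiduals(y, yhat, "relResid")
transformedResiduals(y, yhat)            # defaults to "boxCox", lam = 0.5
```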

_V_a_l_u_e:

     A list of the 'topN' best (lowest AIC) models. Assign it to a
     variable to avoid large screen dumps. The main outputs of this
     function are saved to the 'results' folder: an HTML report, the
     'topN' fitted models, and a brief summary of all fitted models.
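
     The batch-wise 'topN' carryover described under Arguments can be
     sketched as below. The function 'keepTopN', the model ids, and the
     AIC values are all hypothetical; fitted models are reduced to rows
     of a data frame, which is an assumption made for illustration and
     not how 'ems' stores its results.

```r
# Illustrative only: keep the topN lowest-AIC rows across batches so
# that the full model space never has to be held in memory at once.
keepTopN <- function(carried, batch, topN = 10) {
  pool <- rbind(carried, batch)            # carryovers compete with new batch
  pool <- pool[order(pool$aic), , drop = FALSE]
  head(pool, topN)                         # lowest AIC first
}

# Two made-up batches of fitted models.
batches <- list(
  data.frame(id = c("a", "b", "c"), aic = c(12.1, 9.4, 15.0)),
  data.frame(id = c("d", "e", "f"), aic = c(8.7, 20.3, 10.2))
)

top <- NULL
for (b in batches) top <- keepTopN(top, b, topN = 2)
top
```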

_N_o_t_e:

     Spur and grid graph models have network topologies that radiate
     from the hub or can be overlaid on a city block layout,
     respectively. Though head node spur graph edges can be
     superimposed in curtain rods (see 'ccems') to give these graphs a
     grid appearance, curtain rods are really sets of nested arches.
     Thus curtains could be called spur-grid hybrid K equality graphs
     or simply hybrids (i.e. a term that is more tolerant than grid).
     Another option is to tolerate spur edges to head nodes in a
     broadened definition of the term grid. Advantages of this option
     include an emphasis on parallel edges and thus on the equality
     aspects of the graph (compared to the term hybrid), more
     compactness (compared to the term K equality), and usage inertia.
     Readers are thus asked to accept this broadened definition of the
     term grid, i.e. to allow head node spur edges in grid graphs.

     This work was supported by the National Cancer Institute
     (K25CA104791).

_A_u_t_h_o_r(_s):

     Tom Radivoyevitch (txr24@case.edu)

_R_e_f_e_r_e_n_c_e_s:

     Radivoyevitch, T. (2009) Automated model generation and analysis
     methods for combinatorially complex biochemical equilibriums. (In
     preparation)

_S_e_e _A_l_s_o:

     'ccems', 'mkg'

_E_x_a_m_p_l_e_s:

     library(ccems)
     topology <- list(  
             heads=c("R1t0","R2t0"),  
             sites=list(       
                     s=list(                     # s-site    thread #
                             m=c("R1t1"),        # monomer      1
                             d=c("R2t1","R2t2")  # dimer        2
                     )
             )
     ) 
     g <- mkg(topology) 
     data(RNR)
     d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
     d2 <- subset(RNR,year==2006,select=c(R,t,m,year)) 
     dd <- rbind(d1,d2)
     names(dd)[1:2] <- c("RT","tT")
     rownames(dd) <- seq_len(nrow(dd)) # drop the large row names of the parent dataframe
     # the call below ends sooner if maxTotalPs is reached
     ## Not run: 
      
     top <- ems(dd,g,maxTotalPs=1)  # this takes roughly one minute 
     ## End(Not run)
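
     A 'cpusPerHost' vector for a parallel run might be built as in the
     sketch below. The host name "node1" and the CPU counts are made
     up, and the commented call is only an assumption about how the
     arguments listed under Usage would be combined; it has not been
     run here.

```r
# Named vector: names are host names, values are CPUs per host.
# "node1" is a hypothetical host name used for illustration.
cpusPerHost <- c("localhost" = 2, "node1" = 4)
names(cpusPerHost)   # host names
sum(cpusPerHost)     # total number of CPUs requested

# With 'dd' and 'g' built as in the example above, a parallel run
# might then look like this (not run; needs 'ccems' and 'snow'):
# top <- ems(dd, g, cpusPerHost = cpusPerHost, ptype = "SOCK", maxTotalPs = 1)
```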

