Sep 15

This week I added functions to my code, along with definitions of the variables and terms related to the sampling methodology.

I tried to make an early start on optimization, as we discussed briefly last week, but wasn't really sure where to begin.

Jim says

I don't remember what we meant by optimization - I think we were talking about examples, tests, docs, and all that.

Here are my notes from looking at your code, though I didn't get to all the files.



 Documentation and coding style are much better,
 but could be better still.

 See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
  (They like <- . I still like = ... but there you go.)

 * You're using tabs for indentation, which is generally a bad idea.
   Use two or four spaces instead. (An editor that understands R,
   such as emacs or one of the R GUIs, should do this for you.)

 * The header at the top of the file should have
     * filename
     * one line description
     * longer discussion (optional)
     * instructions and/or example on how to use this file
     * version number (optional if you put a date)
     * author, date, license

 * Function documentation, as described in the google style guide,
   should
     (a) say what it is
     (b) describe how to use it (i.e. inputs, outputs, side effects)
     (c) discuss implementation details (after (a) and (b))
   The point is to make it easy for someone else to use it.
   Your paragraph has that information, but more in the form
   of a run-on sentence than in a way where the most important
   bits jump out at you. 
 
 * Put a blank line between function definitions.
   And indent comments within a function the same as the function body.

 * All library() loading should happen outside the functions,
   at the top of the file, so the dependencies are easy to see.
 
 * The last thing you do is calculate some variables.
   But it's not clear what you then intend to do with them,
   or how you will output them. Are these tests? An example? It isn't clear.

   Also, if some of these tasks take time to complete, it's
   a good idea to make it clear to the user that something is going on, e.g.
      print("Finding Foo ...")
      foo = long.calculation(n=1e9)

 * Putting tests and examples somewhere would be great.
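
 As a rough illustration of the points above, here is a minimal sketch of a small file with a header, library() calls at the top, a documented function, a test, and an example that announces what it is doing. The filename, the function group.props, and its arguments are all hypothetical stand-ins, not taken from the files reviewed, and library(stats) is used only because it ships with R.

```r
# group_props.r   (hypothetical filename)
#
# Compute group proportions in a simple random sample.
#
# Usage:
#   $ Rscript group_props.r
#   > source("group_props.r")     # from within R
#
# Version 0.1
# Author, date, license: placeholders to fill in.

library(stats)   # ships with R; stands in for real dependencies

group.props = function(labels, sample.index) {
  # Compute the proportion of sampled individuals in each group.
  #
  # Args:
  #   labels: character vector of group labels, one per individual.
  #   sample.index: integer vector of indices of sampled individuals.
  #
  # Returns:
  #   Named numeric vector of proportions, one entry per group.
  #
  # Implementation: tabulates over every group in the population,
  # so a group missing from the sample gets proportion 0.
  sampled = factor(labels[sample.index], levels = sort(unique(labels)))
  counts = table(sampled)
  props = as.numeric(counts) / length(sample.index)
  names(props) = names(counts)
  return(props)
}

test.group.props = function() {
  # Small hand-checkable case: 1 of 4 sampled is "A", 3 of 4 are "B".
  labels = c("A", "A", "B", "B", "B")
  props = group.props(labels, c(1, 3, 4, 5))
  stopifnot(isTRUE(all.equal(unname(props), c(0.25, 0.75))))
  stopifnot(identical(names(props), c("A", "B")))
  return(invisible(TRUE))
}

# Example: say what is happening, then print the result.
print("Running tests ...")
test.group.props()
print("Sampling 50 of 800 ...")
set.seed(1)
labels = c(rep("A", 500), rep("B", 300))
example.props = group.props(labels, sample(length(labels), 50))
print(example.props)
```

 Keeping the test in its own function makes it easy to rerun after every change, and the example at the bottom doubles as usage documentation.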

---------

 Usage:

 $ r
 > dir()
 [1] "NSMR.r" "RDS.r" "SES_simulation.r" "T-square.r"

 > # -- NSM --
 > source("NSM.r")      # prints out network stuff. Then you wait ...
 > ls.str()             # see what's been defined
 bs.var : function (incidence, sample.index, num.resamples = 500, resample.size = 50)  
 get.NSM.sample : function (incidence, sample.size, panel.size = 20, num.nominees = 10)  
 get.SRS : function (incidence, sample.size)  
 incidence :  num [1:1000, 1:1000] 0 1 0 0 0 0 0 0 0 0 ...
 NSM.props :  num [1:2] 0.48 0.52
 NSM.sample :  int [1:50] 524 661 92 897 438 666 584 120 116 918 ...
 NSM.var :  num 0.012
 rand.props :  num [1:2] 0.52 0.48
 random.sample :  int [1:50] 967 632 647 516 721 545 239 247 631 308 ...
 random.var :  num 0.0134
 sample.props : function (incidence, sample.index)  
 sample.size :  num 50
 two.group.network : function (num.A = 500, num.B = 300, int.vec = c(0.15, 0.2, 0.1))  
 > incidence[490:510, 490:510]
   A A A A A A A A A A A B B B B B B B B B B
 A 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
 A 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0
 A 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0
 A 0 0 0 0 1 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0
 A 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 A 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
 A 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0
 A 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0
 A 0 1 1 1 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0
 A 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0
 A 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 B 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1
 B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1
 B 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1
 B 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
 B 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 1
 B 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 0 0
 B 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1
 B 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0
 B 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0
 B 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0

 > # -- RDS --
 Warning message:
 In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,
  18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
  40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,
  62,63,64,65,66,67,68,69,70,71,72,73 --> row.names NOT used

 > ls.str()
 get.RDS.sample : function (incidence, n.seeds = 5, n.waves = 5)  
 list.to.df : function (RDS.list, incidence)  
 n.seeds :  num 4
 n.waves :  num 4
 proportions :  num [1:2] 0.631 0.369
 RDS.df : 'data.frame':	47 obs. of  6 variables:
  $ wave          : int  1 1 1 1 1 1 2 2 2 2 ...
  $ recruitment.id: int  749 404 59 725 396 774 38 422 560 519 ...
  $ recruiter.id  : int  47 117 117 754 754 754 749 404 404 404 ...
  $ type          : chr  "B" "A" "A" "B" ...
  $ recruiter.type: chr  "A" "A" "A" "B" ...
  $ degree        : int  221 195 201 221 186 227 187 197 196 210 ...
 RDS.pop.props : function (RDS.df)  
 RDS.sample : List of 4
  $ :List of 4
  $ :List of 4
  $ :List of 4
  $ :List of 4
 test.network :  num [1:800, 1:800] 0 0 0 0 1 0 0 0 1 0 ...
 two.group.network : function (num.A = 500, num.B = 300, int.vec = c(0.15, 0.2, 0.1))

in Jim's office

Looking at the quadrat.r function, we talked about what it would take to write a "goodness of fit" function that takes the parameters to be varied and outputs a number based on variance, sample size, or some combination. A search then looks over the inputs to find the "best" output.



# optimize:
#   input: 
#     change number of clusters (in args to get.actual.pops)
#     change number to sample (in args to get.quad.sample)
#   output:
#     sample size
#     variance
#
#
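library(spatstat)   # owin(), rpoispp()

goodness = function(cluster.count, sample.count) {
  # Simulate houses in a square window, take a quadrat sample, and
  # return the jackknife variance of the population estimate.
  # get.actual.pops, get.quad.sample, quad.estimate, quad.ss and jk.var
  # are assumed to be defined already (e.g. by sourcing quadrat.r).
  side.length = 10
  sample.win = owin(xrange = c(0, side.length), yrange = c(0, side.length))
  houses = rpoispp(50, win = sample.win)
  actual.pops = get.actual.pops(houses, n.clusters = cluster.count)
  quad.sample = get.quad.sample(actual.pops, n.sampled = sample.count)
  quad.population = quad.estimate(quad.sample)
  quad.sample.size = quad.ss(quad.sample)   # not used yet; could enter the score
  quad.variance = jk.var(quad.sample, quad.population)
  return(quad.variance)
}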
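
The notes mention a score based on variance, sample size, or some combination of the two. One possible combination, sketched here purely as an assumption, is the variance plus a penalty proportional to the sample size, so that lower is better. The name combined.goodness and the cost.per.sample weight are hypothetical; the right trade-off depends on what a sample actually costs.

```r
combined.goodness = function(variance, sample.size, cost.per.sample = 0.001) {
  # Lower is better: estimator variance plus a linear penalty on the
  # sample size. cost.per.sample sets the trade-off between the two.
  return(variance + cost.per.sample * sample.size)
}

# Two hypothetical designs: the second is more precise but needs a
# larger sample.
print(combined.goodness(variance = 0.012, sample.size = 50))
print(combined.goodness(variance = 0.008, sample.size = 120))
```

With this weight the smaller sample scores better, since the extra precision does not pay for the 70 additional samples; a smaller cost.per.sample would reverse that.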

Once that's written, there are many types of search:

 * plot (contour or 3D) the outputs on a grid of possible inputs and look visually for what you want
 * hill climbing
 * many other automated searches (simulated annealing etc.)
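
The grid approach can be sketched in base R as follows. Since the real goodness function depends on spatstat and the quadrat code, this sketch uses toy.goodness, a made-up smooth surface with a known minimum, as a stand-in; the grid ranges are also arbitrary assumptions.

```r
toy.goodness = function(cluster.count, sample.count) {
  # Stand-in surface, smallest at cluster.count = 6, sample.count = 30.
  return((cluster.count - 6)^2 / 10 + (sample.count - 30)^2 / 100)
}

cluster.counts = 2:10
sample.counts = seq(10, 50, by = 5)

# outer() expects a vectorized function; Vectorize() wraps a scalar one,
# which is what a simulation-based goodness function would need.
grid = outer(cluster.counts, sample.counts, Vectorize(toy.goodness))

# Draw the contour plot only in an interactive session.
if (interactive()) {
  contour(cluster.counts, sample.counts, grid,
          xlab = "cluster.count", ylab = "sample.count")
}

# Locate the smallest value on the grid.
best = which(grid == min(grid), arr.ind = TRUE)
best.inputs = c(cluster.count = cluster.counts[best[1, 1]],
                sample.count = sample.counts[best[1, 2]])
print(best.inputs)
```

Swapping contour() for persp() gives the 3D view. With a simulation-based goodness function each grid cell is a random draw, so averaging several runs per cell would probably be needed before the picture is readable.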



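# Note: this name masks base R's search(), which lists attached packages.
search = function(goodness, cluster.count.range, sample.count.range) {
  # TODO: evaluate goodness over the given ranges and return the best inputs.
}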
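
As one way the search might be filled in, here is a minimal hill-climbing sketch on an integer grid. It minimizes rather than maximizes, on the assumption that the goodness value is a variance, and it uses a made-up deterministic toy.goodness in place of the real function. The names hill.climb, start, lower, upper, and max.steps are all hypothetical.

```r
toy.goodness = function(cluster.count, sample.count) {
  # Stand-in surface, smallest at cluster.count = 6, sample.count = 30.
  return((cluster.count - 6)^2 / 10 + (sample.count - 30)^2 / 100)
}

hill.climb = function(goodness, start, lower, upper, max.steps = 100) {
  # Greedy descent: step to any neighbouring grid point that lowers the
  # goodness value, and stop when no neighbour does (a local minimum).
  #
  # Args:
  #   goodness: function of (cluster.count, sample.count) returning a number.
  #   start, lower, upper: length-2 vectors c(cluster.count, sample.count).
  #   max.steps: upper limit on the number of passes over the neighbours.
  #
  # Returns:
  #   List with the best inputs found and the goodness value there.
  current = start
  current.value = goodness(current[1], current[2])
  moves = list(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
  for (step in seq_len(max.steps)) {
    improved = FALSE
    for (move in moves) {
      candidate = current + move
      if (any(candidate < lower) || any(candidate > upper)) next
      value = goodness(candidate[1], candidate[2])
      if (value < current.value) {
        current = candidate
        current.value = value
        improved = TRUE
      }
    }
    if (!improved) break
  }
  return(list(inputs = current, value = current.value))
}

result = hill.climb(toy.goodness, start = c(2, 10),
                    lower = c(2, 10), upper = c(10, 50))
print(result)
```

Hill climbing only finds a local optimum, and it assumes that calling goodness twice with the same inputs gives the same answer. The goodness function above generates new random houses on every call, so in practice it would likely need a fixed seed or an average over repeated runs before a search like this is reliable.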

http://cs.marlboro.edu/ courses/ fall2011/jims_tutorials/ dylan/ Sep_15
last modified Friday September 16 2011 11:05 am EDT
