Jim's Tutorials

Fall 2011

Sep 15

This week I added functions to my code, along with definitions of the variables and terms related to the sampling methodology.
I also tried some early work on optimization, as we discussed briefly last week, but wasn't really sure where to start.

Jim says

I don't remember what we meant by optimization - I think we were talking about examples, tests, docs, and all that.
Here are my notes from looking at your code, though I didn't get to all the files.
Documentation and coding style are much better, but could be better still. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html (They like <- . I still like = ... but there you go.)

* You're using tabs for indentation, which is generally a bad idea. Use two or four spaces. (An editor that understands R, such as emacs or one of the R GUIs, should do this for you.)
* The header at the top of the file should have
    * filename
    * one-line description
    * longer discussion (optional)
    * instructions and/or an example of how to use this file
    * version number (optional if you put a date)
    * author, date, license
* Function documentation, as described in the Google style guide, should
    (a) say what the function is,
    (b) describe how to use it (i.e. inputs, outputs, side effects), and
    (c) discuss implementation details (after (a) and (b)).
  The point is to make it easy for someone else to use it. Your paragraph has that information, but in a paragraph format, more like a run-on sentence in written work than a layout where the most important bits jump out at you.
* Put a blank line between function definitions, and indent comments within a function the same as the function body.
* All library() loading should happen outside the functions, at the top of the file, so the dependencies are easy to see.
* The last thing you do is calculate some variables, but it's not clear what you then intend to do with them or how you will output them. Are these tests? An example? Also, if some of these tasks take time to complete, it's a good idea to make it clear to the user that something is going on, e.g.
      print("Finding Foo ...")
      foo = long.calculation(n=1e9)
* Putting tests and examples somewhere would be great.

---------

Usage:

    $ r
    > dir()
    [1] "NSMR.r"           "RDS.r"            "SES_simulation.r" "T-square.r"
    > # -- NSM --
    > source("NSM.r")   # prints out network stuff. Then you wait ...
    > ls.str()   # see what's been defined
    bs.var : function (incidence, sample.index, num.resamples = 500, resample.size = 50)
    get.NSM.sample : function (incidence, sample.size, panel.size = 20, num.nominees = 10)
    get.SRS : function (incidence, sample.size)
    incidence :  num [1:1000, 1:1000] 0 1 0 0 0 0 0 0 0 0 ...
    NSM.props :  num [1:2] 0.48 0.52
    NSM.sample :  int [1:50] 524 661 92 897 438 666 584 120 116 918 ...
    NSM.var :  num 0.012
    rand.props :  num [1:2] 0.52 0.48
    random.sample :  int [1:50] 967 632 647 516 721 545 239 247 631 308 ...
    random.var :  num 0.0134
    sample.props : function (incidence, sample.index)
    sample.size :  num 50
    two.group.network : function (num.A = 500, num.B = 300, int.vec = c(0.15, 0.2, 0.1))
    > incidence[490:510, 490:510]
      A A A A A A A A A A A B B B B B B B B B B
    A 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
    A 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0
    A 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0
    A 0 0 0 0 1 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0
    A 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
    A 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
    A 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0
    A 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0
    A 0 1 1 1 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0
    A 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0
    A 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    B 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1
    B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1
    B 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1
    B 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
    B 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 1
    B 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 0 0
    B 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1
    B 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0
    B 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0
    B 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0
    > # -- RDS --
    Warning message:
    In data.row.names(row.names, rowsi, i) :
      some row.names duplicated: 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,
      18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
      40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,
      62,63,64,65,66,67,68,69,70,71,72,73 --> row.names NOT used
    > ls.str()
    get.RDS.sample : function (incidence, n.seeds = 5, n.waves = 5)
    list.to.df : function (RDS.list, incidence)
    n.seeds :  num 4
    n.waves :  num 4
    proportions :  num [1:2] 0.631 0.369
    RDS.df : 'data.frame':  47 obs. of  6 variables:
     $ wave          : int  1 1 1 1 1 1 2 2 2 2 ...
     $ recruitment.id: int  749 404 59 725 396 774 38 422 560 519 ...
     $ recruiter.id  : int  47 117 117 754 754 754 749 404 404 404 ...
     $ type          : chr  "B" "A" "A" "B" ...
     $ recruiter.type: chr  "A" "A" "A" "B" ...
     $ degree        : int  221 195 201 221 186 227 187 197 196 210 ...
    RDS.pop.props : function (RDS.df)
    RDS.sample : List of 4
     $ :List of 4
     $ :List of 4
     $ :List of 4
     $ :List of 4
    test.network :  num [1:800, 1:800] 0 0 0 0 1 0 0 0 1 0 ...
    two.group.network : function (num.A = 500, num.B = 300, int.vec = c(0.15, 0.2, 0.1))
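Jim's notes on file headers and function documentation can be illustrated with a small sketch. It reuses only the signature of get.SRS(incidence, sample.size) from the ls.str() listing. The function body (a plain sample.int() over the row indices), the file name sampling_example.r, and the version/author/license fields are hypothetical placeholders, not the actual contents of the project files. The doc comment follows the Google R style layout (description, Args:, Returns:). A progress message is printed before the call, as the notes suggest for slow steps.

```r
# sampling_example.r
# Draw a simple random sample of nodes from a network.
#
# Hypothetical example of the suggested header layout; the file name,
# version, author and license fields are placeholders.
#
# Usage:
#   source("sampling_example.r")
#   idx = get.SRS(incidence, sample.size = 50)
#
# Version: 0.1 (placeholder)
# Author: (your name), Sep 2011, license: (your choice)

get.SRS = function(incidence, sample.size) {
  # Draws a simple random sample of nodes from a network.
  #
  # Args:
  #   incidence: square 0/1 incidence matrix, one row per node.
  #   sample.size: number of nodes to sample, without replacement.
  #
  # Returns:
  #   Integer vector of sampled row indices.
  #
  # Side effects: advances the random number generator.
  #
  # Implementation: sample.int() over the row indices. This body is an
  # assumption; the original get.SRS() is not shown here.
  n.nodes = nrow(incidence)
  if (sample.size > n.nodes) {
    stop("sample.size is larger than the number of nodes")
  }
  sample.int(n.nodes, sample.size)
}

# Example: a toy 1000-node network with no ties, just to exercise the call.
incidence = matrix(0, nrow = 1000, ncol = 1000)
print("Drawing simple random sample ...")
idx = get.SRS(incidence, sample.size = 50)
print(idx)
```

Because the documentation sits directly under the function line, a reader can see the inputs and outputs without scrolling through the implementation.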

in Jim's office

Looking at quadrat.r, we talked about what it would take to write a "goodness of fit" function that takes the parameters to be varied and outputs a number based on variance, sample size, or some combination. A search would then look over the inputs to find the "best" output.
    library(spatstat)   # for owin() and rpoispp()

    # optimize:
    #   input:
    #     change number of clusters (in args to get.actual.pops)
    #     change number to sample (in args to get.quad.sample)
    #   output:
    #     sample size
    #     variance
    #
    # Assumes get.actual.pops(), get.quad.sample(), quad.estimate(),
    # quad.ss() and jk.var() from quadrat.r have already been loaded.
    goodness = function(cluster.count, sample.count) {
      side.length = 10
      sample.win = owin(xrange=c(0, side.length), yrange=c(0, side.length))
      houses = rpoispp(50, win=sample.win)
      actual.pops = get.actual.pops(houses, n.clusters=cluster.count)
      quad.sample = get.quad.sample(actual.pops, n.sampled=sample.count)
      quad.population = quad.estimate(quad.sample)
      quad.sample.size = quad.ss(quad.sample)   # not yet used in the score
      quad.variance = jk.var(quad.sample, quad.population)
      return(quad.variance)
    }
Once that's written, there are many types of search:
    search = function(goodness, cluster.count.range, sample.count.range) {
      # to be written
    }
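Two of the simplest search strategies are sketched below: an exhaustive grid search over every (cluster count, sample count) pair, and a random search that tries a fixed number of random pairs. Both are written under the assumption that lower scores are better. The names grid.search, random.search, avg.score and toy.goodness are hypothetical. Because goodness() draws a fresh rpoispp() population on every call, its output is random, so each pair is scored by averaging n.reps calls. toy.goodness() is a made-up stand-in (a variance-like 1/n term, a per-sample cost, and a little noise) so the sketch runs without quadrat.r; it is not the real score.

```r
# Average several calls, since goodness() is stochastic.
avg.score = function(goodness, k, n, n.reps) {
  mean(replicate(n.reps, goodness(k, n)))
}

# Exhaustive search: score every combination of the two ranges.
grid.search = function(goodness, cluster.count.range, sample.count.range,
                       n.reps = 10) {
  grid = expand.grid(cluster.count = cluster.count.range,
                     sample.count = sample.count.range)
  grid$score = mapply(function(k, n) avg.score(goodness, k, n, n.reps),
                      grid$cluster.count, grid$sample.count)
  grid[which.min(grid$score), ]   # row with the lowest average score
}

# Random search: score n.tries random pairs and keep the best one.
# Ranges are assumed to have length > 1 (sample(5) would mean sample(1:5)).
random.search = function(goodness, cluster.count.range, sample.count.range,
                         n.tries = 20, n.reps = 10) {
  ks = sample(cluster.count.range, n.tries, replace = TRUE)
  ns = sample(sample.count.range, n.tries, replace = TRUE)
  scores = mapply(function(k, n) avg.score(goodness, k, n, n.reps), ks, ns)
  best = which.min(scores)
  data.frame(cluster.count = ks[best], sample.count = ns[best],
             score = scores[best])
}

# Hypothetical toy score: variance-like term + cost per sampled quadrat
# + penalty away from 5 clusters + small noise.
toy.goodness = function(cluster.count, sample.count) {
  1 / sample.count + 0.001 * sample.count +
    0.01 * abs(cluster.count - 5) + rnorm(1, sd = 0.0005)
}

set.seed(1)
best = grid.search(toy.goodness, cluster.count.range = 2:8,
                   sample.count.range = seq(10, 50, by = 10))
print(best)
rand.best = random.search(toy.goodness, 2:8, seq(10, 50, by = 10))
print(rand.best)
```

With the real goodness(), the grid search makes length(cluster.count.range) × length(sample.count.range) × n.reps full simulation runs, so printing progress inside the loop is worthwhile. Minimizing variance alone would always favor the largest sample, which is why the score needs to combine variance with sample size in some way.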
http://cs.marlboro.edu/courses/fall2011/jims_tutorials/dylan/Sep_15
last modified Friday September 16 2011 11:05 am EDT