Sep 15
This week I added functions to my code, along with definitions for the variables and terms related to the sampling methodology.
I tried to make an early start on the optimization we talked about briefly last week, but wasn't really sure where to begin.
Jim says
I don't remember what we meant by optimization - I think we were talking about examples, tests, docs, and all that.
Here are my notes from looking at your code, though I didn't get to all the files.
Documentation and coding style are much better, but could be better still.
See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
(They like <- . I still like = ... but there you go.)
* You're using tabs for indentation, which is generally a bad idea.
Use two or four spaces instead. (An editor that understands R,
such as emacs or one of the R GUIs, should do this for you.)
* The header at the top of the file should have
  * filename
  * one line description
  * longer discussion (optional)
  * instructions and/or example on how to use this file
  * version number (optional if you put a date)
  * author, date, license
* Function documentation, as described in the google style guide,
should
(a) say what the function is
(b) describe how to use it (i.e. inputs, outputs, side effects)
(c) discuss implementation details (after (a) and (b))
The point is to make it easy for someone else to use it.
Your paragraph has that information, but more in the form
of a run-on sentence than in a way where the most important
bits jump out at you.
* Put a blank line between function definitions.
And indent comments within a function the same as the function body.
* All library() loading should happen outside the functions,
at the top of the file, so the dependencies are easy to see.
* The last thing you do is calculate some variables.
But it's not clear what you then intend to do with them,
or how you will output them. Are these tests? An example? It isn't clear.
Also, if some of these tasks take time to complete, it's
a good idea to make it clear to the user that something is going on, e.g.
print("Finding Foo ...")
foo = long.calculation(n=1e9)
* Putting tests and examples somewhere would be great.
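To make the points above concrete, here is a rough sketch of one small file laid out that way: header, documented function, and a test at the bottom. The file name, the body of get.SRS, and test.get.SRS are all made up for illustration; only the get.SRS(incidence, sample.size) signature comes from your code, and the real body in NSM.r may well differ.

```r
# File: srs_example.r   (hypothetical file name)
# Draw a simple random sample of nodes from a network.
#
# Usage:
#   source("srs_example.r")
#   idx = get.SRS(incidence, sample.size = 50)
#
# Author: <name>, <date>, <license>

# (any library() calls would go here, at the top of the file)

get.SRS = function(incidence, sample.size) {
  # Draws a simple random sample of node indices, without replacement.
  #
  # Args:
  #   incidence: square 0/1 matrix, one row and column per node.
  #   sample.size: number of nodes to draw.
  #
  # Returns:
  #   Integer vector of length sample.size holding row indices.
  #
  # Implementation: a guess for illustration, not the code in NSM.r.
  sample.int(nrow(incidence), sample.size)
}

test.get.SRS = function() {
  # Checks length, range, and uniqueness on a tiny made-up network.
  toy = matrix(0, nrow = 10, ncol = 10)
  idx = get.SRS(toy, sample.size = 4)
  stopifnot(length(idx) == 4,
            all(idx >= 1 & idx <= 10),
            !any(duplicated(idx)))
  print("test.get.SRS passed")
}

test.get.SRS()
```

Keeping the test in a function of its own means sourcing the file can stay quick, and the test run at the bottom can be commented out or moved to a separate file later.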
---------
Usage:
$ r
> dir()
[1] "NSMR.r" "RDS.r" "SES_simulation.r" "T-square.r"
> # -- NSM --
> source("NSM.r") # prints out network stuff. Then you wait ...
> ls.str() # see what's been defined
bs.var : function (incidence, sample.index, num.resamples = 500, resample.size = 50)
get.NSM.sample : function (incidence, sample.size, panel.size = 20, num.nominees = 10)
get.SRS : function (incidence, sample.size)
incidence : num [1:1000, 1:1000] 0 1 0 0 0 0 0 0 0 0 ...
NSM.props : num [1:2] 0.48 0.52
NSM.sample : int [1:50] 524 661 92 897 438 666 584 120 116 918 ...
NSM.var : num 0.012
rand.props : num [1:2] 0.52 0.48
random.sample : int [1:50] 967 632 647 516 721 545 239 247 631 308 ...
random.var : num 0.0134
sample.props : function (incidence, sample.index)
sample.size : num 50
two.group.network : function (num.A = 500, num.B = 300, int.vec = c(0.15, 0.2, 0.1))
> incidence[490:510, 490:510]
A A A A A A A A A A A B B B B B B B B B B
A 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
A 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0
A 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0
A 0 0 0 0 1 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0
A 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
A 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
A 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0
A 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0
A 0 1 1 1 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0
A 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0
A 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
B 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1
B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1
B 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1
B 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
B 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 1
B 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 0 0
B 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1
B 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0
B 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0
B 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0
> # -- RDS --
> source("RDS.r")
Warning message:
In data.row.names(row.names, rowsi, i) :
some row.names duplicated: 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,
18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,
62,63,64,65,66,67,68,69,70,71,72,73 --> row.names NOT used
> ls.str()
get.RDS.sample : function (incidence, n.seeds = 5, n.waves = 5)
list.to.df : function (RDS.list, incidence)
n.seeds : num 4
n.waves : num 4
proportions : num [1:2] 0.631 0.369
RDS.df : 'data.frame': 47 obs. of 6 variables:
$ wave : int 1 1 1 1 1 1 2 2 2 2 ...
$ recruitment.id: int 749 404 59 725 396 774 38 422 560 519 ...
$ recruiter.id : int 47 117 117 754 754 754 749 404 404 404 ...
$ type : chr "B" "A" "A" "B" ...
$ recruiter.type: chr "A" "A" "A" "B" ...
$ degree : int 221 195 201 221 186 227 187 197 196 210 ...
RDS.pop.props : function (RDS.df)
RDS.sample : List of 4
$ :List of 4
$ :List of 4
$ :List of 4
$ :List of 4
test.network : num [1:800, 1:800] 0 0 0 0 1 0 0 0 1 0 ...
two.group.network : function (num.A = 500, num.B = 300, int.vec = c(0.15, 0.2, 0.1))
in Jim's office
Looking at the quadrat.r function, we talked about
what it would take to make a "goodness of fit"
function that takes the parameters to be varied
and outputs a number based on variance, sample size,
or some combination. The search will then look over
the inputs to find the "best" output.
# optimize:
# input:
# change number of clusters (in args to get.actual.pops)
# change number to sample (in args to get.quad.sample)
# output:
# sample size
# variance
#
#
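library(spatstat)  # provides owin() and rpoispp()

goodness = function(cluster.count, sample.count) {
  # Returns the variance from jk.var() for one simulated quadrat sample.
  # Assumes the functions in quadrat.r have already been sourced.
  side.length = 10
  sample.win = owin(xrange = c(0, side.length), yrange = c(0, side.length))
  houses = rpoispp(50, win = sample.win)
  actual.pops = get.actual.pops(houses, n.clusters = cluster.count)
  quad.sample = get.quad.sample(actual.pops, n.sampled = sample.count)
  quad.population = quad.estimate(quad.sample)
  quad.sample.size = quad.ss(quad.sample)  # not used yet; could go into the score
  quad.variance = jk.var(quad.sample, quad.population)
  return(quad.variance)
}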
Once that's written, there are many types of search:
- plot (contour or 3D) the outputs on a grid of possible inputs, look visually for what you want
- hill climbing
- many other automated searches (simulated annealing etc)
search = function(goodness, cluster.count.range, sample.count.range) {
  # TODO: evaluate goodness() over the two ranges and return the best inputs.
}
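As a starting point, the first option (evaluate on a grid, then look) might be sketched like this. The names grid.search and toy.goodness are made up for the sketch; toy.goodness is only a stand-in so the code runs without quadrat.r or spatstat, and "best" is assumed here to mean the smallest score.

```r
# Stand-in for goodness(); hypothetical, used only so this sketch runs
# on its own. Smallest at cluster.count = 6, sample.count = 30.
toy.goodness = function(cluster.count, sample.count) {
  (cluster.count - 6)^2 + ((sample.count - 30) / 10)^2
}

grid.search = function(goodness, cluster.count.range, sample.count.range) {
  # Evaluates goodness() at every input pair and finds the smallest score.
  #
  # Args:
  #   goodness: function(cluster.count, sample.count) returning one number.
  #   cluster.count.range, sample.count.range: vectors of values to try.
  #
  # Returns:
  #   List with the full matrix of scores and the best input pair.
  scores = outer(cluster.count.range, sample.count.range,
                 Vectorize(goodness))
  best = which(scores == min(scores), arr.ind = TRUE)[1, ]
  list(scores = scores,
       cluster.count = cluster.count.range[best[1]],
       sample.count = sample.count.range[best[2]])
}

cluster.counts = 2:10
sample.counts = seq(10, 50, by = 10)
result = grid.search(toy.goodness, cluster.counts, sample.counts)
print(result$cluster.count)
print(result$sample.count)
```

Calling contour(cluster.counts, sample.counts, result$scores) on the result gives the visual version. Since the real goodness() draws a new random set of houses on every call, its scores will be noisy, so averaging several runs per grid point is probably worth trying before trusting the minimum.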