proposal
Name : Dylan Knaggs
Tutorial title : Statistical Simulation with R
Credits : 2 or 3
A tutorial to further the student's plan, focusing on demonstrating the statistical validity of their work
through rigorous simulation.
There are three areas where I need to implement simulation studies, and I need to get different things from
each of them.
First, I am looking at two types of area sampling methods: the quadrat method (dividing the area into blocks,
sampling some number of blocks, and extrapolating from the sample to get a population estimate) and T-square
sampling (measuring the distances between houses and random points to estimate population density, then using
that density and the total area to find the population size).
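Both estimators can be sketched in a few lines. The following is a minimal Python illustration (the tutorial itself targets R), assuming a square study area with each house represented as an (x, y) point. The function names are hypothetical. The T-square step uses Byth's density estimator, lambda = n^2 / (2 * sum(x_i) * sqrt(2) * sum(z_i)), multiplied by the total area. Here x_i is the distance from a random point O to the nearest house P, and z_i is the distance from P to its nearest neighbour on the far side of the line through P perpendicular to OP. This is one standard choice, not necessarily the one the project will adopt. Keeping probe points a `margin` away from the edges is an assumed, simple edge-effect fix.

```python
import math
import random


def quadrat_estimate(points, side, k, m, rng):
    """Quadrat-method estimate of population size (hypothetical helper).

    The square area [0, side] x [0, side] is cut into a k x k grid of equal
    blocks; m blocks are sampled without replacement and fully counted, and
    the mean count per sampled block is scaled up to all k * k blocks.
    """
    counts = [0] * (k * k)
    for x, y in points:
        # Clamp so points lying exactly on the far edge land in the last block.
        col = min(int(x / side * k), k - 1)
        row = min(int(y / side * k), k - 1)
        counts[row * k + col] += 1
    sampled = rng.sample(range(k * k), m)
    return sum(counts[b] for b in sampled) / m * k * k


def t_square_estimate(points, side, n_probes, rng, margin=0.1):
    """T-square estimate of population size using Byth's density estimator."""
    sum_x = sum_z = 0.0
    used = 0
    lo, hi = margin * side, (1 - margin) * side
    for _ in range(n_probes):
        # Random probe point O, kept away from the edges to limit edge effects.
        ox, oy = rng.uniform(lo, hi), rng.uniform(lo, hi)
        # P: the house nearest to O; x = |OP|.
        px, py = min(points, key=lambda p: (p[0] - ox) ** 2 + (p[1] - oy) ** 2)
        # Q: the house nearest to P among those beyond the line through P
        # perpendicular to OP; z = |PQ|.
        dx, dy = px - ox, py - oy
        beyond = [q for q in points if (q[0] - px) * dx + (q[1] - py) * dy > 0]
        if not beyond:
            continue
        qx, qy = min(beyond, key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2)
        sum_x += math.hypot(px - ox, py - oy)
        sum_z += math.hypot(qx - px, qy - py)
        used += 1
    if used == 0:
        raise ValueError("no usable T-square probes")
    density = used ** 2 / (2 * sum_x * math.sqrt(2) * sum_z)
    return density * side * side
```

Consider a completely random population of 2,000 points on a 100 x 100 area. Both `quadrat_estimate(pts, 100, 10, 20, rng)` and `t_square_estimate(pts, 100, 100, rng)` should land near 2,000. Sampling all 100 blocks (`m=100`) returns exactly 2,000, because every point is counted once.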
I need simulations to see how these methods perform on populations with different characteristics,
primarily different amounts of clustering.
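Studying the effect of clustering requires simulated populations whose clustering can be dialled up or down. Below is a minimal Python sketch (the tutorial itself targets R) with an assumed Thomas-style generating model that the plan does not specify. The total population is fixed. Each house joins a randomly chosen cluster centre and is scattered around it with normal offsets of standard deviation `spread`. Coordinates wrap around the edges, which is also an assumption. The variance-to-mean ratio of block counts (about 1 for a random pattern, larger when clustered) is a simple check on how clustered each generated population is. Both function names are hypothetical.

```python
import random
import statistics


def clustered_population(n, side, n_clusters, spread, rng):
    """Return n points in a side x side square, grouped around n_clusters centres.

    Each point joins a uniformly chosen centre and is offset by independent
    Normal(0, spread) draws in x and y; coordinates wrap around the edges
    (a torus) so every point stays inside the area. Smaller spread means
    tighter clusters; a spread much larger than side is close to random.
    """
    centres = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_clusters)]
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centres)
        points.append(((cx + rng.gauss(0, spread)) % side,
                       (cy + rng.gauss(0, spread)) % side))
    return points


def dispersion_index(points, side, k):
    """Variance-to-mean ratio of counts in a k x k grid of blocks."""
    counts = [0] * (k * k)
    for x, y in points:
        # Clamp so coordinates that round up to exactly `side` stay in range.
        col = min(int(x / side * k), k - 1)
        row = min(int(y / side * k), k - 1)
        counts[row * k + col] += 1
    return statistics.variance(counts) / statistics.mean(counts)
```

Sweeping `spread` from small to large values yields a sequence of populations running from tightly clustered to essentially random, each with the same true size. The estimators can then be run on every one.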
I am also looking at optimizing the quadrat method: finding how many blocks to make and how many to sample to
give the best balance between precision and effort.
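The design question can be explored by computing the Monte Carlo RMSE of the quadrat estimate on a simulated population, for each grid size k and number of sampled blocks m. This minimal Python sketch (the tutorial itself targets R) reports the share of the area surveyed next to the relative RMSE. Treating that share as the measure of "work" is an assumption, and all function names are hypothetical.

```python
import math
import random


def block_counts(points, side, k):
    """Count points in each block of a k x k grid over a side x side square."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x / side * k), k - 1)
        row = min(int(y / side * k), k - 1)
        counts[row * k + col] += 1
    return counts


def quadrat_rmse(points, side, k, m, reps, rng):
    """Monte Carlo RMSE of the quadrat estimate for a k x k grid with m blocks sampled."""
    counts = block_counts(points, side, k)
    true_n = len(points)
    total_sq_err = 0.0
    for _ in range(reps):
        sampled = rng.sample(range(k * k), m)
        estimate = sum(counts[b] for b in sampled) / m * k * k
        total_sq_err += (estimate - true_n) ** 2
    return math.sqrt(total_sq_err / reps)


def design_table(points, side, grid_sizes, fractions, reps, rng):
    """Rows of (k, m, share of area surveyed, RMSE relative to the true total)."""
    rows = []
    for k in grid_sizes:
        for frac in fractions:
            # Sample roughly `frac` of the blocks, but always at least one.
            m = max(1, round(frac * k * k))
            rmse = quadrat_rmse(points, side, k, m, reps, rng)
            rows.append((k, m, m / (k * k), rmse / len(points)))
    return rows
```

Running `design_table` on populations with different amounts of clustering shows how much precision each extra block buys. Whatever cost function is actually chosen can then be applied to the rows.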
I also want to compare the two methods and identify their relative strengths and weaknesses. This will
probably overlap with the first goal and will likely not be done entirely through simulation studies.
Second, I have developed a sampling method called Scaled Estimation Sampling (SES) that I need to verify
through simulation studies. The method works by estimating the populations of nonoverlapping sectors and then
sampling some of those sectors to get their true populations. The estimates and actual counts for the sampled
sectors are compared to get a scale factor, which is then multiplied by all of the estimates.
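The scale-factor step can be written as a ratio estimator. In this minimal Python sketch (the tutorial itself targets R), the scale factor is assumed to be the ratio of summed actual counts to summed estimates over the sampled sectors. The plan does not say exactly how estimates and actuals are combined, so a mean of per-sector ratios would be an equally plausible reading. `ses_estimate` is a hypothetical name.

```python
def ses_estimate(estimates, actuals):
    """Scaled Estimation Sampling (hypothetical helper).

    estimates: dict mapping every sector to its estimated population.
    actuals:   dict mapping each sampled sector to its counted population.
    Returns the scale factor and the scaled estimate for every sector.
    """
    # Assumed scale factor: ratio of summed actuals to summed estimates
    # over the sampled sectors (a ratio estimator).
    est_sampled = sum(estimates[s] for s in actuals)
    if est_sampled <= 0:
        raise ValueError("sampled sectors need a positive estimated total")
    scale = sum(actuals.values()) / est_sampled
    return scale, {s: scale * e for s, e in estimates.items()}
```

Suppose four sectors are estimated at 100, 50, 150 and 200, and the first and third are counted at 120 and 180. The scale factor is 300 / 250 = 1.2, and the scaled total is 1.2 * 500 = 600.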
So far I have coded the method and a way to generate estimates, but I still need simulation to verify the
method. I would also like to decrease the necessary sample size, although simulation may not be able to
accomplish this.
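A verification study can repeat the whole SES procedure many times on a population with known sector sizes and measure the bias and RMSE of the resulting total. Varying the number of sampled sectors m gives a first look at how small the sample can be. This is a minimal Python sketch (the tutorial itself targets R). Both the error model for the estimates (a fixed multiplicative bias times lognormal noise) and the function name are hypothetical assumptions. The ratio-of-sums scale factor is also an assumed reading of the method.

```python
import math
import random
import statistics


def simulate_ses(true_pops, bias, noise_sd, m, reps, rng):
    """Relative bias and relative RMSE of the SES total over `reps` replications.

    Hypothetical error model: each sector's estimate equals its true population
    times a fixed `bias` times a lognormal factor exp(Normal(0, noise_sd)).
    The scale factor is assumed to be a ratio of sums over the m sampled sectors.
    """
    true_total = sum(true_pops)
    errors = []
    for _ in range(reps):
        estimates = [p * bias * math.exp(rng.gauss(0, noise_sd)) for p in true_pops]
        sampled = rng.sample(range(len(true_pops)), m)
        scale = sum(true_pops[s] for s in sampled) / sum(estimates[s] for s in sampled)
        errors.append(scale * sum(estimates) - true_total)
    rel_bias = statistics.mean(errors) / true_total
    rel_rmse = math.sqrt(statistics.mean([e * e for e in errors])) / true_total
    return rel_bias, rel_rmse
```

With `noise_sd=0`, a purely multiplicative bias is removed exactly (up to rounding). With noise present, comparing the RMSE across several values of m shows the trade-off between sample size and precision.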
Third, I am looking at social network sampling, which uses members of a hidden population as recruiters to
gather a sample. I plan to run simulation studies on two methods: Respondent-Driven Sampling (RDS), which moves
from individual to individual in a recruitment chain, and Network Sampling with Memory (NSM), in which
participants nominate individuals and the next participants are picked at random from those nominated.
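One way to start simulating RDS is to build a synthetic network with a known trait, run a coupon-style recruitment chain through it, and compare the resulting estimate with the true prevalence. This minimal Python sketch (the tutorial itself targets R) makes three illustrative assumptions the plan has not settled on. The network is Erdős-Rényi random, each respondent gets a fixed number of coupons, and prevalence is estimated with the Volz-Heckathorn (RDS-II) inverse-degree estimator. The function names are hypothetical.

```python
import random
from collections import deque


def random_network(n, p, rng):
    """Erdős-Rényi G(n, p) network as a list of neighbour sets."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def rds_sample(adj, n_seeds, coupons, target, rng):
    """Recruitment chain: each respondent recruits up to `coupons` randomly
    chosen neighbours who are not yet in the sample."""
    seeds = rng.sample(range(len(adj)), n_seeds)
    sample, in_sample = list(seeds), set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        recruiter = queue.popleft()
        candidates = sorted(v for v in adj[recruiter] if v not in in_sample)
        for v in rng.sample(candidates, min(coupons, len(candidates))):
            if len(sample) >= target:
                break
            sample.append(v)
            in_sample.add(v)
            queue.append(v)
    return sample


def rds_ii_estimate(sample, adj, has_trait):
    """Volz-Heckathorn (RDS-II) estimate: weight each respondent by 1 / degree."""
    weights = {v: 1 / len(adj[v]) for v in sample if adj[v]}
    total = sum(weights.values())
    return sum(w for v, w in weights.items() if has_trait[v]) / total
```

Making the trait depend on degree or on network position, rather than assigning it at random, is where such a simulation would start to show RDS-specific bias.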
I'm not entirely sure how I will apply simulation to RDS, or even whether rigorous simulation is needed,
but for NSM I want to examine the variance of its estimates and how it is affected by characteristics of the
population and the sample, and I hope to do some form of optimization based on that.
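A Monte Carlo harness can address the variance question by drawing many samples from one synthetic network and computing the variance of the resulting estimates at each sample size. This minimal Python sketch (the tutorial itself targets R) uses a deliberately simplified, hypothetical stand-in sampler rather than the full NSM procedure. Participants nominate all of their contacts, and the next participant is drawn at random from everyone nominated but not yet sampled. The estimate is simply the sample proportion. A faithful NSM sampler could be swapped in without changing the harness. All function names are hypothetical.

```python
import random
import statistics


def random_network(n, p, rng):
    """Erdős-Rényi G(n, p) network as a list of neighbour sets."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def memory_sample(adj, size, rng):
    """Hypothetical simplified sampler with memory (not the full NSM procedure).

    Each participant nominates all of their contacts; the next participant is
    drawn uniformly from everyone nominated so far who is not yet sampled.
    If that pool runs dry, a new participant is drawn from the rest of the
    population. Requires size <= len(adj).
    """
    n = len(adj)
    start = rng.randrange(n)
    sampled, seen = [start], {start}
    pool = set(adj[start])
    while len(sampled) < size:
        if pool:
            nxt = rng.choice(sorted(pool))
        else:
            nxt = rng.choice([v for v in range(n) if v not in seen])
        pool.discard(nxt)
        sampled.append(nxt)
        seen.add(nxt)
        pool.update(v for v in adj[nxt] if v not in seen)
    return sampled


def estimate_variance(adj, has_trait, size, reps, rng):
    """Mean and Monte Carlo variance of the sample prevalence over `reps` samples."""
    estimates = []
    for _ in range(reps):
        s = memory_sample(adj, size, rng)
        estimates.append(sum(has_trait[v] for v in s) / size)
    return statistics.mean(estimates), statistics.variance(estimates)
```

Repeating `estimate_variance` while changing the network density, the trait's relationship to the network, or the sample size gives the kind of variance surface an optimization could be built on.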