Eric, I'm very grateful for the offer, but I'm a bit uncomfortable asking you to do that.  I also don't know what the benefits would be, and I might not be able to use it without help.



On Thu, Oct 4, 2012 at 2:01 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
You might be able to just skip the MCMC part and do exact
marginalization. Do you want me to try to code that up? I have an
out-of-date version of the code sitting around somewhere.
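
To be concrete about what I mean by exact marginalization (this is just
the generic idea, nothing to do with the ranking model itself): when the
latent variable you're averaging over only takes a few discrete values,
the E-step expectation can be computed by summing over those values
directly instead of estimating it by sampling. A toy two-component
mixture sketch in C++, purely illustrative:

#include <cmath>
#include <cstdio>
#include <vector>

// Toy illustration only: exact E-step for a two-component Gaussian
// mixture, where the latent "which component" variable is summed out
// directly instead of being sampled.
double gauss(double x, double mu, double sigma) {
    const double PI = 3.141592653589793;
    double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * PI));
}

int main() {
    std::vector<double> data = {0.1, 0.3, 2.8, 3.1, 0.2, 2.9};
    double mu[2] = {0.0, 3.0}, sigma[2] = {1.0, 1.0}, pi[2] = {0.5, 0.5};

    for (double x : data) {
        // Exact marginalization: p(z = k | x) by direct summation over
        // the two possible values of z, no sampling needed.
        double joint[2], total = 0.0;
        for (int k = 0; k < 2; ++k) {
            joint[k] = pi[k] * gauss(x, mu[k], sigma[k]);
            total += joint[k];
        }
        std::printf("x = %.2f  p(z=0|x) = %.3f  p(z=1|x) = %.3f\n",
                    x, joint[0] / total, joint[1] / total);
    }
    return 0;
}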

On Thu, Oct 4, 2012 at 1:50 PM, Michael Bishop <michaelbish@gmail.com> wrote:
> Yeah, it's the MCMC part... it's been running for 16 hours now on a
> year-old computer with 16 GB of RAM.  Once again, this is just one,
> albeit the largest, of the 84 networks in my dataset.  I'm told the
> easiest thing that would help is to take advantage of my second core,
> but I don't know how to do that.
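>
> Apparently the simplest version is just to run two independent chains in
> separate threads and pool their samples, roughly like the sketch below
> (run_chain here is only a stand-in for whatever the real sampler's entry
> point is -- I haven't actually tried this):
>
> #include <functional>
> #include <random>
> #include <thread>
> #include <vector>
>
> // Stand-in for the real sampler: the actual code would run one MCMC
> // chain from the given seed and append its draws to `out`.
> void run_chain(unsigned seed, std::vector<double>& out) {
>     std::mt19937 rng(seed);
>     std::normal_distribution<double> fake_posterior(0.0, 1.0);
>     for (int t = 0; t < 100000; ++t)
>         out.push_back(fake_posterior(rng));   // placeholder draws
> }
>
> int main() {
>     std::vector<double> samples_a, samples_b;
>     // One chain per core, each with its own seed and output buffer.
>     std::thread chain_a(run_chain, 1u, std::ref(samples_a));
>     std::thread chain_b(run_chain, 2u, std::ref(samples_b));
>     chain_a.join();
>     chain_b.join();
>     // samples_a and samples_b can now be pooled when computing averages.
>     return 0;
> }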
>
> Here's the output:
> $ ./highschool_rankings -i paj_files_r/comm93.scc.el -iter30 -v3
> First step will be a linearly scored minimum violations ranking.
> Ranks initialized via simulated annealing. The score from the annealing
> process was -59692
> Starting EM iteration 1 of a maximum of 30.
> Log-Likelihood at the end of the iteration = 20387.6
> 0.242066 1.46033e-05 0.000997235 0.0618695 5.06643e-05 -0.0331409 0.0261432 0.0251279 -0.0177707 -0.0117149
> Starting EM iteration 2 of a maximum of 30.
> Log-Likelihood at the end of the iteration = 19637.5
> 0.271515 1.45319e-05 0.000757182 0.0797456 4.72377e-05 0.0309019 -0.0246118 -0.0258381 0.0155623 0.0114179
> Starting EM iteration 3 of a maximum of 30.
> Log-Likelihood at the end of the iteration = 19357.8
> 0.281222 1.43437e-05 0.000693565 0.0887446 4.46714e-05 -0.0349585 0.0142105 0.0235122 -0.0143165 -0.00574516
> Starting EM iteration 4 of a maximum of 30.
> Log-Likelihood at the end of the iteration = 19278.5
> 0.281864 1.42948e-05 0.000692791 0.0931104 4.31624e-05 -0.0341695 0.0147996 0.0244729 -0.0133326 -0.00491241
> Starting EM iteration 5 of a maximum of 30.
> Log-Likelihood at the end of the iteration = 19219.5
> 0.288413 1.35299e-05 0.000712436 0.098817 4.00959e-05 -0.0339019 0.0144534 0.0246769 -0.0129563 -0.00465449
> Starting EM iteration 6 of a maximum of 30.
>
>
> On Thu, Oct 4, 2012 at 1:38 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
>>
>> Do you know which part of the code is being slow? If it's the MCMC
>> part (and I'm guessing it is?), I might be able to give advice on
>> speeding that up.
>>
>> On Thu, Oct 4, 2012 at 11:31 AM, Michael Bishop <michaelbish@gmail.com>
>> wrote:
>> > Thanks Eric.  So, I do have the code the authors used, and it will be
>> > made public eventually, but they haven't given me permission to make it
>> > public, and I'm reticent about sharing it over the list.  If you're
>> > interested in it, let me know with an email off list, and we can discuss
>> > how you might get involved.  Maybe at some point in the not-too-distant
>> > future I can just share it on the list.  The code should run on any
>> > suitably cleaned-up directed network data, but it has only been run on
>> > data from the Add Health study.  It's very processor-intensive... Last
>> > night I started running it on one of the largest of the 84 networks
>> > (844 vertices and approximately 4,000 edges) and it's still running.
>> >
>> >
>> >
>> > On Thu, Oct 4, 2012 at 9:59 AM, Eric Purdy <epurdy@uchicago.edu> wrote:
>> >>
>> >> First, some general advice about reading technical papers! Read the
>> >> introduction, then the discussion/conclusion, then the experimental
>> >> results, and THEN the technical details (in this case, the section
>> >> entitled "Inference of rank from network structure"). This is out of
>> >> order, obviously, but it's a good triage strategy: you read things in
>> >> increasing level of difficulty, so that you always know whether you
>> >> care enough to read the next, more difficult part. It's also good
>> >> because you have more context to interpret the difficult parts if
>> >> you've read all the easier parts of the paper.
>> >>
>> >> I've put a list of wikipedia articles at the bottom of this email,
>> >> under the heading "Study Guide". They are a hopefully comprehensible
>> >> explanation of the math needed to understand the most technical parts
>> >> of the paper. But, if you don't feel like wading through the most
>> >> technical parts, you don't have to read the study guide articles in
>> >> order to participate in the discussion. (Also, I'm omitting some
>> >> background for technical parts that I don't know much about, like
>> >> ANOVA and statistical tests of model fit. If someone feels like adding
>> >> anything to the study guide, go for it.)
>> >>
>> >> == Paper ==
>> >>
>> >> PDF link: http://arxiv.org/pdf/1205.6822v1.pdf
>> >> Arxiv page: http://arxiv.org/abs/1205.6822
>> >>
>> >> == Abstract ==
>> >>
>> >> In empirical studies of friendship networks participants are typically
>> >> asked, in interviews or questionnaires, to identify some or all of
>> >> their close friends, resulting in a directed network in which
>> >> friendships can, and often do, run in only one direction between a
>> >> pair of individuals. Here we analyze a large collection of such
>> >> networks representing friendships among students at US high and
>> >> junior-high schools and show that the pattern of unreciprocated
>> >> friendships is far from random. In every network, without exception,
>> >> we find that there exists a ranking of participants, from low to high,
>> >> such that almost all unreciprocated friendships consist of a
>> >> lower-ranked individual claiming friendship with a higher-ranked one.
>> >> We present a maximum-likelihood method for deducing such rankings from
>> >> observed network data and conjecture that the rankings produced
>> >> reflect a measure of social status. We note in particular that
>> >> reciprocated and unreciprocated friendships obey different statistics,
>> >> suggesting different formation processes, and that rankings are
>> >> correlated with other characteristics of the participants that are
>> >> traditionally associated with status, such as age and overall
>> >> popularity as measured by total number of friends.
>> >>
>> >> == Code ==
>> >>
>> >> Mike Bishop has some C++ code that (I think?) the authors of the paper
>> >> are releasing publicly. Mike, can you send that out? It would probably
>> >> be a good base to start from if we want to run simulations.
>> >>
>> >> == Study guide ==
>> >>
>> >> These are just to be helpful, don't feel like you have to read any of
>> >> them to participate. And feel free to suggest additions, or say that
>> >> one of these articles wasn't very helpful.
>> >>
>> >> Adjacency matrix: (http://en.wikipedia.org/wiki/Adjacency_matrix) This
>> >> is useful because it gives you a dictionary between graph theory and
>> >> linear algebra. It only comes up a little bit, but it's often used
>> >> without any explanation.
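>> >>
>> >> (A tiny concrete example, mine rather than the paper's: if student 0
>> >> names student 1 as a friend but student 1 doesn't name student 0 back,
>> >> then A[0][1] = 1 and A[1][0] = 0. Building the matrix from an edge list
>> >> of nominations might look like:)
>> >>
>> >> #include <utility>
>> >> #include <vector>
>> >>
>> >> // Directed adjacency matrix from a list of "i names j" nominations.
>> >> // The edges here are made up, purely for illustration.
>> >> int main() {
>> >>     int n = 4;                                  // number of students
>> >>     std::vector<std::pair<int, int>> edges = {{0, 1}, {1, 0}, {2, 1}, {3, 2}};
>> >>     std::vector<std::vector<int>> A(n, std::vector<int>(n, 0));
>> >>     for (auto e : edges) A[e.first][e.second] = 1;
>> >>     // A[i][j] == 1 && A[j][i] == 0  <=>  unreciprocated friendship i -> j
>> >>     return 0;
>> >> }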
>> >>
>> >> Elementary Bayesian stuff:
>> >> (http://en.wikipedia.org/wiki/Bayesian_statistics) This is a very
>> >> high-level article that gives links to more in-depth articles.
>> >>
>> >> Less elementary Bayesian stuff:
>> >> (http://en.wikipedia.org/wiki/Bayesian_inference) This is the main
>> >> piece of background for the technical section of this paper. If you
>> >> really want to know this stuff, Bishop's "Pattern Recognition and
>> >> Machine Learning" is the best textbook I know, although it's not
>> >> accessible without a lot of mathematical background or a lot of free
>> >> time.
>> >>
>> >> Bayesian games: (http://en.wikipedia.org/wiki/Bayesian_game) I've
>> >> never seen this before, but it seems like the appropriate math to use
>> >> when thinking about social science stuff with Bayesian methods.
>> >>
>> >> EM:
>> >>
>> >> (http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
>> >> The workhorse of mathematically rigorous maximum-likelihood estimation
>> >> with latent variables, which is what this paper uses. It's a flaky
>> >> piece of shit, but it's pretty much all we have.
>> >>
>> >> MCMC: (http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) The
>> >> workhorse of computational Bayesian statistics; it's also what the
>> >> paper uses to approximate the expectations inside its EM iterations.
>> >>
>> >> Gibbs sampling: (http://en.wikipedia.org/wiki/Gibbs_sampling) This,
>> >> together with the Gibbs distributions it samples from, is the most
>> >> mature mathematics we have for describing how global, emergent
>> >> properties arise from local interactions. The underlying ideas come
>> >> from statistical thermodynamics, which is basically all about emergent
>> >> properties of molecular interactions, like temperature, pressure, etc.
>> >>
>> >> Ising model: (http://en.wikipedia.org/wiki/Ising_model) This is the
>> >> most straightforward example of a Gibbs distribution; it's what you
>> >> should think about when you try to understand them.
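>> >>
>> >> (If it helps, here's a bare-bones toy of my own -- Gibbs sampling on a
>> >> small 2D Ising model, where each step resamples one spin from its exact
>> >> conditional distribution given its four neighbors:)
>> >>
>> >> #include <cmath>
>> >> #include <cstdio>
>> >> #include <random>
>> >> #include <vector>
>> >>
>> >> int main() {
>> >>     const int L = 16;                       // grid side length
>> >>     const double beta = 0.5;                // inverse temperature
>> >>     std::mt19937 rng(42);
>> >>     std::uniform_real_distribution<double> unif(0.0, 1.0);
>> >>     std::vector<std::vector<int>> s(L, std::vector<int>(L, 1));
>> >>
>> >>     for (int sweep = 0; sweep < 1000; ++sweep) {
>> >>         for (int i = 0; i < L; ++i) {
>> >>             for (int j = 0; j < L; ++j) {
>> >>                 int h = 0;                  // sum of neighboring spins
>> >>                 if (i > 0)     h += s[i - 1][j];
>> >>                 if (i < L - 1) h += s[i + 1][j];
>> >>                 if (j > 0)     h += s[i][j - 1];
>> >>                 if (j < L - 1) h += s[i][j + 1];
>> >>                 // Exact conditional probability that this spin is +1.
>> >>                 double p_up = 1.0 / (1.0 + std::exp(-2.0 * beta * h));
>> >>                 s[i][j] = (unif(rng) < p_up) ? 1 : -1;
>> >>             }
>> >>         }
>> >>     }
>> >>
>> >>     double m = 0.0;                         // final magnetization
>> >>     for (int i = 0; i < L; ++i)
>> >>         for (int j = 0; j < L; ++j)
>> >>             m += s[i][j];
>> >>     std::printf("magnetization per spin: %f\n", m / (L * L));
>> >>     return 0;
>> >> }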
>> >>
>> >> On Wed, Oct 3, 2012 at 10:37 AM, Eric Purdy <epurdy@uchicago.edu>
>> >> wrote:
>> >> > I am starting a new listhost, math@moomers.org. I am hoping that it
>> >> > will function much like readings, but that we can have long and
>> >> > intricate technical discussions without annoying anyone.
>> >> >
>> >> > Some proposed ground rules:
>> >> >
>> >> > - There are no stupid questions, and we're interested in everyone's
>> >> > opinion. In particular, you are encouraged to subscribe to
>> >> > math@moomers.org even if you do not have much mathematical training.
>> >> > If you're interested in watching math being done, that's all that
>> >> > matters.
>> >> >
>> >> > - There will be no status games, and no arguments from authority.
>> >> > Nobody should sign up to math@moomers.org because they want to
>> >> > impress
>> >> > anyone, and nobody can cite their professional status or mathematical
>> >> > training to make someone else shut up.
>> >> >
>> >> > - If any publishable research is done on math@moomers.org, we will
>> >> > decide by consensus whose names should be on any papers that result.
>> >> >
>> >> > The first proposed paper to read is "Friendship networks and social
>> >> > status", by Ball and Newman. It is available online at
>> >> > http://arxiv.org/abs/1205.6822 . I am interested in it because it
>> >> > gives a nice model of social stratification as an emergent property
>> >> > of
>> >> > social networks. I think it would be cool to try to extend it, and I
>> >> > think it should be possible to give a scientific account of the
>> >> > phenomenon of discursive privilege.
>> >> >
>> >> > Here is the abstract:
>> >> >
>> >> > In empirical studies of friendship networks participants are
>> >> > typically
>> >> > asked, in interviews or questionnaires, to identify some or all of
>> >> > their close friends, resulting in a directed network in which
>> >> > friendships can, and often do, run in only one direction between a
>> >> > pair of individuals. Here we analyze a large collection of such
>> >> > networks representing friendships among students at US high and
>> >> > junior-high schools and show that the pattern of unreciprocated
>> >> > friendships is far from random. In every network, without exception,
>> >> > we find that there exists a ranking of participants, from low to
>> >> > high,
>> >> > such that almost all unreciprocated friendships consist of a
>> >> > lower-ranked individual claiming friendship with a higher-ranked one.
>> >> > We present a maximum-likelihood method for deducing such rankings
>> >> > from
>> >> > observed network data and conjecture that the rankings produced
>> >> > reflect a measure of social status. We note in particular that
>> >> > reciprocated and unreciprocated friendships obey different
>> >> > statistics,
>> >> > suggesting different formation processes, and that rankings are
>> >> > correlated with other characteristics of the participants that are
>> >> > traditionally associated with status, such as age and overall
>> >> > popularity as measured by total number of friends.
>> >> >
>> >> > --
>> >> > -Eric
>> >>
>> >>
>> >>
>> >> --
>> >> -Eric
>> >>
>> >> _______________________________________________
>> >> Math mailing list
>> >> Math@moomers.org
>> >> http://mailman.moomers.org/mailman/listinfo/math
>> >>
>> >
>>
>>
>>
>> --
>> -Eric
>
>



--
-Eric