Re: [math] paper: "Friendship networks and social status" (Brian Ball, M. E. J. Newman)

4 Oct 2012

      You might be able to just skip the MCMC part and do exact
marginalization. Do you want me to try to code that up? I have an
out-of-date version of the code sitting around somewhere.
On Thu, Oct 4, 2012 at 1:50 PM, Michael Bishop michaelbish@gmail.com wrote:
...
Yeah, its the MCMC part... still running for 16 hours now on a year old
computer with 16gb of ram.  Once again this is just one, albeit the largest,
of 84 networks in my dataset.  I'm told the easiest thing that would help is
to take advantage of my second core but I don't know how to do that.
Here's the output:
$ ./highschool_rankings -i paj_files_r/comm93.scc.el -iter30 -v3
First step will be a linearly scored minimum violations ranking.
Ranks initialized via simulated annealing. The score from the annealing
process was -59692
Starting EM iteration 1 of a maximum of 30.
Log-Likelihood at the end of the iteration = 20387.6
0.242066 1.46033e-05 0.000997235 0.0618695 5.06643e-05 -0.0331409 0.0261
432 0.0251279 -0.0177707 -0.0117149
Starting EM iteration 2 of a maximum of 30.
Log-Likelihood at the end of the iteration = 19637.5
0.271515 1.45319e-05 0.000757182 0.0797456 4.72377e-05 0.0309019 -0.0246
118 -0.0258381 0.0155623 0.0114179
Starting EM iteration 3 of a maximum of 30.
Log-Likelihood at the end of the iteration = 19357.8
0.281222 1.43437e-05 0.000693565 0.0887446 4.46714e-05 -0.0349585 0.0142
105 0.0235122 -0.0143165 -0.00574516
Starting EM iteration 4 of a maximum of 30.
Log-Likelihood at the end of the iteration = 19278.5
0.281864 1.42948e-05 0.000692791 0.0931104 4.31624e-05 -0.0341695 0.0147
996 0.0244729 -0.0133326 -0.00491241
Starting EM iteration 5 of a maximum of 30.
Log-Likelihood at the end of the iteration = 19219.5
0.288413 1.35299e-05 0.000712436 0.098817 4.00959e-05 -0.0339019 0.01445
34 0.0246769 -0.0129563 -0.00465449
Starting EM iteration 6 of a maximum of 30.
On Thu, Oct 4, 2012 at 1:38 PM, Eric Purdy epurdy@uchicago.edu wrote:
...
Do you know which part of the code is being slow? If it's the MCMC
part (and I'm guessing it is?), I might be able to give advice on
speeding that up.
On Thu, Oct 4, 2012 at 11:31 AM, Michael Bishop michaelbish@gmail.com
wrote:
...
Thanks Eric.  So, I do have the code the authors used, and it will be
made
public eventually, but they haven't given me permission to make it
public,
and I'm a reticent about sharing it over the list.  If you're interested
in
it, let me know with an email off list, and  we can discuss how you
might
get involved.  Maybe at some point in the not-too-distant future I can
just
share  it on the list.  The code should run on any suitably cleaned up
directed network data, but it has only been run on data from the Add
Health
study.  Its very processor intensive... Last night I started running it
on
one of the largest of 84 networks (844 vertices and approx 4000 edges)
and
its still running.
On Thu, Oct 4, 2012 at 9:59 AM, Eric Purdy epurdy@uchicago.edu wrote:
...
First, some general advice about reading technical papers! Read the
introduction, then the discussion/conclusion, then the experimental
results, and THEN the technical details (in this case, the section
entitled "Inference of rank from network structure"). This is out of
order, obviously, but it's a good triage strategy: you read things in
increasing level of difficulty, so that you always know whether you
care enough to read the next, more difficult part. It's also good
because you have more context to interpret the difficult parts if
you've read all the easier parts of the paper.
I've put a list of wikipedia articles at the bottom of this email,
under the heading "Study Guide". They are a hopefully comprehensible
explanation of the math needed to understand the most technical parts
of the paper. But, if you don't feel like wading through the most
technical parts, you don't have to read the study guide articles in
order to participate in the discussion. (Also, I'm omitting some
background for technical parts that I don't know much about, like
ANOVA and statistical tests of model fit. If someone feels like adding
anything to the study guide, go for it.)
== Paper ==
PDF link: http://arxiv.org/pdf/1205.6822v1.pdf
Arxiv page: http://arxiv.org/abs/1205.6822
== Abstract ==
In empirical studies of friendship networks participants are typically
asked, in interviews or questionnaires, to identify some or all of
their close friends, resulting in a directed network in which
friendships can, and often do, run in only one direction between a
pair of individuals. Here we analyze a large collection of such
networks representing friendships among students at US high and
junior-high schools and show that the pattern of unreciprocated
friendships is far from random. In every network, without exception,
we find that there exists a ranking of participants, from low to high,
such that almost all unreciprocated friendships consist of a
lower-ranked individual claiming friendship with a higher-ranked one.
We present a maximum-likelihood method for deducing such rankings from
observed network data and conjecture that the rankings produced
reflect a measure of social status. We note in particular that
reciprocated and unreciprocated friendships obey different statistics,
suggesting different formation processes, and that rankings are
correlated with other characteristics of the participants that are
traditionally associated with status, such as age and overall
popularity as measured by total number of friends.
== Code ==
Mike Bishop has some C++ code that (I think?) the authors of the paper
are releasing publicly. Mike, can you send that out? It would probably
be a good base to start from if we want to run simulations.
== Study guide ==
These are just to be helpful, don't feel like you have to read any of
them to participate. And feel free to suggest additions, or say that
one of these articles wasn't very helpful.
Adjacency matrix: (http://en.wikipedia.org/wiki/Adjacency_matrix) This
is useful because it gives you a dictionary between graph theory and
linear algebra. It only comes up a little bit, but it's often used
without any explanation.
Elementary Bayesian stuff:
(http://en.wikipedia.org/wiki/Bayesian_statistics) This is a very
high-level article that gives links to more in-depth articles.
Less elementary Bayesian stuff:
(http://en.wikipedia.org/wiki/Bayesian_inference) This is the main
piece of background for the technical section of this paper. If you
really want to know this stuff, Bishop's "Pattern Recognition and
Machine Learning" is the best textbook I know, although it's not
accessible without a lot of mathematical background or a lot of free
time.
Bayesian games: (http://en.wikipedia.org/wiki/Bayesian_game) I've
never seen this before, but it seems like the appropriate math to use
when thinking about social science stuff with Bayesian methods.
EM:
(http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
The workhorse of mathematically rigorous Bayesian statistics. It's a
flaky piece of shit, but it's pretty much all we have.
MCMC: (http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) The
workhorse of mathematically rigorous frequentist statistics.
Gibbs sampling: (http://en.wikipedia.org/wiki/Gibbs_sampling) This is
the most mature mathematics that specifies how emergent properties
emerge from local interactions. It was first created to model
thermodynamics, which is basically all about emergent properties of
molecule interactions, like temperature, pressure, etc.
Ising model: (http://en.wikipedia.org/wiki/Ising_model) This is the
most straightforward example of a Gibbs distribution, it's what you
should think about when you try to understand them.
On Wed, Oct 3, 2012 at 10:37 AM, Eric Purdy epurdy@uchicago.edu
wrote:
...
I am starting a new listhost, math@moomers.org. I am hoping that it
will function much like readings, but that we can have long and
intricate technical discussions without annoying anyone.
Some proposed ground rules:

There are no stupid questions, and we're interested in everyone's

opinion. In particular, you are encouraged to subscribe to
math@moomers.org even if you do not have much mathematical training.
If you're interested in watching math being done, that's all that
matters.

There will be no status games, and no arguments from authority.

Nobody should sign up to math@moomers.org because they want to
impress
anyone, and nobody can cite their professional status or mathematical
training to make someone else shut up.

If any publishable research is done on math@moomers.org, we will

decide by consensus whose names should be on any papers that result.
The first proposed paper to read is "Friendship networks and social
status", by Ball and Newman. It is available online at
http://arxiv.org/abs/1205.6822 . I am interested in it because it
gives a nice model of social stratification as an emergent property
of
social networks. I think it would be cool to try to extend it, and I
think it should be possible to give a scientific account of the
phenomenon of discursive privilege.
Here is the abstract:
In empirical studies of friendship networks participants are
typically
asked, in interviews or questionnaires, to identify some or all of
their close friends, resulting in a directed network in which
friendships can, and often do, run in only one direction between a
pair of individuals. Here we analyze a large collection of such
networks representing friendships among students at US high and
junior-high schools and show that the pattern of unreciprocated
friendships is far from random. In every network, without exception,
we find that there exists a ranking of participants, from low to
high,
such that almost all unreciprocated friendships consist of a
lower-ranked individual claiming friendship with a higher-ranked one.
We present a maximum-likelihood method for deducing such rankings
from
observed network data and conjecture that the rankings produced
reflect a measure of social status. We note in particular that
reciprocated and unreciprocated friendships obey different
statistics,
suggesting different formation processes, and that rankings are
correlated with other characteristics of the participants that are
traditionally associated with status, such as age and overall
popularity as measured by total number of friends.
--
-Eric
--
-Eric

Math mailing list
Math@moomers.org
http://mailman.moomers.org/mailman/listinfo/math
--
-Eric
-- 
-Eric