Hi all,

I'm Kevin. I met Eric when he came to visit Trevor at Knewton. Trevor invited me along to this list.

Anyway, the discussion is interesting, and I would point out that the phenomenon observed in this paper is well known. See, for instance, this Kaggle competition and the ensuing discussion: http://www.kaggle.com/c/FacebookRecruiting

One interesting direction beyond this paper, in line with where the discussion has been going, would be to ask the same questions in larger and larger social networks. Some example questions:
Each of these would be a paper, but you'd probably have to vastly improve the code. I think data sets exist to answer all three.

Kevin

On Wed, Oct 10, 2012 at 1:02 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
Here is a writeup of some notes related to the tricky algorithms
question I was trying to pose. Hopefully it will be helpful to anyone
who is thinking about reimplementing the code from the paper.

On Sun, Oct 7, 2012 at 2:27 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
> GAH, left out a minus sign:
>
> beta = e^[ - (j-i)^2 ]
>
> On Sun, Oct 7, 2012 at 2:26 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
>> Or you can also skip step 2, if you initialize alpha and beta to be
>> something simple and reasonable, like
>>
>> (unreciprocated probability of edge from #i to #j)
>> alpha = e^[ j-i ]
>>
>> (reciprocated probability of edge between #i and #j)
>> beta = e^[ (j-i)^2 ]
>>
>> (I might have alpha and beta backwards from how they are in the paper?)
>>
>> Before I was saying that alpha and beta should be modeled as
>> polynomials, but it probably makes more sense to model (log alpha) and
>> (log beta) as polynomials, as the equations above do. That makes it
>> more like logistic regression, which I think is supposed to work well
>> in the social sciences?
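>>
>> To make that concrete, here is a minimal sketch (all names are mine,
>> not from the paper) of the polynomial parameterization, written with
>> the minus sign from the correction above:
>>
>> import numpy as np
>>
>> def log_alpha(a, i, j):
>>     # log unreciprocated edge probability from #i to #j, modeled as
>>     # a polynomial in the rank gap (j - i); a = [a_0, a_1, ..., a_k]
>>     return np.polyval(a[::-1], j - i)
>>
>> def log_beta(b, i, j):
>>     # log reciprocated edge probability between #i and #j
>>     return np.polyval(b[::-1], j - i)
>>
>> # The simple initialization above is a = [0, 1] (log alpha = j - i)
>> # and b = [0, 0, -1] (log beta = -(j-i)^2).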
>>
>> On Sun, Oct 7, 2012 at 12:35 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
>>> So, I'm still a bit too busy to just reimplement their code, but
>>> here's a sketch of roughly how I would do it. Maybe someone else will
>>> feel like writing this code? It sounds like we could probably use it
>>> to write a publishable paper, if that's a motivation.
>>>
>>> The easiest approach would be to do this in Python with networkx.
>>>
>>> 1. Write code to separate the edges into training and test sets.
>>> Basically, independently label each edge "test" with probability 10%
>>> and "training" with probability 90%. Then make a new graph that
>>> doesn't have any of the test edges. This is the graph that you will
>>> run EM on.
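>>>
>>> A minimal sketch of this step (function and file names are mine):
>>>
>>> import random
>>> import networkx as nx
>>>
>>> def split_edges(g, p_test=0.1, seed=0):
>>>     # Independently label each edge "test" with probability p_test;
>>>     # return the training graph and the held-out test edges.
>>>     rng = random.Random(seed)
>>>     test_edges = [e for e in g.edges() if rng.random() < p_test]
>>>     train = g.copy()
>>>     train.remove_edges_from(test_edges)
>>>     return train, test_edges
>>>
>>> # usage, with a hypothetical edge-list file:
>>> # g = nx.read_edgelist("nominations.txt", create_using=nx.DiGraph)
>>> # train, held_out = split_edges(g)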
>>>
>>> 2. Get a ranking that has the minimum number of rank violations.
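>>>
>>> Finding the exact minimum is hard in general, but a greedy
>>> local-search sketch like this gives a reasonable starting ranking
>>> (my own heuristic, not necessarily the paper's procedure; flip the
>>> inequality if the paper's edge-direction convention is the opposite):
>>>
>>> import random
>>>
>>> def num_violations(g, rank):
>>>     # rank[u] = position of u, 0 = lowest status; assume an edge
>>>     # u -> v should point up the ranking, so an edge with
>>>     # rank[u] > rank[v] is a violation
>>>     return sum(1 for u, v in g.edges() if rank[u] > rank[v])
>>>
>>> def min_violations_ranking(g, n_sweeps=100, seed=0):
>>>     # Repeatedly swap adjacent pairs whenever a swap reduces the
>>>     # violation count; stop at a local minimum. O(V * E) per sweep,
>>>     # fine for a sketch but slow on big graphs.
>>>     rng = random.Random(seed)
>>>     order = list(g.nodes())
>>>     rng.shuffle(order)
>>>     rank = {u: r for r, u in enumerate(order)}
>>>     for _ in range(n_sweeps):
>>>         improved = False
>>>         for r in range(len(order) - 1):
>>>             u, v = order[r], order[r + 1]
>>>             before = num_violations(g, rank)
>>>             rank[u], rank[v] = rank[v], rank[u]
>>>             order[r], order[r + 1] = v, u
>>>             if num_violations(g, rank) < before:
>>>                 improved = True
>>>             else:
>>>                 rank[u], rank[v] = rank[v], rank[u]  # undo the swap
>>>                 order[r], order[r + 1] = u, v
>>>         if not improved:
>>>             break
>>>     return rank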
>>>
>>> 3. Write code to infer alpha and beta from a set of rankings. This is
>>> basically just a least squares problem, if you take my advice and use
>>> polynomials to represent log alpha and log beta. You will probably
>>> want some sort of penalty for large coefficients in your polynomial. A
>>> common way is to put a prior on the polynomial a_0 + a_1 x + ... + a_k
>>> x^k; a sensible penalty would be a_1^2 + ... + a_k^2. Note that you
>>> probably want to leave a_0 unpenalized and let it be whatever it wants.
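>>>
>>> The penalized fit might look like this (a sketch; assembling x, the
>>> rank gaps, and y, the corresponding log empirical edge rates, from
>>> the sampled rankings is glue I'm leaving out):
>>>
>>> import numpy as np
>>>
>>> def fit_log_poly(x, y, k=2, lam=1.0):
>>>     # Fit y ~ a_0 + a_1 x + ... + a_k x^k by ridge-penalized least
>>>     # squares, penalizing a_1^2 + ... + a_k^2 but leaving a_0 free.
>>>     X = np.vander(np.asarray(x), k + 1, increasing=True)
>>>     P = lam * np.eye(k + 1)
>>>     P[0, 0] = 0.0  # don't penalize the constant term
>>>     return np.linalg.solve(X.T @ X + P, X.T @ np.asarray(y))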
>>>
>>> 4. Write code to sample from the set of all rankings, assuming that
>>> alpha and beta are the correct distributions. This is the step that
>>> can probably be made fast with some clever algorithms.
>>> Basically, you generate a ranking of the first k nodes, then you
>>> extend it to a ranking of the first (k+1) nodes by inserting node
>>> #(k+1) somewhere between two other nodes in the ranking.
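>>>
>>> A sketch of that insertion scheme (insert_weight is a hypothetical
>>> callback that scores, under the current alpha and beta, putting the
>>> new node at a given position; depending on how you define it, you
>>> may need importance weights on the resulting rankings to sample the
>>> right distribution):
>>>
>>> import random
>>>
>>> def sample_ranking(nodes, insert_weight, seed=0):
>>>     # Maintain a ranking of the first k nodes, then insert node k+1
>>>     # at a position drawn in proportion to insert_weight. Assumes
>>>     # the weights are positive.
>>>     rng = random.Random(seed)
>>>     order = []
>>>     for node in nodes:
>>>         positions = range(len(order) + 1)
>>>         weights = [insert_weight(order, p, node) for p in positions]
>>>         pos = rng.choices(positions, weights=weights)[0]
>>>         order.insert(pos, node)
>>>     return order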
>>>
>>> 5. Write code that does EM iterations. This is just going back and
>>> forth between steps 3 and 4. In step 4, generate a bunch of random
>>> rankings, and then use all of them when you go back to step 3. Save
>>> all of the rankings, as Mike suggests.
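>>>
>>> Putting the pieces together, the skeleton might look like this
>>> (make_insert_weight and build_regression_data are hypothetical glue;
>>> sample_ranking and fit_log_poly are the sketches from steps 4 and 3):
>>>
>>> def run_em(train, make_insert_weight, build_regression_data,
>>>            n_iters=10, n_samples=50, seed=0):
>>>     # E-step: sample rankings under the current parameters.
>>>     # M-step: refit the polynomials from all rankings sampled so far.
>>>     a = [0.0, 1.0]        # log alpha = (j - i), the simple init
>>>     b = [0.0, 0.0, -1.0]  # log beta = -(j - i)^2
>>>     saved = []            # keep every sampled ranking, per Mike
>>>     nodes = list(train.nodes())
>>>     for it in range(n_iters):
>>>         w = make_insert_weight(train, a, b)
>>>         saved.extend(sample_ranking(nodes, w, seed=seed + 1000 * it + s)
>>>                      for s in range(n_samples))
>>>         xa, ya, xb, yb = build_regression_data(train, saved)
>>>         a = fit_log_poly(xa, ya, k=1)
>>>         b = fit_log_poly(xb, yb, k=2)
>>>     return a, b, saved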
>>>
>>> 6. Write code that evaluates your proposed ranking by looking at the
>>> held-out test edges. See how the quality of the ranking changes over
>>> each EM iteration.
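>>>
>>> For example, something as simple as this (same direction convention
>>> as num_violations in step 2; a held-out log-likelihood under alpha
>>> and beta would be a finer-grained alternative):
>>>
>>> def frac_consistent(order, test_edges):
>>>     # order: a ranking as a list of nodes, lowest status first
>>>     rank = {u: r for r, u in enumerate(order)}
>>>     ok = sum(1 for u, v in test_edges if rank[u] < rank[v])
>>>     return ok / max(1, len(test_edges))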
>>>
>>> On Sun, Oct 7, 2012 at 12:16 PM, Michael Bishop <michaelbish@gmail.com> wrote:
>>>> So with all due respect to the author, it's annoying that the only
>>>> ranking output the program saves is the final ranking. What I've
>>>> started doing is telling it to run with 0 iterations, which outputs
>>>> the minimum violations ranking. Then I change the output filename and
>>>> run it again with a flag that allows you to specify a seed.
>>>>
>>>> This way I'll be able to compare the minimum violations rankings to
>>>> the expectation-maximization algorithm, which typically takes 4-8
>>>> iterations to converge. But given the computational requirements, it
>>>> would make sense to save every ranking along the way by default, in
>>>> case you ever want to look at it later.
>>>>
>>>>
>>>>
>>>> On Sun, Oct 7, 2012 at 10:33 AM, Eric Purdy <epurdy@uchicago.edu> wrote:
>>>>>
>>>>> > Let me also raise a substantive issue.  Within groups there is more
>>>>> > agreement about what is status-worthy than between groups.  While I
>>>>> > think
>>>>> > the idea that there exists a status ranking across all the students in a
>>>>> > school has enough truth to it to merit the work being done on it, this
>>>>> > approach is also obscuring the large differences in what earns you
>>>>> > status in
>>>>> > different contexts.  A focus on the causes and consequences of status
>>>>> > differences (status as defined in this paper) inevitably ignores the
>>>>> > importance of groups.  Of course, there are ways of trying to address
>>>>> > the
>>>>> > existence of groups that complement the work done here, and maybe it's
>>>>> > not
>>>>> > even as hard as I'm making it out to be, but you always have to draw the
>>>>> > line somewhere, and at that point you start waving your hands.  I guess
>>>>> > I'm
>>>>> > just saying we have to do our best to be honest with ourselves and our
>>>>> > audience about the many things that might be important but we don't know
>>>>> > for
>>>>> > sure because we can't fit them in our model.
>>>>>
>>>>> You can model people playing different status games at the same time.
>>>>> Like, say for simplicity that there are two games in high school,
>>>>> being smart and being cool. Then we could try to figure out which game
>>>>> each player is playing (or how much they're playing both) based on
>>>>> that. It would make the overfitting problems much worse, but it's not
>>>>> super-hard to formulate such a model.
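>>>>>
>>>>> For concreteness, a toy version of that (all names mine): give each
>>>>> game its own ranking, give each student a latent weight for how much
>>>>> they play each game, and mix the edge probabilities accordingly.
>>>>>
>>>>> def edge_prob(u, v, rank_smart, rank_cool, w, alpha):
>>>>>     # w[u] in [0, 1]: how much student u plays the "smart" game
>>>>>     # vs. the "cool" game; alpha maps a rank gap to an edge
>>>>>     # probability, as in the single-game model.
>>>>>     p_smart = alpha(rank_smart[v] - rank_smart[u])
>>>>>     p_cool = alpha(rank_cool[v] - rank_cool[u])
>>>>>     return w[u] * p_smart + (1.0 - w[u]) * p_cool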
>>>>>
>>>>> Also, you could probably get some sort of independent check of such
>>>>> models by giving people surveys that try to determine what strategy
>>>>> people think they are following. Like, "how cool do you think you are,
>>>>> relative to your friends", "how do you decide whether you want to be
>>>>> friends with a new acquaintance", etc.
>>>>>
>>>>> --
>>>>> -Eric
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -Eric
>>
>>
>>
>> --
>> -Eric
>
>
>
> --
> -Eric



--
-Eric

_______________________________________________
Math mailing list
Math@moomers.org
http://mailman.moomers.org/mailman/listinfo/math