Hi all,
Here is a writeup of some notes related to the tricky algorithms
question I was trying to pose. Hopefully it will be helpful to anyone
who is thinking about reimplementing the code from the paper.
On Sun, Oct 7, 2012 at 2:27 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
> GAH, left out a minus sign:
>
> beta = e^[ - (j-i)^2 ]
>
> On Sun, Oct 7, 2012 at 2:26 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
>> Or you can also skip step 2, if you initialize alpha and beta to be
>> something simple and reasonable, like
>>
>> (unreciprocated probability of edge from #i to #j)
>> alpha = e^[ j-i ]
>>
>> (reciprocated probability of edge between #i and #j)
>> beta = e^[ (j-i)^2 ]
>>
>> (I might have alpha and beta backwards from how they are in the paper?)
>>
>> Before I was saying that alpha and beta should be modeled as
>> polynomials, but it probably makes more sense to model (log alpha) and
>> (log beta) as polynomials, as the equations above do. That makes it
>> more like logistic regression, which I think is supposed to work well
>> in the social sciences?
>>
>> On Sun, Oct 7, 2012 at 12:35 PM, Eric Purdy <epurdy@uchicago.edu> wrote:
>>> So, I'm still a bit too busy to just reimplement their code, but
>>> here's a sketch of roughly how I would do it. Maybe someone else will
>>> feel like writing this code? It sounds like we could probably use it
>>> to write a publishable paper, if that's a motivation.
>>>
>>> The easiest would be to do this in python with networkx.
>>>
>>> 1. Write code to separate the edges into training and test sets.
>>> Basically, independently label each edge "test" with probability 10%
>>> and "training" with probability 90%. Then make a new graph that
>>> doesn't have any of the test edges. This is the graph that you will
>>> run EM on.
>>>
>>> 2. Get a ranking that has the minimum number of rank violations.
>>>
>>> 3. Write code to infer alpha and beta from a set of rankings. This is
>>> basically just a least squares problem, if you take my advice and use
>>> polynomials to represent alpha and beta. You will probably want to
>>> have some sort of penalty for having large coefficients in your
>>> polynomial. A common way is to put a prior on the polynomial a_0 + a_1
>>> x + ... + a_k x^k. A sensible prior would be a_1^2 + ... + a_k^2. Note
>>> that you probably want to let a_0 be whatever it wants.
>>>
>>> 4. Write code to sample from the set of all rankings, assuming that
>>> alpha and beta are the correct distributions. This is the thing that
>>> can probably be done quickly if you do some clever algorithms.
>>> Basically, you generate a ranking of the first k nodes, then you
>>> extend it to a ranking of the first (k+1) nodes by inserting node
>>> #(k+1) somewhere in between two other nodes in the ranking.
>>>
>>> 5. Write code that does EM iterations. This is just going back and
>>> forth between steps 3 and 4. In step 4, generate a bunch of random
>>> rankings, and then use all of them when you go back to step 3. Save
>>> all of the rankings, like Mike says.
>>>
>>> 6. Write code that evaluates your proposed ranking by looking at the
>>> invisible test edges. See how the quality of the ranking changes over
>>> each EM iteration.
>>>
>>> On Sun, Oct 7, 2012 at 12:16 PM, Michael Bishop <michaelbish@gmail.com> wrote:
>>>> So with all due respect to the author, its annoying that the only ranking
>>>> output the program saves is the final ranking. What I've started doing is
>>>> telling it to run with 0 iterations which outputs the minimum violations
>>>> ranking. Then I change the output filename, and run it again with a flag
>>>> which allows you to specify a seed.
>>>>
>>>> This way I'll be able to compare the minimum violations rankings to the
>>>> expectation maximization algorithm which typically takes 4-8 iterations to
>>>> converge. But given the computational requirements, it would seem to make
>>>> sense to default to save every single ranking along the way in case you ever
>>>> want to look at it later.
>>>>
>>>>
>>>>
>>>> On Sun, Oct 7, 2012 at 10:33 AM, Eric Purdy <epurdy@uchicago.edu> wrote:
>>>>>
>>>>> > Let me also raise a substantive issue. Within groups there is more
>>>>> > agreement about what is status-worthy than between groups. While I
>>>>> > think
>>>>> > the idea that there exists a status ranking across all the students in a
>>>>> > school has enough truth to it to merit the work being done on it, this
>>>>> > approach is also obscuring the large differences in what earns you
>>>>> > status in
>>>>> > different contexts. A focus on the causes and consequences of status
>>>>> > differences (status as defined in this paper) inevitably ignores the
>>>>> > importance of groups. Of course, there are ways of trying to address
>>>>> > the
>>>>> > existence of groups that complement the work done here, and maybe its
>>>>> > not
>>>>> > even as hard as I'm making it out to be, but you always have to draw the
>>>>> > line somewhere, and at that point you start waving your hands. I guess
>>>>> > I'm
>>>>> > just saying we have to do our best to be honest with ourselves and our
>>>>> > audience about the many things that might be important but we don't know
>>>>> > for
>>>>> > sure because we can't fit them in our model.
>>>>>
>>>>> You can model people playing different status games at the same time.
>>>>> Like, say for simplicity that there are two games in high school,
>>>>> being smart and being cool. Then we could try to figure out which game
>>>>> each player is playing (or how much they're playing both) based on
>>>>> that. It would make the overfitting problems much worse, but it's not
>>>>> super-hard to formulate such a model.
>>>>>
>>>>> Also, you could probably get some sort of independent check of such
>>>>> models by giving people surveys that try to determine what strategy
>>>>> people think they are following. Like, "how cool do you think you are,
>>>>> relative to your friends", "how do you decide whether you want to be
>>>>> friends with a new acquaintance", etc.
>>>>>
>>>>> --
>>>>> -Eric
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -Eric
>>
>>
>>
>> --
>> -Eric
>
>
>
> --
> -Eric
--
-Eric
_______________________________________________
Math mailing list
Math@moomers.org
http://mailman.moomers.org/mailman/listinfo/math