
Computational Learning Theory

Authors: Michael Kearns and Umesh Vazirani

Avrim Blum taught a course on Machine Learning Theory last semester, but as I was busy taking some other courses I didn't have the chance to attend. However, I am quite interested in the computer scientist's perspective on machine learning and decided to read Kearns and Vazirani's short textbook as an introduction to the subject. I had very little exposure to computational learning theory (despite working in statistical learning theory), but I found the textbook to be quite a reasonable introduction to the field. Kearns and Vazirani highlight the main definitions and concepts and go over some key ideas, but omit many other important results in the field, either for lack of space or because the field is quite active.

At the outset, Kearns and Vazirani must first define what learning means, and for this (and throughout the field) they use Valiant's Probably Approximately Correct (PAC) learning framework. This is the most fundamental model of learning in the learning theory community, because it captures the essential aspects of the learning problem while abstracting away everything else. Numerous variants and extensions have been proposed and heavily studied, capturing additional aspects of learning including active learning, learning in the presence of noise, and the Bayesian perspective.

A brief summary of the book follows: The PAC-learning model is developed in Chapter 1 via a series of insufficient definitions and subsequent refinements. Along the way, we see examples of some learning problems, algorithms, and proofs. Chapter 2 explores the principle of Occam's razor and shows that algorithms that output simple explanations of the data are PAC-learning algorithms. Chapter 3 unifies all learnability results with the beautiful Vapnik-Chervonenkis (VC) theory, which gives both upper and lower bounds on the sample complexity of PAC-learning a concept class in terms of a quantity called the VC dimension, which captures the complexity of the class. Chapter 4 discusses the classical and practically popular idea of boosting, whereby "weak" learning algorithms can be combined intelligently into a PAC-learning algorithm. Chapter 5 introduces the notion of noise and the statistical query (SQ) model, and shows the connection between learning in the SQ model and learning in the presence of classification noise. Chapters 6 and 7 focus on unlearnability, first establishing concept classes that cannot be learned in polynomial time (under complexity-theoretic assumptions) and then showing how reductions can be used to prove further unlearnability results. Finally, Chapter 8 shows how active learning, where the learner can construct instances and ask for their labels, can be used to overcome this inherent unlearnability.

One of the reasons I read Kearns and Vazirani is that (in my opinion) statistics has a hard time capturing notions of computational efficiency, even though efficiency seems essential in many estimation problems. The large body of work from the optimization community (many estimators are solutions to optimization problems) has developed a nice characterization of computational efficiency, although a different one from the CS theorist's, but when studying limits of estimation (i.e., lower bounds), statisticians have no way to capture computational constraints. We typically study the optimality of estimators via information theory, which ignores computational considerations but allows one to derive fundamental limits on estimation. In many problems this is sufficient, since we can give efficient algorithms whose performance matches the information-theoretic limit, but in others we are not sure we can. A natural question in this latter setting is whether computational considerations make the problem harder, so that no efficient algorithm can match the information-theoretic limit. The fact that computational learning theory works with a good model of computation and can develop such unlearnability results makes me think that bridging the gap between statistical and computational learning can help answer these questions.

So, from that perspective I think it was a very worthwhile read. The book has gotten me excited about a new perspective on learning and acquainted me with a new community of researchers, and I think it is a promising direction for answering questions about computational and statistical tradeoffs in learning. I will say that the book is certainly not comprehensive, but it is a good, short introduction and I found it fairly easy to read. I would recommend it to anyone interested in the field, and suggest Professor Blum's course notes for further reading.

Edinburgh and Glasgow

For the past week and a half I have been in Scotland for ICML 2012 (which I hope to write about soon). Some friends and I arrived in Edinburgh a couple of days early so that we could sight-see and be tourists before things got hectic with the conference. Unfortunately, we didn't get to see all that much of Scotland, but we did thoroughly explore Edinburgh and also managed to fit in a day trip to Glasgow. Both cities are beautiful, and in exploring them I had the wonderful opportunity to learn about the incredibly rich political and cultural history of Scotland.

Edinburgh is the political heart of Scotland, with its majestic castle, the Palace of Holyrood House (where the Queen stays on her visits), and the Scottish Parliament, hosted in the joyfully quirky parliament building. To complement the royal sights, there is St. Giles Cathedral, the National Museum of Scotland, with its detailed exhibition on Scottish history, the Scottish National Gallery, two lush gardens (Princes Gardens and the Meadows), and so much more. I also climbed Arthur’s Seat (just a twenty minute hike up a hill by the city), for some spectacular views of the city and its surroundings.

In contrast, Glasgow is the cultural heart. It feels like a much more lived-in city than Edinburgh, which feels too touristy to be livable. Glasgow boasts another beautiful cathedral, a huge necropolis where many famous Scots are buried, and probably the most amazing university campus I have ever seen, among many museums and other things that we did not have time to explore. The university is a set of buildings that sits on a hill on the outskirts of the city, with a majestic gothic-style building looking over the city. It seems like an amazing place to go to school. Given that we were only in Glasgow for a few hours, we did not get to explore as much of the city as I would have liked, but I did enjoy the time we spent there and feel like I got a good taste of the city.

Scotland has a very interesting history, starting from the "indigenous" Picts and Scots, through the Roman struggles and the subsequent wars with England, to the religious struggles of the last millennium. Coupled with its rich and unique culture, from the food to the Scotch whisky, to its famous thinkers and artists (David Hume, Sir Walter Scott, J.K. Rowling, and Thomas Smith are either Scottish or spent significant amounts of their lives in Scotland), it is a wonderful place to visit and explore. I will caution that the weather is quite finicky, and even in the middle of summer it was raining on and off almost every day, although it was never more than a nuisance. I was disappointed that we didn't get to visit the Highlands, but I suppose it leaves something to look forward to if I ever find myself back in Scotland again.

Game of Thrones

Author: George R. R. Martin

After watching and thoroughly enjoying Season 1 of Game of Thrones on TV, and after hearing so many of my friends say they liked the books more than the series, I decided it would be worthwhile to read the Song of Ice and Fire books. Actually, I have been meaning to do this for over a year now, but every time I went to the library I found that "Game of Thrones" was checked out, and only recently did I get ahold of the book. Once I finally did, I immediately read it and sort of got hooked; I found myself reading it at work, spending large amounts of my free time reading, and losing track of time as I got lost in Martin's intricately designed fantasy world. Given all of this, it is pretty apparent that I really enjoyed the book.

Growing up, I loved reading fantasy novels; I was so impressed at how vividly and painstakingly authors like Tolkien and Rowling described the worlds they had created. Martin has done exactly that with Game of Thrones, painting a picture of his world and developing a complete history that is revealed in bits and pieces over the course of the novel. Every detail is carefully thought out, every character plays his or her integral role in the story, and the consequences of every event play out in critical ways. Tolkien's Lord of the Rings series is incredibly famous in part because of his attention to these details. Without having finished the series yet, I believe it is reasonable to compare Martin's Song of Ice and Fire series to The Lord of the Rings in this regard.

One of the things that I have recently found disappointing about a lot of fantasy novels is that the plot-lines are very clear-cut battles between good and evil, characters are clearly aligned with one of the two sides, and conflicts and resolutions are often incredibly obvious. This is definitely not the case in Game of Thrones, and it is one refreshing aspect that I think separates the series from other fantasy novels I have read. While there are some characters that are clearly good or clearly evil, it is hard to pinpoint the nature of many of the others. Indeed, many of the characters are truly looking out for themselves, shifting allegiances to whoever can aid their agenda the most. I really liked this political/intrigue aspect of the novel and, as I mentioned, I think it distinguishes the novel and probably the series.

The one issue that irked me while reading the novel is the fact that I had already seen the TV series and the latter follows the former almost exactly. While reading, I not only knew how events would pan out, eliminating all of the suspense and conflict, but I also had mental imagery of the scenes and of Martin’s world. I was disappointed that I lost the opportunity to be creative in envisioning the world and in retrospect, I wish I had read the novels before watching the series. Obviously this isn’t a criticism of the novel. Fortunately I haven’t started watching season two, and I plan on reading “A Clash of Kings” before watching that. I would caution against watching the series before reading the novels, if you’re into that sort of thing.

Having grown up with fantasy novels but not read one for several years now, I found Game of Thrones warmly nostalgic, and really enjoyed the novel. I am excited to get started on the second book, but, as usual, my reading list is growing faster than I can read, so I’m probably going to switch gears and read some non-fiction before “A Clash of Kings.”

A Random Walk Down Wall Street

By: Burton G. Malkiel

"A Random Walk" is a whirlwind tour of personal investing terminology and strategies, targeted toward the average new investor. Malkiel's glaringly obvious thesis is that there is no sure-fire path to beating the market, but that a simple buy-and-hold strategy with diversification will follow the market to almost sure-fire capital gains. With appropriate asset allocation, taking into account one's capacity and tolerance for bearing risk, Malkiel claims that this strategy will allow an individual to meet or exceed their investing goals. Along the way, we meet investment strategy after investment strategy (almost all of which fail at some point), and we also acquaint ourselves with several investment options and plenty of technical jargon, all of which seems useful for a new investor.

Malkiel presents three main schools of thought: technical analysis, the "art" of finding trends in a stock's past history to predict its future and make investment decisions; fundamental analysis, where the true value of a stock is assessed and used for investment decisions; and modern portfolio theory, where the risk of a stock is evaluated and higher (systematic) risk generally yields higher expected returns, but also higher variance. Modern portfolio theory seems to be the most reasonable and successful of the three, but it too faces the problem that risk is hard to estimate. The traditional measure, beta, is not only hard to compute, but it also does not reflect all of the risk associated with any security. That being said, MPT seems to be the theory that holds the most water to this day.

Malkiel spends one chapter discussing behavioral finance, concluding that markets can actually have systematic biases arising from well-documented psychological factors such as loss aversion and overconfidence. I've been planning on reading "Thinking, Fast and Slow," which is devoted to behavioral finance/economics, so more on this later.

Investment options discussed range in risk (and consequently expected return) from “higher risk” index funds and mutual funds to treasury bills and certificates of deposit. There are a wide variety of investment choices and Malkiel covers the pros and cons of many of them. One thing that I found somewhat poorly explained is that in many cases the tax advantages associated with one investment choice make it particularly appealing, but the specifics of these advantages are not covered. I, for one, was curious as to why these tax benefits exist in the first place and how they play such a prominent role in investment decisions. At any rate, tax benefits play a critical role and one should choose investment options with this in mind.

The last section of the book serves as an investment guide with Malkiel’s suggestions on asset allocations for various age-brackets (corresponding to capacity to bear risk) along with hints and tips for successful investing. Some of the big takeaways for me include:

  • Re-balancing one’s portfolio on a yearly basis not only keeps risk at the desired level but can also lead to higher returns as it naturally results in a sell-high, buy-low policy.
  • Dollar cost averaging, or slowly investing in monthly, quarterly, or bi-yearly increments, can reduce risk because it minimizes the timing effects of your investment. Not only that, it can lead to increased profits because your investment during downswings buys you more shares than the same investment would at another time (see the small worked example after this list).
  • Diversification reduces unsystematic risk and the risk associated with a well-diversified portfolio will be almost entirely systematic, so you will reap the rewards associated with bearing that risk. Things like index funds and REITs are good vehicles for diversification.
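To make the dollar-cost-averaging arithmetic concrete, here is a tiny Python sketch with made-up prices (purely illustrative, not from the book). Because a fixed dollar amount buys more shares when the price dips, the average cost per share comes out below the average market price:

```python
# Hypothetical monthly prices for one share of a fund (illustrative numbers only).
prices = [100, 80, 60, 80, 100, 120]
monthly_investment = 300  # dollars invested each month

shares = sum(monthly_investment / p for p in prices)   # 21.0 shares in total
total_invested = monthly_investment * len(prices)      # 1800 dollars

avg_cost_per_share = total_invested / shares           # ~85.71: what you actually paid per share
avg_price = sum(prices) / len(prices)                  # 90.00: the simple average market price

print(f"average cost per share: {avg_cost_per_share:.2f}")
print(f"average market price:   {avg_price:.2f}")
```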

Another theme echoed over and over is that any strategy that exploits some market inefficiency for profit will become widely adopted and the inefficiency consequently corrected. In particular, any fixed strategy that is successful is self-destructive, in that the popularity of the strategy makes it less successful. Even the buy-and-hold strategy is self-destructive in the sense that the S&P 500 stocks have become artificially inflated by the popularity of index funds that track this group.

I still have a lot to learn about investing but I do think this book is a very good primer. Definitely worth a read, and also probably worth revisiting every couple of years to reacquaint yourself with terminology and investment strategies.

Mr. Yuk at Chicago Invite

Last weekend, Mr. Yuk headed west to Chicago for our last tournament before the series. We were on a six-game losing streak after a terrible Sunday at Queen City Tune-up and a possibly worse Saturday at Easterns Qualifier, but for the first time since the fall, we had a full roster, which suggested we might have a more positive weekend. Results-wise, we certainly did; we finished in 9th place (out of 64 open teams) going 6-1 on the weekend, with our only loss coming to Michigan State (the eventual finalist) on universe point.

Unfortunately, the tournament format wasn’t all that ideal. We were pretty horribly under-seeded after poor performances against good competition earlier in the spring and this meant that to get into the championship bracket (and finish higher than 9th), we pretty much had to be undefeated on Saturday. Moreover, our cross-over game on Saturday evening would be against a two-seed from a power pool, in this case Michigan State. If we were seeded slightly better or slightly worse, our cross-over would have been against a three or four seed, which would have been a much more winnable game. Our loss to MSU meant we could get 9th at best, and by winning out on Sunday, we did exactly that.

First the break-down, then some comments.

  1. Michigan B: As is usually the case with Mr. Yuk, we started out pretty slow, trading points early on and going down a break before we really got our act together. They ran a poach-y defense that our O-line had a pretty hard time dealing with, resulting in a lot of miscues and easy blocks for them. Our D-line really picked up the intensity, with much more tenacious man-defense than I'm used to seeing from our team, and also with some nice zone looks that generated turnovers. We started to run away with the game in the second half and eventually won 13-8.
  2. Wright State: With some momentum from our win against Michigan B, we brought the pain on Wright State, generating break after break. The O-line didn't need to play all that much, but when they did, they were cool and consistent. The D-line worked hard to generate turnovers (actually earning blocks, not just capitalizing on miscues) and then converted time after time. CMU wins 13-3.
  3. Dayton: Building on our earlier rounds, we came out strong right away, generating three breaks to start the game and taking half 7-2. Dayton was plagued with injuries and consequently played pretty short-handed but our defense was pretty stifling. Again O-line did their job. CMU wins 13-6.
  4. Michigan State: As our cross-over game, this was a must-win situation for either team to get into the championship bracket on Sunday. At Easterns Qualifier we played MSU very close, losing 14-16, and I looked forward to a competitive, exciting game. As it turns out, that is exactly what happened. We started out trading points, and MSU eventually earned two breaks in the first half. The second half was a different story. We earned three breaks to take the lead 12-11 and generated another block with the opportunity to win, but a costly turnover swung momentum in favor of MSU. They scored to tie the game at 12s, and we traded to 13s. On universe point, both teams were very impatient, resulting in a marathon of a point with possession exchanging several times. A costly end-zone turnover on our part enabled MSU to work their end-zone offense and score for the universe-point win. MSU wins 14-13.
  5. Dayton: Relegated to the 9th place bracket, we faced Dayton again first thing on Sunday morning. If it’s at all possible, they had incurred even more injuries but as usual, we came out incredibly slow, going down 1-4 right away. Once we got our act together both lines started clicking and we earned our breaks back to win 13-10.
  6. Western Michigan: Another fairly straightforward win for us. We started out trading and then broke a couple of times to take half 7-4. Deep defense was huge for us this game, as they looked to huck early and often but we prevented them from completing the majority of these attempts. The second half went smoothly and we came out ahead. CMU wins 13-8.
  7. Northern Iowa: Penn State was on the other side of the bracket after a surprising loss in their crossover game on Saturday, and we were expecting to see them in the 9th place game. However, they got upset by Northern Iowa, who we ended up facing. Again we came out strong right away, taking a quick 3-1 lead. They broke back once in the first half but we got a couple of more to take half 7-4. In what was a fairly uneventful second half, we ended up winning 12-9 after the game got capped. Thus we ended up in 9th place.

Some comments:

  • We clearly play up or down to our opponents. We played some teams that we should have crushed this weekend, yet many of our games ended pretty close. This meant that we had to keep our starters playing, so that in the games that mattered we were fairly fatigued. We need to get much better at putting away bad teams. On the plus side, we really stepped up to MSU and showed that we can compete with some very good, potentially nationals caliber, teams.
  • Another mental thing: it takes us way too long to get going in the mornings. Come regionals, we can't dig ourselves into a hole on Saturday or Sunday morning, and really all it takes is focusing and taking our warmup seriously. This is purely mental, but, to me, it is critical for our success in the series.
  • The last mental thing, we lost our composure towards the end of the MSU game and I think that cost us the game. We had a couple of unforced turnovers and bad decisions, especially on universe point, where we repeatedly gave them back the disc. Our offense was not clicking on that point and I think that in part it was because we’re not used to such high-pressure situations and lost our cool. We’ll only improve on that with experience but it is something to note.
  • Strategically, our O-line really struggled with the poach-y defense that Michigan B threw at us. Our cutters were very hesitant about moving through the poaches, making it very hard for handlers to find open upfield targets, and resulting in several turnovers. I’m not sure if a lot of teams in our region rely on poach-y defenses but it is something we should work on in the next couple of weeks.
  • Defensively, the one thing that really hurt us in the MSU game was our marks. We started out marking honestly, but MSU’s handlers repeatedly got off around breaks, making it near impossible for upfield defenders to get blocks. We adjusted by having our marks really take away the around option, but this came at the expense of opening up the inside looks, which MSU exploited, again at the cost of our upfield defense. Better marks would ideally challenge both options simultaneously, making it much easier for us to generate turnovers.
  • On a positive note, both our offense and defense clicked much better than they have in past tournaments. On the defensive side, getting back two of the starters we were missing at past tournaments definitely contributed to our relative success this weekend. Offensively things also seemed to work very well for the most part (apart from a couple of miscues and the poach-defense issue). Defensively, I was especially happy to see more and more people laying out and actually contesting both underneath and deep throws.

All in all, we had a solid weekend, winning games we should win, and with a really close, competitive game against a very good team in MSU. However, there are definitely things to work on in the next couple of weeks leading up to regionals. With our region only earning two bids to nationals, we’ll have to iron out all of these kinks if we want to make a push for one of them.

Conferences is next weekend so look for another post in a week or so.

A Very Simple Explanation of Spectral Clustering

I’ve been wanting to write a post about some of the research I’ve been working on but I realized that my work has gotten really technical and not very accessible to someone not in machine learning. So I thought as a precursor I would write about spectral clustering at a very high level and with lots of pictures. I hope you can follow along. For a more technical introduction to spectral clustering, this tutorial is very nice and much more complete than this article.

Clustering

To start out, it is important to understand what the clustering problem is and why it is important. Clustering is an extremely useful tool in data analysis whereby a set of objects are organized into groups based on their similarity. We typically want to form groups so that objects in the same group are highly similar and objects in different groups are less similar. This is typically known as unsupervised learning because the data set is not labeled (differentiating it from other tasks like regression and classification).

Many applications of clustering are more exploratory, in the sense that further analysis is done individually on the clusters discovered by the algorithm. Applications include image processing, bioinformatics, information retrieval and many, many others. In image processing, one application might be to identify all of the different objects in the image by clustering the pixels. Once pixels have been clustered, it becomes much easier to tag or label the objects. There are several applications in bioinformatics, but one example is identifying groups of genes that have similar expression patterns across conditions. Identifying groups of similar genes facilitates subsequent analysis, in part because these groups are much smaller than the original set. Another application is in networking, where grouping computers on a network based on packet travel times can be used as a preprocessing step to several network optimizations. I won’t delve too much into applications but I hope that you are convinced that this is an interesting and very important problem in data analysis.

Mathematically, the clustering problem can be thought of as follows: we are given a dataset of points (objects) $$x_1, \ldots, x_n$$ and a number $$k$$, and we would like to partition the dataset into $$k$$ groups (clusters) so that within-cluster similarity is high and between-cluster similarity is low. In some cases the points lie in some Euclidean (or metric) space, so we can use distances in place of similarities, except now we want within-cluster distance to be low and between-cluster distance to be high. Getting from a specific application to this mathematical formulation is sometimes easy but sometimes less so. In the gene expression example, we can represent each gene as a vector of its expression in each of the conditions. Thus it is natural to think of the genes as living in some Euclidean space. In the networking example, it is more natural to consider a computer only in terms of its distances to other computers. Here there is no Euclidean structure, but we are still able to use a number of clustering algorithms.

As an example of the more abstract scenario, consider the dataset below, consisting of some points in the plane drawn from two two-dimensional gaussian distributions. One can visually see that the clustering in the second figure is a good one, and this matches our intuition that the points from one normal should be in one cluster while the points from the other normal should be in a different cluster.
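To make this concrete, here is a minimal sketch of how one might generate a dataset like this; the means, spreads, and sample sizes are made-up illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clusters in the plane, each drawn from its own two-dimensional Gaussian.
cluster1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
cluster2 = rng.normal(loc=[6.0, 0.0], scale=1.0, size=(50, 2))
X = np.vstack([cluster1, cluster2])          # the dataset x_1, ..., x_n (here n = 100)

# The "correct" clustering assigns each point to the Gaussian it was drawn from.
true_labels = np.array([0] * 50 + [1] * 50)
```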

I won’t say much more about clustering in general, except that typically this is framed as a minimization problem for some cost function. For some examples and some more discussion about this, please see this wikipedia article.

Spectral Clustering
Spectral Clustering is a family of clustering algorithms that have foundations in graph theory and linear algebra. I won’t dive too much into these foundations in this article, but rather I hope to give some intuition as to why this kind of algorithm makes any sense.

First, what is the algorithm? I'll explain one simple variant here. Given our data points $$x_1, \ldots, x_n$$, we construct a graph on the $$n$$ objects where two objects are connected by an edge if they are sufficiently similar. Going with our example from above (a different set of data, but again two clusters from two-dimensional normal distributions), if we were to add an edge between every pair of objects $$x_i, x_j$$ whose distance is less than 1.5, we obtain the following graph:

With every graph of this form, we can construct an $$n \times n$$ matrix $$M$$, called the adjacency matrix, where $$M_{ij} = 1$$ if there is an edge between $$x_i$$ and $$x_j$$ and $$M_{ij} = 0$$ otherwise (for simplicity, set $$M_{ii} = 1$$ for all $$i$$). In spectral clustering, we look at the eigenvalues and eigenvectors of this matrix (or of a related matrix called the Laplacian) and use these quantities for clustering.
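Continuing that sketch, building the graph and its adjacency matrix from pairwise distances might look like the following (the 1.5 threshold is the one from the example above, and `X` is the hypothetical dataset from the earlier snippet):

```python
import numpy as np

def adjacency_matrix(X, threshold=1.5):
    """0/1 adjacency matrix M: M[i, j] = 1 iff x_i and x_j are within `threshold`.

    Every point is at distance 0 from itself, so the diagonal is 1,
    matching the convention M_ii = 1 used in the text.
    """
    diffs = X[:, None, :] - X[None, :, :]        # all pairwise difference vectors
    dists = np.sqrt((diffs ** 2).sum(axis=-1))   # n x n Euclidean distance matrix
    return (dists < threshold).astype(float)

M = adjacency_matrix(X, threshold=1.5)
```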

Why would the eigenvectors and eigenvalues of $$M$$ tell us anything meaningful about the clustering of the data? Consider a very simple example, where we have two clusters and when we construct the graph, we put edges between every pair of objects in the same cluster, and put no edges across clusters. If this is the case then the adjacency matrix $$M$$ of the graph is block diagonal and looks like (suppose all of the objects in the first cluster come before the objects in the second cluster and suppose we only have 4 objects in this toy example):
$$! \left(\begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{array}\right)$$

And it is easy to see that the eigenvectors of this matrix with nonzero eigenvalue are $$(1, 1, 0, 0)^T$$ and $$(0, 0, 1, 1)^T$$. If we take the first of these, the coordinates at which the eigenvector equals $$1$$ correspond exactly to the items in the first cluster (and similarly the second eigenvector identifies the second cluster). If I were to permute the matrix by swapping rows and columns, to something like:
$$! \left(\begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{array}\right)$$

Then the eigenvectors are $$(1, 0, 1, 0)^T$$ and $$(0, 1, 0, 1)^T$$, which again exactly identify the clustering (here the first and third objects are in one cluster and the second and fourth are in the other). This is the simple justification of why spectral clustering makes any sense: the eigenvectors of these matrices make it really easy to identify the clusters.
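A quick numerical check of this claim, just multiplying the block-diagonal matrix from above by the two indicator vectors:

```python
import numpy as np

# The 4 x 4 block-diagonal adjacency matrix from the text: objects 1 and 2
# form one cluster, objects 3 and 4 form the other.
M = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])

v1 = np.array([1., 1., 0., 0.])
v2 = np.array([0., 0., 1., 1.])

print(M @ v1)   # [2. 2. 0. 0.]  -> M v1 = 2 v1, so v1 is an eigenvector (eigenvalue 2)
print(M @ v2)   # [0. 0. 2. 2.]  -> M v2 = 2 v2, and its support is exactly the second cluster
```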

I just argued that spectral clustering would work for a very contrived example and I hope that you are feeling dissatisfied by this. Some questions you might have are: What happens if not every pair of objects from the same cluster shares an edge (as in the figure above)? What happens if there are edges between the two clusters? What happens if we are looking for more than two clusters?

It turns out that it isn’t a big deal if the clusters do not form completely connected subgraphs. In fact, it is sufficient that each cluster is connected (there is a path between every pair of nodes in the cluster, not necessarily an edge), and disconnected from the other clusters for the eigenvectors to exactly identify the clusters. However, to make this claim, we have to use the Laplacian matrix rather than the adjacency matrix. A proof of this claim is in the tutorial linked to at the top of this article and I don’t really want to go into the details but please see that if you are interested. With this result, you should be convinced that spectral clustering on the graph in the third figure will recover the clustering in the second figure (which as we said is the correct clustering).
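To give a flavor of what that matrix looks like, here is a sketch of one common variant, the unnormalized Laplacian $$L = D - A$$ (degree matrix minus adjacency matrix); the key fact is that, for a graph whose clusters are internally connected but share no edges, each cluster's indicator vector $$v$$ satisfies $$Lv = 0$$, so the eigenvectors with eigenvalue 0 identify the clusters. (The tutorial may use a normalized variant; this is just the simplest one, and `M` can be either of the adjacency matrices from the earlier snippets.)

```python
import numpy as np

def unnormalized_laplacian(M):
    """One common Laplacian variant: L = D - A, where A is the adjacency matrix
    with the diagonal zeroed out and D is the diagonal matrix of node degrees."""
    A = M.copy()
    np.fill_diagonal(A, 0)
    D = np.diag(A.sum(axis=1))
    return D - A

L = unnormalized_laplacian(M)
eigenvalues, eigenvectors = np.linalg.eigh(L)

# If the clusters are connected internally but disconnected from each other,
# the eigenvectors with eigenvalue 0 span exactly the space of cluster indicators.
zero_eigenvectors = eigenvectors[:, np.isclose(eigenvalues, 0)]
```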

If there are between cluster edges, then in general it’s not obvious that spectral clustering (or any clustering algorithm for that matter) will work. However, as long as there are very few between cluster edges (in comparison to the number of within cluster edges) then there is reasonable high-level justification for the algorithm’s performance. Each spectral clustering algorithm has connections to a graph cut problem. All of these graph cut problems look for a partitioning of the graph that minimizes the number of edges that have to be removed in the partitioning (usually also with some restrictions on the sizes of the partition subsets). It is well known that spectral algorithms all go after these graph cut objectives (specifically they are relaxations of those problems, which are hard to solve). If you believe this argument then you should at least be convinced that if the smallest cut is easy to identify (i.e. the number of between cluster edges is very small) then spectral clustering should be successful.

Extending spectral algorithms to handle more than two clusters is actually somewhat challenging. The most popular strategy is to take the first $$k$$ eigenvectors (corresponding to the largest eigenvalues), represent each object as a $$k$$-dimensional vector using those eigenvectors, and run a different clustering algorithm in the $$k$$-dimensional space. If the eigenvectors are $$v_1, \ldots, v_k$$, each in $$\mathbb{R}^n$$, then we represent the object $$x_i$$ as $$(v_1(i), \ldots, v_k(i))^T$$. The high-level justification (again appealing to the contrived case) is that the eigenvectors separate the clusters very nicely, and representing each object in terms of the eigenvectors makes it very easy for other clustering algorithms to succeed. As an example, consider the block-diagonal case again. In this case, all objects from the same cluster get mapped to the exact same point (and objects from different clusters get mapped to different points), so that virtually any clustering algorithm will work.
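Putting the pieces together, a minimal sketch of this multi-cluster variant might look like the following. It uses the adjacency matrix and its largest eigenvalues, as in the discussion above (practical implementations usually use a normalized Laplacian and its smallest eigenvalues), and k-means is an arbitrary choice for the clustering algorithm run in the $$k$$-dimensional space:

```python
import numpy as np
from sklearn.cluster import KMeans

def simple_spectral_clustering(M, k):
    """Embed each object using the top-k eigenvectors of M, then cluster the embeddings."""
    eigenvalues, eigenvectors = np.linalg.eigh(M)   # eigenvalues in ascending order
    V = eigenvectors[:, -k:]                        # n x k: row i is the new representation of x_i
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)

labels = simple_spectral_clustering(M, k=2)         # e.g. on the adjacency matrix built earlier
```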

Conclusion
If you made it this far, I hope that you have some basic understanding of what the clustering problem is and why looking at the eigenvectors of a graph adjacency matrix is a reasonable thing to do for the clustering problem. I only gave a really simple example to convince you that this algorithm makes sense, and really I didn’t even tell you what the algorithm is in detail, but maybe I gave you some insight as to why these things might make sense. Without getting into lots of linear algebra, it’s quite hard to explain what is really going on with spectral clustering, but if you are prepared for that challenge I highly encourage you to read the tutorial I mentioned earlier.

My primary motivation for writing this is so that I can explain some of the things that I have been doing for the last year or so. The main idea of our work is to prove that spectral clustering will perform well on a large family of “clusterable” graphs. These graphs have identifiable clusters, but are corrupted with a lot of noise, which makes it challenging to recover the clusters. Our results show that the eigenvectors are stable under noise, so that spectral clustering still performs well while other algorithms may not. More on this in a later post.

Mr. Yuk at Easterns Qualifier

On Friday, Mr. Yuk made the 11 hour drive from Pittsburgh back to Wilmington, NC for Easterns Qualifier, hosted by UNCW. After a fairly disappointing showing at Queen City, we were looking to turn things around and earn some big wins this weekend. Unfortunately, due to injuries, midterms, and other commitments, we only brought 15 healthy players and were missing some big players on both the O and D lines. In a pretty competitive tournament, we couldn't afford to rest our top guys much, and fatigue reared its ugly head late on Saturday in a very winnable game against Tennessee.

The big story of the weekend was the weather, which was warm but rainy on and off on Saturday, and throughout Saturday night, leading to Sunday's games being cancelled due to field closures. After falling into the 9th place bracket on Saturday, we were looking forward to playing some more winnable games, but alas it was not to be. Instead we got an early start home and made it back to Pittsburgh in time to catch the live stream of the finals of Stanford Invite.

Before I go into the takeaways from the tournament, here’s a game-by-game recap:

  1. Michigan State: In what was undoubtedly our best showing of the weekend, Mr. Yuk took MSU to 14s, only to lose 16-14. MSU came out strong, repeatedly breaking our marks and working down the break side of the field. MSU earned a couple of breaks off of drops and miscues from our offense but by fixing our marks, we were able to earn them back and keep the game interesting. We earned two breaks late in the game to tie the score at 14s, but MSU finally converted and then broke to win. They were a physical team with a lot of skilled players, but our performance showed that we can certainly keep up with this caliber of team.
  2. Penn State: Coming off a tough loss to MSU, we didn’t bring any intensity to our second round game against regional rivals Penn State. They quickly pulled ahead and never really looked back. Not much positive to say about this game; Penn State wins 15-5.
  3. Tennessee: Again, not much positive to say about this game. With a small roster, our top players started to fatigue and this led to mistakes and a lack of effort on defense. We simply did not play very well and Tennessee outclassed us. Tennessee wins 15-7.

This was a poor showing for Mr. Yuk. We demonstrated that we lack the mental toughness to maintain intensity after a tough loss. We also do not have the stamina to keep our top guys playing every point of every game. This is understandable, but with a small roster this weekend, we did not have enough to play close games against these teams. With a fairly top-heavy roster this year, we specifically need to improve on this aspect, so that we can be competitive in the series.

Strategically, the big let-down for us was our marks. Both MSU and Penn State repeatedly broke our marks at will, resulting in several easy goals. To me this is also a mental thing that can be fixed fairly easily, but with poor marks we will not be successful in the remainder of the season.

It is now patently clear that Mr. Yuk had a terrible winter. The poor attendance at practices and the lack of individual effort outside of practice clearly showed in our poor performance last weekend. I’m hoping that this is a wake up call for us; we still have time to become the team we expected to be this season, but we have a lot of work to do.

Grad School Stuff VI: Choosing an Advisor

CMU finished making acceptance decisions for the computer science department and I opted to be a student contact for some of the admits. One of them asked me about choosing an advisor and specifically what factors he should consider in making this decision. This is such an important decision for incoming and first-year graduate students that I felt it warranted adding to my previous series on graduate school.

Disclaimer: As with everything I write, this is almost entirely my opinion on this matter. It is a very important decision so I highly recommend listening to what other people have to say and talking to a lot of people about this. My experience is also very limited so any opinions should be viewed in that light.

First, choosing an advisor is a very subjective decision. It depends largely on what you are looking for and what works well for you. For example, if you are very self-motivated and independent, it may be better to have an advisor who is less hands-on, and vice-versa. That being said, there are some factors that are more important than others, and some factors that are altogether not very important. It is often hard to evaluate an advisor along all of these axes without spending a significant amount of time working with him/her, and presumably you don't want to spend two years figuring out if an advisor is good for you. Fortunately, in many cases there are more readily observable factors that are correlated with these, and keeping in mind that there are always exceptions to the rule, the observable factors can be used to make a much faster evaluation.

Some Factors

  1. Research Interest – This is the most important thing. Your advisor will most likely be funding you, so they will want you to do work that can fit under their grants. If you want to do something else because you don't find what they want you to do interesting, most of the time you will have to work on that in your free time. Find an advisor who will let you work on what you want to work on. This doesn't mean that they necessarily give you more freedom; it just means that they are interested in the things that you want to work on. An advisor's interests are strongly correlated (kind of obviously) with the articles they have published. However, interests change, so recent papers are better indicators. Also, sometimes an advisor will fund you with a specific grant, and that grant may reflect a new direction for the advisor, so really the best way to find out what an advisor wants you to work on is to simply talk to them about what you want to work on and what they want you to work on.
  2. Compatibility — It is really important that you can have productive meetings with your advisor. If you do not have a good interaction with them, your meetings will not be very productive and your research will suffer. This does not necessarily mean that they should be your friends; it means that your meetings should be productive. Being friends with your advisor is completely orthogonal; it can be good but I don’t think it is necessary. It’s hard to assess compatibility without actually interacting with the advisor so no obvious correlations here. I recommend having several meetings with a potential advisor before committing to being their student. It is also worthwhile to talk to their students and former students to not only see what they work on but also get a feel for how the professor tends to operate. More on this later.
  3. Advising Style — Advisors have different ways of managing their students. Some like to be very hands on, meaning that they will work directly on your research problems with you and help you write papers. Others are much less so; they may have you work with more senior students or post-docs and they may be much more removed from your problems. Some push their students to publish early and often, others are less opinionated about publishing or even don’t let students publish as much as they want to. There are many different dimensions to advising style but there is no clear optimum. Each style has different pros and cons and it is important that you decide which aspects you care about more. For example, if you are not very self-motivated, a hands-on advisor could be good for you because they will make sure you continue to make progress on your research. On the other hand, if you are self-motivated, a hands-on advisor could be annoying, because you have to prepare things to talk to them about every week or so. If an advisor is less involved in your career, you may get sidetracked or lost, and have minimal research output. On the other hand, it gives you freedom to explore and really find problems that interest you. Anyway, I hope that you see my point; there are lots of options here but you need to think about what is right for you. Some correlated factors: age, tenure status, number of students. I’ll write a bit about these below because they can be very informative for many of the subsequent unobservable factors.
  4. Availability — Most professors are pretty busy, but some are much more so than others. Some travel on a very regular basis and others generally stay on or near campus. This could affect how much time you have to meet with them. Sometimes this is a problem and sometimes this is not; it really depends on your working style. This is often correlated with age, tenure status, number of students and fame.
  5. Fame — Some advisors are certainly more famous than others. This has its pros and cons. On one hand, their opinion is held in high regard, so their approval of you carries a lot of weight in the community. They also typically have a lot of contacts that can be very valuable as you look for internships and jobs later on in your graduate career. On the other hand, along with fame comes increased travel and decreased availability, which has direct implications for you. So while there are perks to having a famous advisor, ultimately your work will have to speak for itself. If you don't end up doing amazing work because they cannot spend as much time with you, the perks they bring will be all but useless. Consequently, I think it is really more important that you find an advisor who can help you do great research rather than one who can give you these auxiliary perks. Moreover, you will have many opportunities to find internships and network for yourself when you go to conferences, so you don't necessarily need your advisor to help you establish connections.
  6. Funding — Different schools/programs approach funding differently, so this may or may not be an issue. Typically, an advisor won't take you on as a student unless they have funding for you. Before you get excited about working with someone, make sure that they can actually support you. In my program, at least, you should be explicit about it: ask your potential advisor directly, "Can you support me financially?"

The Observable Factors:
As I mentioned above, there are a lot of factors that correlate well with the ones that are a little harder to discern. Of course you have to be careful in using these as surrogates for the ones above, but I don't think it is unreasonable to make these connections.

  • Age and Tenure status — Typically, age and tenure status are good indicators of advising style, availability, and fame. The professor's incentives can completely explain this correlation. For a younger professor (also likely an untenured professor), your research output has direct implications for them getting tenure, becoming famous, etc. Consequently, they are much more driven and usually much more hands-on. Also, being young, they may not be very well established, so they can be much more available to meet with you. It is uncommon for them to be very famous, simply because they are so young. On the other hand, older professors who have tenure tend to be more hands-off and less pushy about publication. However, they are usually more established, so they may be less available but more famous. With age also comes experience; older advisors generally have a good feel for the direction of their research area and can also have a much broader vision about their research agenda. This means they know what the big, interesting problems in their field are and can help you tackle them. It can result in you doing a very cohesive body of work with large impact.
  • Number of students — An advisor with a lot of students naturally has less time to devote to any single one, so there are obvious correlations with availability. There are more subtle correlations with advising style. An advisor may choose to have fewer students so that he/she can really dive into the problems those students are working on. One with more students, on the other hand, may choose to be much more removed from their students' research.

What about co-advising?
In some departments/programs, co-advising is an option that should certainly be considered. Being co-advised by one young and one old professor seems like the best of both worlds in terms of the things they have to offer. You can get the guidance/availability of a younger advisor with the broader vision and connectedness of an older advisor. I don’t have any experience with co-advisors, but I’ve been told that there are some things you need to keep in mind before jumping into this sort of situation. First, you need to make sure that your advisors get along. You may (probably will) have meetings with both of your advisors together, and you need to make sure that these meetings are productive. Second, make sure that your advisors have common interests that align with what you want to work on, or you may end up working on distinct problems for each advisor, which seems suboptimal. In this sense, negotiating a co-advising relationship can actually be quite challenging, but it can also be very rewarding if it works out.

Some parting thoughts
I’ll emphasize this again. This article is mostly my thoughts on the matter and I have just one perspective. Read what other people have to say about choosing an advisor and talk to people in your department and program. You can tell them precisely what you want and they can tailor their comments to your specific situation and even the professors you are considering. Talk to the students/former students of the advisors you are thinking about and ask them a lot of questions. Ask about their advising style, their openness to co-advising, what their graduated students have gone on to do, whether they are open to students doing internships over the summer. Any question that you can think of is probably worth asking the advisor, their students or anyone else in the department. Most students (at least in my program) are more than happy to talk to you about your decision and offer whatever advice they can so don’t feel shy. One specific question I highly recommend asking is if the advisor has had any students switch away from them and why. It is important that you don’t waste a year or two working with someone only to switch advisors later and have to (more or less) start over. You should avoid this if at all possible and understanding why people leave an advisor can definitely help you do that.

Presumably if you’ve made it this far you are actually in the middle of this process or about to begin this process. In that case, I hope you find this useful and good luck in finding an advisor and your graduate career!

Mr. Yuk at Queen City Tune Up

This weekend, Carnegie Mellon's Men's ultimate team, Mr. Yuk, made the 8 hour drive out of snowy Pittsburgh to Charlotte, NC for Queen City Tune-Up. This was our first tournament since early November (Fall Easterns) and we have had very few opportunities to actually play ultimate since returning to school in January. While we haven't had the best winter in terms of training, we did have a very productive fall season, so while my expectation was that we wouldn't have the best of tournaments, it was not unreasonable to think that we could do very well. As it turned out, my expectation was correct both results-wise and performance-wise.

Results-wise we went 3-4, with big wins against UNC, Cornell and NYU, but with upsetting losses to Ohio State and NCSU along with losses to Dartmouth and Michigan. This resulted in us taking 12th place out of 20, behind our original seeding of 10th.

Before I go on, let me just mention that the weather was a huge factor, particularly on Saturday. Reportedly winds were recorded at 20mph with gusts up to 40mph and all of our games were upwind-downwind. This had huge strategic implications for both teams, which ended up punting and playing defense on the downwind side, relying on the strong winds to help generate turns with favorable field position. Apart from Dartmouth, no team could move the disc up the field with any consistency, so upwind scores were hard to come by. Most games were consequently decided by one or two upwind scores.

Sunday was much less windy, but much colder than Saturday. The fields were frozen when we arrived and, at the start of our first game, the temperature was just about 32F. This more or less persisted throughout the day. But without the strong winds, Sunday was much more playable.

Performance-wise, we learned that we have many things to work on. Offensively, we struggled to move upwind against aggressive zone defenses. As I mentioned, part of this was because the winds were particularly strong and most teams had this difficulty, but I think we can certainly improve in this area. Two specific things we can work on are keeping the disc off of the trap sideline and keeping the cup moving with fakes and crashes. Defensively, I think our zones need a lot of work, and I'm hopeful that this is just a product of not having practiced together in a couple of months. Our D-line's offense and our O-line's defense both need a fair amount of work; defensively we had many opportunities for breaks that we didn't capitalize on, and offensively we had trouble recovering the disc after a turnover.

While there are certainly many things to work on in preparation for the series, there were also some things that we did quite well. Firstly, our O-line didn't get broken in 3 of our games on Saturday. This led to our three wins of the tournament, all by a margin of just a couple of goals. If the offense can maintain this performance, we will be in a great position to win games in the future. Defensively, we managed to generate a lot of turnovers, which is great, but as I mentioned, we had a hard time capitalizing on them. I was also pleasantly surprised with how well our D-line worked through zones, especially considering that we have a lot of people who are uncomfortable with their throws.

Here is a quick game-by-game recap:

  1. UNC — UNC was the first seed in our pool, and since we had a first round bye, we got in a good warmup and were ready to play come game time. The wind had already picked up a decent amount and we traded downwind O-points for most of the game. Offensively the strategy for both teams was to immediately huck to a receiver streaking deep, and play defense in the event of a turnover. This turned out to be a successful strategy as both teams struggled to work upwind. After a fairly monotonous game, we finally managed to earn ourselves an upwind break, and the subsequent downwind break sealed the victory. CMU wins 11-8 on hard cap, our O-line did not give up a break.
  2. Cornell — Coming off a huge win against UNC, we were ready to bring it to Cornell. The winds picked up even more and this game proceeded much like the last one. We squeezed out a break just before half, and Cornell's D-line could not score upwind, which again resulted in a win for us. CMU wins 9-7 on hard cap, and again our O-line did not give up a break.
  3. Dartmouth — While the winds were still raging, Dartmouth did not play the hucking strategy that we had grown accustomed to in the last two games. They had solid throwers who were comfortable both upwind and downwind, and they capitalized on our O-line's turnovers to earn breaks. We struggled to move the disc upwind while they continued to be successful in both directions, and this led to a fairly easy win for them. Dartmouth wins 13-4.
  4. NYU — The last round of the day presented a must-win situation for us if we wanted to make it to pre-quarters. Unfortunately we started out on the losing side of the coin flip, with NYU starting on offense going downwind. The game progressed as the previous ones had, with both teams trading downwind points, and it looked like NYU would win in a hard-cap situation simply because they received first. However, just before half (pretty late in the game), our D-line somehow managed to get through NYU's zone and earn a break. NYU looked about to do the same thing later, but some game-saving plays helped us retain the lead and eventually earn the win. CMU wins 10-8, again on hard cap. O-line did not get broken.
  5. Pre-quarters: Michigan — Michigan is a historically very strong team and their experience certainly showed in this pre-quarters matchup. While the wind wasn't as strong as Saturday, we did not adapt our strategy and turned the disc over too easily. Meanwhile, Michigan was very conservative, repeatedly breaking our marks and working the disc up the field to earn easy scores. Michigan wins fairly easily 15-6.
  6. 9th place bracket: North Carolina State — We played NCSU at Fall Easterns and beat them pretty handily, but they were a much more prepared team this time around. Our O-line got broken several times early on and the D-line never managed to earn all of those breaks back. We went on a run later in the game but never managed to take the lead. In a disappointing loss, NCSU wins 10-8.
  7. 11th place game: Ohio State — We also played and beat OSU at Fall Easterns in a fun, pretty close game, but this time Ohio State's preparation really showed. Having played two other tournaments already this spring, they were clearly more conditioned, experienced, and familiar with each other than we were, and they beat us handily. OSU wins 15-8, and we end up in 12th place, 2 places behind our initial seed of 10th.

We are going back to North Carolina (this time Wilmington) in a couple of weeks for Easterns Qualifier. With a lot of the same teams attending, we certainly have the potential to make some noise, but we cannot perform as we did this past weekend in Charlotte. Stay tuned for what I hope will be some more exciting news.

Updates to the Academic Publishing Debate

The fight against academic publishers is heating up as Tyler Neylon’s website continues to gain support against Elsevier. If you haven’t heard, the website is a place where you can publicly declare that you will boycott Elsevier, one of the academic publishers with particularly terrible practices. It may have been created in response to Timothy Gowers’ public boycott declaration, and it is supported by him and many other famous scientists and mathematicians. As of today there are 3867 total signatories, and 544 signatories in computer science alone.

First off, why is Elsevier (and most other academic publishers) so evil? In a nutshell, they exploit the work of academics (funded by taxpayers) to turn incredible profits without adding much value. Journals are run by volunteer editors (academics), papers are reviewed by volunteer reviewers (academics) and papers are written by academic researchers. Moreover, researchers are expected to prepare “camera-ready” versions of their papers, which makes the paper almost entirely ready for publication. Publishers charge exorbitant subscription fees for their journals, but their costs are minimal and their value-add is effectively non-existent.

But publishing companies effectively have a monopoly on the top journals that academics need to publish in to advance their careers. Alternative publishing venues haven't caught on because publishing there doesn't carry the same weight as publishing in elite journals like Nature and Science. The fact is that academia, even as it pushes the boundaries of knowledge, is very conservative about accepting change. Thus, despite the fact that several alternatives have been proposed (e.g. this, this, and maybe even the arXiv, along with several more abstract proposals), academia has been slow to adopt alternative venues/media for publication.

Movements like "The Cost of Knowledge" are designed to combat the inherent inertia in academia, in hopes that we can converge on a better method of publication. Once academics realize that many of their colleagues are boycotting Elsevier/Springer/etc., it will become much more reasonable for them to boycott as well. And once the majority of a field boycotts one of these companies, either alternative publishing venues will gain credibility, or the company will be forced to change its policies/pricing/etc. or risk going under.

To me, the only issue is that this movement has to involve academic institutions as a whole in addition to individual researchers. Institutions use the impact factor of journals as a surrogate for research quality and use this metric in hiring and tenure decisions. Until this changes, young, untenured researchers are going to be reluctant to boycott the publishing companies that run elite journals because of the career implications such a boycott has. This is probably one of the primary reasons why I haven't joined the boycott yet.

The public boycott does have some interesting side-effects. First, the fact that the boycott is public and supported by top researchers means that it is more likely to gain traction. The fact that there is a list of elite researchers who are boycotting may influence how institutions make hiring decisions, which could kick start a positive feedback loop resulting in a much more powerful boycott. A more indirect effect is that top researchers are now boycotting elite journals, meaning that the quality of those journals will decline. This might force institutions to rethink how they make hiring decisions while also enabling alternative publishing media to flourish.

Whether the boycott is successful or not, enough people are up in arms (in the blogosphere, etc.) about publishing that it finally seems that academics have enough traction to prompt some sort of change in the academic publishing system. Hopefully we’ll see some positive changes in the next couple of years.