## Monday, August 12, 2013

### cs boot camp

Every once in a while I get a request from a colleague or friend asking me to help them sharpen their comp sci skills. Lately I've been receiving these with a bit more frequency, so I've started collecting exercises that I send out into a repository.

In case anyone is interested, they can be found here. There is one repository containing problem descriptions and one containing support code, all written in Go. Each exercise consists of a brief discussion of some topic, along with coding exercises and follow-up questions. They are kept intentionally brief, as the audience is assumed to already have programming skills.

## Monday, July 29, 2013

### network splitting

Last week, Paul Rubin wrote an excellent post on Extracting a Connected Graph from an existing graph. Lately I've been doing related work on data from OpenStreetMap, though without access to a solver. In my case I'm taking in arbitrary network data and splitting it into disconnected subnetworks. I thought it might make a good case study to show an algorithmic way of doing this and some of the performance issues I ran into.

A small example can be seen below (click on the image to enlarge it). This shows a road network around the Las Vegas strip. There is one main (weakly) connected network in black. The roads highlighted in red are disconnected from the main network. We want code that will split these into connected subnetworks.

Say we have data that looks like the following. Instead of nodes, the numbers in quotes represent edges. Think of these as streets.

{
    "0": [1, 2, 3],
    "1": [9248, 9249, 9250],
    "2": [589, 9665, 9667],
    "3": [0, 5, 6],
    "4": [0, 5, 6],
    "5": [588],
    "6": [4, 8, 9],
    ...
}

Our basic strategy is the following:

1. Place each edge in its own single-element network the first time we encounter it.
2. For each connection, merge the networks of the source and destination edges.

#!/usr/bin/env python
import json
import sys
import time

class hset(set):
    '''A hashable set. Note that it hashes by identity, not by its elements.'''
    def __hash__(self):
        return hash(id(self))

    def __cmp__(self, other):
        return cmp(id(self), id(other))

if __name__ == '__main__':
    try:
        inputfile = sys.argv[1]
    except IndexError:
        print 'usage: %s network.json' % sys.argv[0]
        sys.exit(1)

    print time.asctime(), 'parsing json input'
    connections = json.load(open(inputfile))

    edge_to_net = {} # Edge ID -> set([edges that are in the same network])
    nets = set()     # Set of known networks

    print time.asctime(), 'detecting disconnected subgraphs'
    for i, (from_edge, to_set) in enumerate(connections.iteritems()):
        from_edge = int(from_edge)

        try:
            from_net = edge_to_net[from_edge]
        except KeyError:
            from_net = edge_to_net[from_edge] = hset([from_edge])
            nets.add(from_net)

        if not (i+1) % (25 * 1000):
            print time.asctime(), '%d edges processed / %d current subnets' % (i+1, len(nets))

        for to in to_set:
            try:
                to_net = edge_to_net[to]

                # If we get here, merge from_net into to_net.
                if to_net is not from_net:
                    to_net.update(from_net)
                    for e in from_net:
                        edge_to_net[e] = to_net
                    nets.remove(from_net)
                    from_net = to_net

            except KeyError:
                edge_to_net[to] = from_net

    print time.asctime(), len(nets), 'subnets found'


We run this against the network pictured above and it works reasonably quickly, finishing in about 7 seconds:

Mon Jul 29 12:22:38 2013 parsing json input
Mon Jul 29 12:22:38 2013 detecting disconnected subgraphs
Mon Jul 29 12:22:38 2013 25000 edges processed / 1970 current subnets
Mon Jul 29 12:22:44 2013 50000 edges processed / 124 current subnets
Mon Jul 29 12:22:45 2013 60 subnets found


However, when run against a road network for an entire city, the process continues for several hours. What is the issue?

The inefficiency lies in the merge step. There we frequently remove references to every element of a large set. Instead, it would be better to remove as few references as possible. Therefore, instead of always merging from_net into to_net, we will determine which network is the smaller of the two and merge that one into the larger one. Note that this does not necessarily change the worst-case time complexity of the algorithm, but it should make the code fast enough to be useful. The new version appears below.

#!/usr/bin/env python
import json
import sys
import time

class hset(set):
    '''A hashable set. Note that it hashes by identity, not by its elements.'''
    def __hash__(self):
        return hash(id(self))

    def __cmp__(self, other):
        return cmp(id(self), id(other))

if __name__ == '__main__':
    try:
        inputfile = sys.argv[1]
    except IndexError:
        print 'usage: %s network.json' % sys.argv[0]
        sys.exit(1)

    print time.asctime(), 'parsing json input'
    connections = json.load(open(inputfile))

    edge_to_net = {} # Edge ID -> set([edges that are in the same network])
    nets = set()     # Set of known networks

    print time.asctime(), 'detecting disconnected subgraphs'
    for i, (from_edge, to_set) in enumerate(connections.iteritems()):
        from_edge = int(from_edge)

        try:
            from_net = edge_to_net[from_edge]
        except KeyError:
            from_net = edge_to_net[from_edge] = hset([from_edge])
            nets.add(from_net)

        if not (i+1) % (25 * 1000):
            print time.asctime(), '%d edges processed / %d current subnets' % (i+1, len(nets))

        for to in to_set:
            try:
                to_net = edge_to_net[to]

                # If we get here, merge one network into the other.
                if to_net is not from_net:
                    # Update references to and remove the smaller set for speed.
                    if len(to_net) < len(from_net):
                        smaller, larger = to_net, from_net
                    else:
                        smaller, larger = from_net, to_net

                    larger.update(smaller)
                    for e in smaller:
                        edge_to_net[e] = larger
                    nets.remove(smaller)
                    edge_to_net[to] = larger
                    from_net = larger

            except KeyError:
                edge_to_net[to] = from_net

    print time.asctime(), len(nets), 'subnets found'


Indeed, this is significantly faster. On very large networks it runs in minutes instead of hours or days, and on the small test case used for this post it finishes in under a second. While this could probably be made faster still, that's good enough for right now.

Mon Jul 29 12:39:55 2013 parsing json input
Mon Jul 29 12:39:55 2013 detecting disconnected subgraphs
Mon Jul 29 12:39:55 2013 25000 edges processed / 1970 current subnets
Mon Jul 29 12:39:55 2013 50000 edges processed / 124 current subnets
Mon Jul 29 12:39:55 2013 60 subnets found
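
As an aside, the textbook structure for this kind of merging is a disjoint-set (union-find) with path compression, which avoids rewriting set memberships entirely. Below is a minimal sketch of that approach, not from the post above; `count_subnets` is a hypothetical helper name, and it is written in modern Python:

```python
def find(parent, x):
    '''Follow parent pointers to x's root, compressing the path as we go.'''
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def count_subnets(connections):
    '''Count weakly connected subnetworks in an edge adjacency dict.'''
    parent = {}
    for from_edge, to_set in connections.items():
        from_edge = int(from_edge)
        parent.setdefault(from_edge, from_edge)
        for to in to_set:
            parent.setdefault(to, to)
            root_from, root_to = find(parent, from_edge), find(parent, to)
            if root_from != root_to:
                parent[root_from] = root_to  # union the two networks
    return len(set(find(parent, x) for x in parent))

if __name__ == '__main__':
    sample = {"0": [1, 2], "1": [], "3": [4], "5": []}
    print(count_subnets(sample))  # three subnetworks: {0,1,2}, {3,4}, {5}
```

A production version would also union by size or rank, but even path compression alone keeps the trees short enough that each merge touches only a handful of references.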


## Wednesday, July 17, 2013

### stack in go

I've been using an increasing amount of Go lately. One of my favorite features is its ability to add methods to any existing type. This works for primitive types by simply defining a named type based on them, similar to using typedef in C.

Recently, I found I needed a stack container. Nothing complicated, merely an implementation of push, pop, and empty methods. While this is easy enough to do using slices, wrapping the required code makes for a much nicer interface.

package main

import "fmt"

type Element float64
type Stack []Element

func (s *Stack) Push(e Element) {
	*s = append(*s, e)
}

func (s *Stack) Pop() Element {
	e := (*s)[len(*s)-1]
	*s = (*s)[:len(*s)-1]
	return e
}

func (s *Stack) Empty() bool {
	return len(*s) < 1
}

func main() {
	s := Stack([]Element{})

	for i := 0; i < 10; i++ {
		s.Push(Element(i))
	}

	for !s.Empty() {
		fmt.Println("pop:", s.Pop())
	}
}

Since there aren't real classes in Go, or inheritance, this doesn't do anything scary like monkey patching. It's just a bit of syntactic sugar to make our lives better. For instance, the Push method could just as easily look like this to the compiler:

func Push(s *Stack, e Element) {
	*s = append(*s, e)
}


## Monday, October 8, 2012

### web slides with python & math using deck.js, syntaxhighlighter, and mathjax

In putting together some slides for class that include both Python code and math models, I realized that it was high time to update the technology stack I'm using. If anyone else finds this useful, please feel free to copy my starting template and use or improve upon it.

I'm hosting my slides directly out of Dropbox since I can do so without uploading them to a web server. It also means I can edit and view them locally without having to go through any sort of manual syncing process. This is possibly the simplest way in existence to share a page on the internet.

Steps:

1. Create the directories /public/, /public/www/, /public/www/slides/, and /public/www/slides/boilerplate/ in your Dropbox folder.
5. Save this template into /public/www/slides/boilerplate/boilerplate.html.

Now to create a new slide deck, copy the boilerplate directory to something new in /public/www/slides/. You should be able to begin with something that looks like this. Search through its source for "slides start" to see how to create slides that show beautifully rendered Python and LaTeX.

Use the left and right arrow keys or the icons that appear when you hover over the left and right of the slides to navigate.

## Monday, September 24, 2012

### roll your own gurobi shell

Note: These instructions have been updated to use the setup.py from the Gurobi distribution.

I've recently started using Gurobi for a class. While I'm pleased with the level of Python support they provide, I'm not a fan of the Gurobi shell they ship with.

• The shell is Python, which is good. However, instead of bundling the Gurobi libraries as a Python module that I can install alongside my system's native Python, they ship their own Python binaries. If I want to use any third-party modules I have to make them available to Gurobi somehow.
• They appear to have removed distutils from the Python they ship.
• The Gurobi shell doesn't give me module docs, tab completion, or anything else I'd get from either ipython or bpython. It's nothing more than the rudimentary CPython shell.

They do, however, provide a setup.py script, so we don't have to use gurobi.sh or the Python that's in the distribution. Here are simple instructions for using bpython as your shell. You should have Gurobi 5.0.1 installed and the GUROBI_HOME environment variable set. (I assume you are installing it into your home directory.)

$ mkdir -p $HOME/lib/python2.7/site-packages/
$ cd $GUROBI_HOME
$ PYTHONPATH=$HOME/lib/python2.7/site-packages/ python setup.py install --prefix=$HOME

Update your environment's PYTHONPATH and LD_LIBRARY_PATH:

$ echo "export PYTHONPATH=\$PYTHONPATH:\$HOME/lib/python2.7/site-packages/" >> $HOME/.bashrc
$ echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$GUROBI_HOME/lib/" >> $HOME/.bashrc
$ source $HOME/.bashrc

Install bpython:

$ sudo apt-get install bpython


Now type bpython and you should have the lovely interface you see below, complete with tab completion and module docs. Isn't that better?

## Saturday, September 22, 2012

### lagrangian relaxation with gurobipy

Note: this is still a new technique to me, so please point out any errors and I will correct them in the post.

Correction: $b_{ij}$ is now $b_j$ in the second set of constraints.

We've been studying Lagrangian Relaxation (LR) in the combinatorial optimization course I'm taking this term, and I had some difficulty finding a simple example covering its application. In case anyone else finds it useful, I'm posting a Python version for solving the Generalized Assignment Problem (GAP). This won't discuss the theory of LR at all, just give example code using Gurobi 5.0.1.

The GAP as defined by Wolsey consists of a maximization problem subject to a set of set packing constraints followed by a set of knapsack constraints:

max $\sum_i \sum_j c_{ij} x_{ij}$
subject to $\sum_j x_{ij} \leq 1 \quad \forall i$
$\sum_i a_{ij} x_{ij} \leq b_j \quad \forall j$
$x_{ij} \in \{0, 1\}$

An unadulterated version of this BIP using gurobipy might look like the following:

#!/usr/bin/env python

# This is the GAP per Wolsey, pg 182

from gurobipy import GRB, Model

model = Model('GAP per Wolsey')
model.modelSense = GRB.MAXIMIZE
model.setParam('OutputFlag', False) # turns off solver chatter

b = [15, 15, 15]
c = [
    [ 6, 10,  1],
    [12, 12,  5],
    [15,  4,  3],
    [10,  3,  9],
    [ 8,  9,  5]
]
a = [
    [ 5,  7,  2],
    [14,  8,  7],
    [10,  6, 12],
    [ 8,  4, 15],
    [ 6, 12,  5]
]

# x[i][j] = 1 if i is assigned to j
x = []
for i in range(len(c)):
    x_i = []
    for j in c[i]:
        x_i.append(model.addVar(vtype=GRB.BINARY))
    x.append(x_i)

# We have to update the model so it knows about new variables
model.update()

# sum j: x_ij <= 1 for all i
for x_i in x:
    model.addConstr(sum(x_i) <= 1)

# sum i: a_ij * x_ij <= b[j] for all j
for j in range(len(b)):
    model.addConstr(sum(a[i][j] * x[i][j] for i in range(len(x))) <= b[j])

# max sum i,j: c_ij * x_ij
model.setObjective(
    sum(
        sum(c_ij * x_ij for c_ij, x_ij in zip(c_i, x_i))
        for c_i, x_i in zip(c, x)
    )
)
model.optimize()

# Pull objective and variable values out of model
print 'objective =', model.objVal
print 'x = ['
for x_i in x:
    print '   ', [1 if x_ij.x >= 0.5 else 0 for x_ij in x_i]
print ']'


The solver quickly finds the following optimal solution of this toy problem:

objective = 46.0
x = [
[0, 1, 0]
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
[0, 0, 0]
]


There are two sets of constraints we can dualize. It can be beneficial to apply Lagrangian Relaxation against problems composed of knapsack constraints, so we will dualize the set packing ones. The first thing we do is replace the set packing constraints with a new set of variables, penalties, which take the values of the slacks on those constraints. We then modify the objective function, adding the Lagrangian multipliers times these penalties.

Instead of optimizing once, we do so iteratively. An important consideration is that we may get nothing more than a dual bound from this process. An integer solution is not guaranteed to be primal feasible unless it satisfies the complementary slackness conditions -- for each dualized constraint, either its multiplier or its penalty must be zero.

Setting the initial multiplier values to 2 and using subgradient optimization with a step size of 1 / (iteration #) to adjust them, we arrive at the following model:
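
Written out, the relaxed problem we re-optimize at each iteration looks like this (the penalty variables $p_i$ carry the slacks of the dualized constraints, and the $u_i$ are their multipliers):

max $\sum_i \sum_j c_{ij} x_{ij} + \sum_i u_i p_i$
subject to $\sum_i a_{ij} x_{ij} \leq b_j \quad \forall j$
$p_i = 1 - \sum_j x_{ij} \quad \forall i$
$x_{ij} \in \{0, 1\}$

After any iteration $k$ that fails the complementary slackness test, the multipliers move along the subgradient: $u_i \leftarrow \max(u_i - \frac{1}{k} p_i, 0)$.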

#!/usr/bin/env python

# This is the GAP per Wolsey, pg 182, using Lagrangian Relaxation

from gurobipy import GRB, Model

model = Model('GAP per Wolsey with Lagrangian Relaxation')
model.modelSense = GRB.MAXIMIZE
model.setParam('OutputFlag', False) # turns off solver chatter

b = [15, 15, 15]
c = [
    [ 6, 10,  1],
    [12, 12,  5],
    [15,  4,  3],
    [10,  3,  9],
    [ 8,  9,  5]
]
a = [
    [ 5,  7,  2],
    [14,  8,  7],
    [10,  6, 12],
    [ 8,  4, 15],
    [ 6, 12,  5]
]

# x[i][j] = 1 if i is assigned to j
x = []
for i in range(len(c)):
    x_i = []
    for j in c[i]:
        x_i.append(model.addVar(vtype=GRB.BINARY))
    x.append(x_i)

# As stated, the GAP has set packing constraints over i. We dualize these into
# penalties instead, using variables so we can easily extract their values.
penalties = [model.addVar() for _ in x]
model.update()

# Dualized constraints: sum j: x_ij <= 1 for all i.
# Each penalty takes the value of the slack on its constraint.
for p, x_i in zip(penalties, x):
    model.addConstr(p == 1 - sum(x_i))

# sum i: a_ij * x_ij <= b[j] for all j
for j in range(len(b)):
    model.addConstr(sum(a[i][j] * x[i][j] for i in range(len(x))) <= b[j])

# u[i] = Lagrangian multiplier for the set packing constraint i
u = [2.0] * len(x)

# Re-optimize until either we have run a certain number of iterations
# or complementary slackness conditions apply.
for k in range(1, 101):
    # max sum i,j: c_ij * x_ij
    model.setObjective(
        sum(
            # Original objective function
            sum(c_ij * x_ij for c_ij, x_ij in zip(c_i, x_i))
            for c_i, x_i in zip(c, x)
        ) + sum(
            # Penalties for dualized constraints
            u_i * p_i for u_i, p_i in zip(u, penalties)
        )
    )
    model.optimize()

    print 'iteration', k, 'obj =', model.objVal, \
        'u =', u, 'penalties =', [p.x for p in penalties]

    # Test for complementary slackness
    stop = True
    eps = 10e-6
    for u_i, p_i in zip(u, penalties):
        if abs(u_i) > eps and abs(p_i.x) > eps:
            stop = False
            break

    if stop:
        print 'primal feasible & optimal'
        break
    else:
        s = 1.0 / k
        for i in range(len(x)):
            u[i] = max(u[i] - s*(penalties[i].x), 0.0)

# Pull objective and variable values out of model
print 'objective =', model.objVal
print 'x = ['
for x_i in x:
    print '   ', [1 if x_ij.x >= 0.5 else 0 for x_ij in x_i]
print ']'


On this problem LR converges very quickly to an optimal solution, as we can see here:

iteration 1 obj = 48.0 u = [2.0, 2.0, 2.0, 2.0, 2.0] penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 2 obj = 47.0 u = [2.0, 2.0, 2.0, 2.0, 1.0] penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 3 obj = 46.5 u = [2.0, 2.0, 2.0, 2.0, 0.5] penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 4 obj = 46.1666666667 u = [2.0, 2.0, 2.0, 2.0, 0.16666666666666669] penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 5 obj = 46.0 u = [2.0, 2.0, 2.0, 2.0, 0.0] penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
primal feasible & optimal
objective = 46.0
x = [
[0, 1, 0]
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
[0, 0, 0]
]


Exercise for the reader: change the script to dualize the knapsack constraints instead of the set packing constraints. What is the result of this change in terms of convergence?

## Thursday, May 31, 2012

### #orms irc channel / cython port

I haven't posted in a while, but the Cython port of python-zibopt is making good progress. I think it will be possible to get a working version within the next few weeks.

In other news, I created a new IRC channel for Operations Research on FreeNode.net called #orms. I'm hopeful that eventually others will join and stay and we can get a somewhat lively discussion of OR going. So join up!

## Thursday, March 29, 2012

### python-zibopt 0.7.2 released

python-zibopt 0.7.2 has been released and the big news is that it supports both Python 2 and 3. (Thanks to Guillaume Sagnol for the patch!)

Other changes include:

• Added magic square and normal magic square examples
• Fixed issue with single-variable objectives subject to constraints:
    solver += x >= y
    solution = solver.minimize(objective=x)
• Constant objectives no longer raise errors:
    solver.minimize(objective=3)
• Fixed obscure constraint issue:
    solver += x <= (y-1) + 100*(1-z)

From here I'm hoping to focus my efforts on the Cython port, which should give a better basis for more advanced functionality in future versions.

## Tuesday, March 27, 2012

### spring coding haiku

Spring priorities
ever shift as with the winds.
Effort flows downstream.

## Monday, March 26, 2012

### a few thoughts on music and computer programming

Greg Linch started a discussion recently on Twitter regarding music and coding. The apex of the conversation was Zed Shaw pointing to his own well-stated comments on the matter.

I got back from a weekly rehearsal just over a couple of hours ago, so I've got a number of things swimming around in my head that I'd like to mention. Here are a few:

1. To "Linguistic Skills" I'd add that musical notation is specifically built to enable certain domain-specific elements. Its staved nature allows a high degree of parallelism. I can't think of any other common language that succinctly expresses multiple concurrent paths in a readable format. One can glean a great deal of information about independently moving pieces from something like an orchestral score very quickly, in ways that we cannot with programming or other languages. Thinking concurrently is increasingly important in our contemporary computing environment.

2. Musical notation is evolving rapidly. If one studies modern music, one will be constantly presented with new types and forms of instruction. In a way, this is like introducing new op-codes or programming constructs and can help form a desirable malleability of mind. Consider the notation, for example, in Penderecki's "Threnody for the Victims of Hiroshima" or much crazier looking things by composers such as Xenakis. Sometimes learning a single piece requires understanding the structure of an entirely new language.

3. Music encourages team playing. In a chamber ensemble, one is expected to become so in sync with the other members as to do things like breathe together. A well-rehearsed group will practice such things as playing notes slowly to try to achieve better temperament (consistent intonation, which should result in more commonly placed overtones) and vibratos that waver at the same rates. Communication is rehearsed in the form of breaths, nods, eye contact, and other physical gestures (chamber players cannot over-communicate). Most chamber music involves a lot of passing back and forth of musical phrases, which can be analogous to message passing in a synchronous computing system.

4. Interpretation of a piece is largely subjective and therefore requires lots of compromise in order to become cohesive. This sort of compromise doesn't just mean allowing someone else to have his or her way, but means accepting and buying into it once a decision is made. If a member of my group plays a passage a certain way, I'm now expected to play that passage in a similar manner. Once a design decision is agreed upon, the impetus is on the players to accept and use it to improve the overall structure.