The traveling salesman problem is a classic optimization problem that seeks to find the most efficient route to connect a given set of points. I recently discovered a set of services built by the open-source mapping company, Mapzen that make this complex problem easy to approximate for a relatively small numbers of stops. Given a set of coordinates, the Mapzen optimize route service uses road network data to produce a time-distance matrix between these points, and then uses an optimization algorithm to determine a route that minimizes total travel time. This can be done for one of three modes of transportation - pedestrian, bicycle, and automobile. They have a great example with a cool map where they determine the optimal route to visit burrito 'dispensaries' in San Francisco.

This optimization tool can also be used in conjunction with Mapzen's Search Service, which uses open source data to geocode addresses, landmarks, and businesses. Using these two services together is really handy, because it means that one can pass a list of addresses or business names rather than lat / long coordinates.

The Mapzen optimize route service takes a set of points and finds the optimal route that a person should take to visit all of these points. However, what if we have multiple "salesmen?” How should the stops be split up between people and in what order should each person visit their stops?

The idea for this was spurred by a project I'm involved with at work, in which we are sending out multiple research assistants to conduct surveys at a dozen or so different sites in Oakland. In this case, it doesn't matter if one person conducts more surveys than the others or who goes to which site - the goal is just to minimize the total time to get them all done.

I decided to test these services in an application that hits closer to home (literally): the optimization of Sunday morning errands between my girlfriend, Celeste, and I. Say we're both starting and ending at our apartment in the Inner Richmond, SF and have 6 different places that we need to stop at. How should Celeste and I split up these errands so that we're finished as quickly as possible?

In the post below, I write a set of functions in Python that call the Mapzen Search and Mapzen Optimize Route API services, making it really easy to determine the most efficient route between a given set of locations. I also extend this function to determine the most efficient way to split the stops among multiple salesmen, and which route each person should take.

Additionally, I use the Python package `folium`

to create leaflet.js slippy maps to display the results of this optimization problem. Folium has a number of built-in tilesets from OpenStreetMap, MapQuest, MapBox, and makes it really easy to build web maps using Python. I find this package to be really useful for data visualization – since I tend to generate data in Python anyway it’s nice to be able to do it all within one work environment.

The purpose of this blog post is NOT to develop an efficient algorithm to approximate a solution to the vehicle routing problem at any large scale. It is instead to show-case an easy way to access and visualize Mapzen routing services, which I find to be very useful and fun to work with. This post only scratches the surface of the capabilities and range of these routing tools. My use of this service to optimize routes among multiple salesmen addresses a specific aspect of this service that they do not provide, but my approach is more of a 'quick and dirty' demonstration than anything else.

The number of unique ways that stops can be partitioned grows very quickly as the number of stops and salesmen increases, so my approach of testing these unique combinations using Mapzen’s optimize route tool will only work for relatively small numbers before exceeding API service limits. For my purposes right now, this is just fine. A better approach would perhaps be to address this problem more “up-stream”, using the Mapzen generated Time-Distance Matrix rather than the results from a tool that uses that matrix behind the scenes anyway. However, my approach provides a relatively simple way to get an answer without having to develop my own traveling salesman optimization algorithm.

Anyway, let’s get started!

In [1]:

```
import requests
import pandas as pd
import itertools
import shapely
from shapely.geometry import Point
import geopandas as gpd
import json
import numpy as np
import folium
from numpy.random import RandomState, uniform
import time
import sys
import os
%matplotlib inline
search_key=os.getenv('MAPZEN_SEARCH_KEY')
matrix_key=os.getenv('MAPZEN_MATRIX_KEY')
```

I first write a function that wraps Mapzen's Search tool in order to geocode (obtain the lat / long coordinates for) an address, landmark, or business. The function returns a dictionary containing the raw result output from Mapzen (including information such as data source, geocoding confidence, neighborhood data, etc), as well as the crucial piece of information for me: a formatted set of coordinates. By default, I return only 1 result (the best match), but if I were to use this tool in a different context, I would perhaps be interested in returning a set of results. Below I use this function to geocode two famous San Francisco landmarks - the Transamerica Pyramid and Sutro Tower.

In [2]:

```
def geocode_address_venue(text, key = search_key,params = None): #allow for additional Mapzen search parameters
search_parameters = {'api_key': key,'text': text,'size': 1,'layers': 'address,venue'}
if params:
search_parameters.update(params)
url = 'http://search.mapzen.com/v1/search'
r = requests.get(url, params = search_parameters)
data = r.json()
return {'raw': data,'coords': tuple(data['features'][0]['geometry']['coordinates'])}
```

In [3]:

```
transamerica = geocode_address_venue('Transamerica Pyramid, San Francisco, CA')['coords']
sutro = geocode_address_venue('Sutro Tower, San Francisco, CA')['coords']
print transamerica
print sutro
```

Now let's put these points on a map. I wrote a plotting function that uses folium to plot a set of points and polylines with labels on leaflet.js maps. By default, I use the Stamen Watercolor tiles because I think they look really cool, but if something like OpenStreetMap is more useful, that can also be specified. The full set of available tile layers can be found here. The function also takes an optional set of colors and point labels, a zoom-level, and a center location. My function calculates default values if not specified.

The function is written in a way that it accepts either a list of point coordinates, or a list of lists of point coordinates. If the latter is specified, each sublist is treated as a group of coordinates and will be symbolized in the same color. The function was ultimately written this way to distinguish the routes of each salesman in the optimization problem coming later. Additionally, I allow for the specification of polyline coordinates, which can be used to plot the optimized route between the points.

Below I map my two previously geocoded San Francisco landmarks along with the "as the crow flies" polyline that connects them. Note that on the 'live' map if you click on the points you will see that they are labeled appropriately.

In [4]:

```
def plot_stops(point_coords, zoom_level = 15,tiles = 'Stamen Watercolor', point_colors = None,
labels = None, center_location = None, line_coords = None, line_colors = None):
#if not a list of lists, make it one, which then allows the function to handle either
point_coords = [[x] for x in point_coords] if all([type(x) is not list for x in point_coords]) else point_coords
if labels:
labels = [[x] for x in labels] if all([type(x) is not list for x in labels]) else labels
if line_coords:
line_coords = [line_coords] if all([type(t) is tuple for t in line_coords]) else line_coords
if line_colors:
line_colors = [line_colors] if type(line_colors) is not list else line_colors
#get all points as the flattened list to calculate the center
all_points = [item for sublist in point_coords for item in sublist]
#calculate start location as the mean x and mean y of all input points
if center_location:
center_location = [center_location[1],center_location[0]]
else:
center_location = np.array(all_points).mean(0).tolist()[::-1]
#create a leaflet map, specifying center location, zoom level, and tiles
map_1 = folium.Map(location = center_location,zoom_start = zoom_level,tiles = tiles)
#specify a set of default colors
color_options = ['black', 'blue', 'red', 'green', 'purple', 'orange', 'pink', 'white']
#if no point colors specified use the default
point_colors = point_colors if point_colors else color_options[:len(point_coords)]
#if plotting lines as well, get line colors, otherwise use default
if line_coords:
line_colors = line_colors if line_colors else color_options[1:][:len(line_coords)]
#loop through each point or grouping of points
for c, point_coords in enumerate(point_coords):
point_color = point_colors[c] #get color from color list
if labels: sublabel_list = labels[c] #get labels from label list
for i, stop_coord in enumerate(point_coords):
label = sublabel_list[i] if labels else None
#Add point to map at specified coordinate
folium.Marker([stop_coord[1], stop_coord[0]], popup = label, #need to reverse long/lat to lat/long
icon = folium.Icon(color = point_color,icon='mapmarker')).add_to(map_1)
#if plotting lines, loop through each set of lines and plot
if line_coords:
for i, pline in enumerate(line_coords):
line_color = line_colors[i]
folium.PolyLine([(y,x) for (x,y) in pline], color = line_color).add_to(map_1) #need to reverse long/lat to lat/long
return map_1
```

In [5]:

```
sf_landmarks = plot_stops([transamerica, sutro], labels = ['Transamerica Pyramid','Sutro Tower'],
zoom_level = 13, tiles = 'Stamen Toner', point_colors = ['red','blue'],
line_coords = [transamerica, sutro], line_colors = 'purple')
sf_landmarks
```

Out[5]:

In [6]:

```
def gen_random_points(poly, n, random_seed = None):
xmin, ymin, xmax, ymax = poly.bounds
Points = []
i = 0
while len(Points) <= n:
if random_seed:
x, y = RandomState(random_seed + i).uniform(xmin, xmax), RandomState(random_seed + i + 1).uniform(ymin, ymax)
else:
x, y = uniform(xmin, xmax),uniform(ymin, ymax)
if Point(x, y).within(poly):
Points.append((x, y))
i += 1
return Points
```

In [7]:

```
SF = gpd.read_file('SF.geojson').iloc[0]['geometry']
rand_points = [gen_random_points(SF, n = 5) for t in range(6)]
labels = [[str((t, i)) for i, x in enumerate(l, 1)] for t, l in enumerate(rand_points, 1)]
sf_rand_points = plot_stops(rand_points, zoom_level = 12, tiles = 'OpenStreetMap', labels = labels)
sf_rand_points
```

Out[7]:

Now that I have tools to geocode and visualize points, I’m ready to write a function that calls the Mapzen Optimize route service to calculate the optimal route that one person should use to visit a set of stops, starting and ending at a “home” location. Later on, I will build off of this function to optimize stops for multiple people, but this is an initial building-block. The key pieces of information that this function returns are trip length, trip distance, optimize stop order, an ordered set of stop coordinates, and the polyline that connects these stops.

Note that Mapzen uses the Google Maps encoded polyline format to store a series of latitude, longitude coordinates as a single string (this is done to reduce the size of the route response). Mapzen provides code that can be used to decode the string, which I use in order to get a set of lat / long coordinates that I can plot.

The decoding function is listed below, applied to a sample coded string, and then mapped.

In [8]:

```
#Decoding function used to extract lat/long coordinates from google maps encoded polylines
#https://mapzen.com/documentation/mobility/decoding/
#six degrees of precision in valhalla
inv = 1.0 / 1e6;
#decode an encoded string
def decode(encoded):
decoded = []
previous = [0,0]
i = 0
#for each byte
while i < len(encoded):
#for each coord (lat, lon)
ll = [0, 0]
for j in [0, 1]:
shift = 0
byte = 0x20
#keep decoding bytes until you have this coord
while byte >= 0x20:
byte = ord(encoded[i]) - 63
i += 1
ll[j] |= (byte & 0x1f) << shift
shift += 5
#get the final value adding the previous offset and remember it for the next
ll[j] = previous[j] + (~(ll[j] >> 1) if ll[j] & 1 else (ll[j] >> 1))
previous[j] = ll[j]
#scale by the precision and chop off long coords also flip the positions so
#its the far more standard lon,lat instead of lat,lon
decoded.append([float('%.6f' % (ll[1] * inv)), float('%.6f' % (ll[0] * inv))])
#hand back the list of coordinates
return decoded
```

In [9]:

```
coded_polyline=decode('ciz`gAzxrqhF}wArEkZ|@`BvaA^|TbAhl@\\|T|@xk@`BvaAbBfbAsFLaRzA')
coded_polyline
```

Out[9]:

In [10]:

```
map2 = folium.Map(location = (coded_polyline[5][1],coded_polyline[5][0]),zoom_start =17)
folium.PolyLine([(y,x) for (x,y) in coded_polyline],color='red').add_to(map2)
map2
```

Out[10]:

In [11]:

```
def optimize_stops(home, stops, costing = 'pedestrian', api_key = matrix_key, home_label = 'Home',
stop_labels = None):
#geocode home and stops if not a coordinate
home = home if type(home) is tuple else geocode_address_venue(home)
stops = [geocode_address_venue(stop)['coords'] if type(stop) is not tuple else stop for stop in stops]
#full set of points are list of points that start and end with the home location
points = [home] + stops + [home]
#round_points to 5 decimal points
points=[(round(x,5),round(y,5)) for (x,y) in points]
#define point labels
names = [home_label] + (stop_labels if stop_labels else stops) + [home_label]
#set up parameters to pass to mapzen function
js = {'locations':[{'lon': point[0], 'lat': point[1]} for point in points], 'costing': costing}
params = {'json': json.dumps(js), 'api_key': api_key}
url = 'https://matrix.mapzen.com/optimized_route'
r = requests.get(url, params = params)
raw = r.json()
#get the coordinates of the stops in their new optimized order
locs = raw['trip']['locations']
new_point_order = [(locs[loc]['lon'], locs[loc]['lat']) for loc in range(len(locs))]
#round to 5 decimal points
new_point_order=[(round(x,5),round(y,5)) for (x,y) in new_point_order]
point_order = [points.index(x) for x in new_point_order]
name_order = [names[i] for i in point_order]
#Extract the path shape for each of the legs, decode them, and then combine into one polyline
raw_path = [raw['trip']['legs'][x]['shape'] for x in range(len(raw['trip']['legs']))]
decode_paths = [tuple(item) for sublist in [decode(z) for z in raw_path] for item in sublist]
#return a dictionary that contains raw mapzen output, trip time, trip distance, new ordered points,
#new ordered point names, and a decoded path
return {'raw': raw,
'time': raw['trip']['summary']['time']/60,
'length': raw['trip']['summary']['length'],
'points': new_point_order,
'original_points':points,
'order': name_order,
'line': decode_paths}
return point_order
```

Before extending this function to multiple salesman, I’ll demonstrate the output of this initial function. Below I define my optimization parameters:

- A list of 6 stops in my neighborhood that Celeste and I will need to visit as part of our Sunday morning errands (note that I use a combination of business names, addresses, and coordinates)
- A set of names that correspond to these stops that will make the labeling clearer
- A home location that will be the start and end point.

As much as I wish it were the case, our Sunday errands generally do not involve ice cream, pizza, bagels, and art museums. Nonetheless, for this example I've used some of our favorite neighborhood destinations despite the fact that they are somewhat unrealistic “errand” destinations.

In [12]:

```
stops = ['Toy Boat Dessert Cafe, San Francisco, CA',
'Pizzetta 211, San Francisco, CA',
'Arguello Super Market, San Francisco, CA',
'4700 Geary Blvd, San Francisco, CA',
(-122.465613, 37.770016),
'3519 California St, San Francisco, CA 94118']
stop_labels = ['Toy Boat',
'Pizzetta',
'Arguello Market',
'Lamps Plus',
'de Young Museum',
"Noah's Bagels"]
home = (-122.464186, 37.779111)
```

In [13]:

```
walk_opt = optimize_stops(home, stops, stop_labels = stop_labels)
print walk_opt['order']
print str(walk_opt['time']) + ' minutes'
print str(walk_opt['length']) + ' km'
```

In [14]:

```
walk_1_map=plot_stops(walk_opt['points'][:-1], zoom_level = 14, tiles = 'Stamen Watercolor', \
labels = walk_opt['order'][:-1], line_coords = walk_opt['line'], point_colors=['black']+len(stops)*['red'])
walk_1_map
```

Out[14]:

In [15]:

```
bike_opt = optimize_stops(home, stops, stop_labels = stop_labels,costing = 'bicycle')
print bike_opt['order']
print str(bike_opt['time']) + ' minutes'
print str(bike_opt['length']) + ' km'
```

In [16]:

```
bike_1_map=plot_stops(bike_opt['points'][:-1], zoom_level = 14, tiles = 'Stamen Watercolor', \
labels = bike_opt['order'][:-1], line_coords = bike_opt['line'], point_colors=['black']+len(stops)*['red'])
bike_1_map
```

Out[16]:

Now that I have a working function that wraps Mapzen's optimize route service, I am ready to extend it to work with multiple people. The general approach I take is to first find the unique combinations that a list of stops can be split among a given number of people, and then determine which of these combinations minimizes the maximum time of any one person.

Note that there are many different minimization criteria that can be used when optimizing a set of routes between multiple people. In this case, I’m assuming that Celeste and I start our errands at the same time and that they are completed when the last person is done. The length of time until all the errands are done (the max time of any person) is what I am trying to minimize. However, a more typical usage would perhaps be to minimize the sum or cumulative amount of time that it takes for all of the salesman to visit the stops. Nonetheless, the function can pretty easily be tweaked for any optimization criteria.

I use a function adapted from here to find the unique ways in that a list of N elements can be partitioned into K groups. This function is written so that the order of groups or of elements within a group does not matter, as this is what the optimize route tool is going to determine! By this I mean that in terms of the input, `[['A','B'],['C','D']]`

is considered identical to `[['C','D'],['A','B']]`

as well as to `[['B','A'],['C','D']]`

.

In [17]:

```
def sorted_k_partitions(seq, k):
n = len(seq)
working_partition = []
def generate_partitions(i):
if i >= n:
yield list(map(tuple, working_partition))
else:
if n - i > k - len(working_partition):
for part in working_partition:
part.append(seq[i])
for bar in generate_partitions(i + 1):
yield bar
part.pop()
if len(working_partition) < k:
working_partition.append([seq[i]])
for bar in generate_partitions(i + 1):
yield bar
working_partition.pop()
result = generate_partitions(0)
# Sort the parts in each partition in shortlex order and then by the length of each part,
#and then lexicographically
result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result]
result = sorted(result, key = lambda ps: (map(len, ps), ps))
return result
```

In [18]:

```
for c in sorted_k_partitions(['A', 'B', 'C', 'D'], 2):
print c
```

In [19]:

```
num_test=6
nk_matrix=pd.DataFrame(index=range(1,num_test+1),columns=pd.MultiIndex.from_product([range(1,num_test+1),['C','R']]))
for n in range(1,num_test+1):
for k in range(1,num_test+1):
combos=sorted_k_partitions(range(1,k+1),n)
nk_matrix.loc[n,(k,'C')]=len(combos)
nk_matrix.loc[n,(k,'R')]=len([item for sublist in combos for item in sublist])
nk_matrix.index.name='# of Salesman'
nk_matrix.columns.names=['# of Stops',None]
nk_matrix
```

Out[19]:

I then write a function that wraps my single-person route optimization function. The function applies the original function to each unique way that the stops can be partitioned, and determines the partition that minimizes the maximum time of the travelers. The input is identical to the original function except for the ability to specify the number of salesmen. The output too, is nearly identical to the previous function returning trip length, optimized stop order, etc. However, now for each of these pieces of information, the function returns a list of values, where the length of the list is equal to the number of salesmen.

In [20]:

```
def optimize_stops_mult(home, stops, num_travelers, costing = 'pedestrian', api_key = matrix_key, home_label = 'Home',
stop_labels = None):
#get all possible options in which the stops can be broken up between the specified number of travelers
options = sorted_k_partitions(stops, num_travelers)
#create lists to store sublists of the time, name order, and point order for each option and traveler
all_times = []
all_orders = []
all_points = []
all_lines = []
for option in options: #loop through each possible way to split the stops
#create lists to store the time, name order, and point order for each traveler
option_times = []
option_orders = []
option_points = []
option_lines = []
#calculate the optimal stop order for each traveler with each set of stop options
#using the previously defined function
#append relevant information to lists
for traveler in option:
sub_labels = [stop_labels[stops.index(x)] for x in traveler] if stop_labels else None
result = optimize_stops(home = home, stops = list(traveler), costing = costing, api_key = api_key,
home_label = home_label, stop_labels = sub_labels)
option_times.append(result['time'])
option_orders.append(result['order'])
option_points.append(result['points'])
option_lines.append(result['line'])
all_times.append(option_times)
all_orders.append(option_orders)
all_points.append(option_points)
all_lines.append(option_lines)
#get the index of the option that minimizes the max of any time that a traveler takes
minloc = np.argmin([max(time) for time in all_times])
#return a dictionary of the stop order, points, and time for the optimized options
opt_order = [x[1: -1] for x in all_orders[minloc]]
opt_times = all_times[minloc]
opt_points = [x[1: -1] for x in all_points[minloc]]
home_point = [x[0] for x in all_points[minloc]][0]
polylines = all_lines[minloc]
return {'order': opt_order, 'time': opt_times, 'points': opt_points, 'home_point': home_point, 'lines':polylines}
```

I use the same set of 6 stops in the Inner Richmond that I used earlier, but now specify that there will be 2 travelers. I extract the ordered set of stops for each of the 2 travelers and the travel time for each, and then plot the stops and routes on a leaflet map.

As you can see, the stop order is returned as a list of two ordered sublists, indicating that the first person should go to Pizzetta and Lamps Plus and the second person should go to Toy Boat, Noah's Bagels, Arguello, Market, and the de Young Museum (in that order). One person is going to less stops than the other, but some of the stops are also much further from home. The person that makes the two stop trip will spend 53 minutes and the person that makes the four stop trip will spend 68 minutes. 68 minutes is the minimum amount of time that the person with the longer trip takes in any combination ways that these stops can be partitioned.

In the map, the home location is shown in black, and each person's stops are shown in a different color. Stops are labeled with the stop name as well as the stop number (showing the order that the stops should be visited by each person).

In [21]:

```
opt_2_people_walk = optimize_stops_mult(home, stops, num_travelers = 2, stop_labels = stop_labels)
```

In [22]:

```
for i in opt_2_people_walk['order']:
print i
```

In [23]:

```
opt_2_people_walk['time']
```

Out[23]:

In [24]:

```
points = [[opt_2_people_walk['home_point']]] + opt_2_people_walk['points']
labels = [['Home']] + [[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_2_people_walk['order']]
m1 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels, line_coords = opt_2_people_walk['lines'])
m1
```

Out[24]:

In [25]:

```
opt_2_people_bike = optimize_stops_mult(home, stops, num_travelers = 2, stop_labels = stop_labels, costing = 'bicycle')
print opt_2_people_bike['time']
```

In [26]:

```
points = [[opt_2_people_bike['home_point']]] + opt_2_people_bike['points']
labels = [['Home']]+[[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_2_people_bike['order']]
m2 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels, line_coords = opt_2_people_bike['lines'])
m2
```

Out[26]:

In [27]:

```
opt_2_people_drive = optimize_stops_mult(home, stops, num_travelers = 2, stop_labels = stop_labels, costing = 'auto')
print opt_2_people_drive['time']
```

In [28]:

```
points = [[opt_2_people_drive['home_point']]] + opt_2_people_drive['points']
labels = [['Home']]+[[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_2_people_drive['order']]
m3 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels, line_coords = opt_2_people_drive['lines'])
m3
```

Out[28]:

In [29]:

```
opt_3_people = optimize_stops_mult(home, stops, 3, stop_labels = stop_labels)
```

In [30]:

```
for i in opt_3_people['order']:
print i
```

In [31]:

```
opt_3_people['time']
```

Out[31]:

In [32]:

```
points = [[opt_3_people['home_point']]] + opt_3_people['points']
labels = [['Home']] + [[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_3_people['order']]
m4 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels,line_coords = opt_3_people['lines'])
m4
```

Out[32]:

In [33]:

```
sf_landmarks.save('sutro_transamerica.html')
sf_rand_points.save('sf_random_points.html')
walk_1_map.save('optimize_stops_1_person_walk.html')
bike_1_map.save('optimize_stops_1_person_bike.html')
m1.save('optimize_stops_2_people_walk.html')
m2.save('optimize_stops_2_people_bike.html')
m3.save('optimize_stops_2_people_drive.html')
m4.save('optimize_stops_3_people_walk.html')
```

I found working with these Mapzen services in Python to be really interesting and enjoyable. As I mentioned, this exercise was done mainly as a way to become familiar with some of their routing and geocoding tools, as well as to develop a way to map the geospatial outputs from these tools. My “extension” to the service was done as a quick and dirty way to obtain and visualize an answer for the vehicle routing problem that works for a small number of stops. A next step would be to look into the optimization algorithm that Mapzen uses on the time-distance matrix, so that it can be adapted to work more generally for multiple salesman. This would address the problem further upstream and require many fewer calls to the API.