Optimizing Routes Using Mapzen Services
Note that this post was also featured on Mapzen’s own blog in June 2017. It can be viewed on their website here.
Traveling Salesman Problem
The traveling salesman problem is a classic optimization problem that seeks to find the most efficient route to connect a given set of points. I recently discovered a set of services built by the open-source mapping company, Mapzen that make this complex problem easy to approximate for a relatively small numbers of stops. Given a set of coordinates, the Mapzen optimize route service uses road network data to produce a time-distance matrix between these points, and then uses an optimization algorithm to determine a route that minimizes total travel time. This can be done for one of three modes of transportation - pedestrian, bicycle, and automobile. They have a great example with a cool map where they determine the optimal route to visit burrito ‘dispensaries’ in San Francisco.
This optimization tool can also be used in conjunction with Mapzen’s Search Service, which uses open source data to geocode addresses, landmarks, and businesses. Using these two services together is really handy, because it means that one can pass a list of addresses or business names rather than lat / long coordinates.
Building Off the Optimize Route Service
The Mapzen optimize route service takes a set of points and finds the optimal route that a person should take to visit all of these points. However, what if we have multiple “salesmen?” How should the stops be split up between people and in what order should each person visit their stops?
The idea for this was spurred by a project I’m involved with at work, in which we are sending out multiple research assistants to conduct surveys at a dozen or so different sites in Oakland. In this case, it doesn’t matter if one person conducts more surveys than the others or who goes to which site - the goal is just to minimize the total time to get them all done.
I decided to test these services in an application that hits closer to home (literally): the optimization of Sunday morning errands between my girlfriend, Celeste, and I. Say we’re both starting and ending at our apartment in the Inner Richmond, SF and have 6 different places that we need to stop at. How should Celeste and I split up these errands so that we’re finished as quickly as possible?
In the post below, I write a set of functions in Python that call the Mapzen Search and Mapzen Optimize Route API services, making it really easy to determine the most efficient route between a given set of locations. I also extend this function to determine the most efficient way to split the stops among multiple salesmen, and which route each person should take.
Additionally, I use the Python package
folium to create leaflet.js slippy maps to display the results of this optimization problem. Folium has a number of built-in tilesets from OpenStreetMap, MapQuest, MapBox, and makes it really easy to build web maps using Python. I find this package to be really useful for data visualization – since I tend to generate data in Python anyway it’s nice to be able to do it all within one work environment.
A Few Caveats
The purpose of this blog post is NOT to develop an efficient algorithm to approximate a solution to the vehicle routing problem at any large scale. It is instead to show-case an easy way to access and visualize Mapzen routing services, which I find to be very useful and fun to work with. This post only scratches the surface of the capabilities and range of these routing tools. My use of this service to optimize routes among multiple salesmen addresses a specific aspect of this service that they do not provide, but my approach is more of a ‘quick and dirty’ demonstration than anything else.
The number of unique ways that stops can be partitioned grows very quickly as the number of stops and salesmen increases, so my approach of testing these unique combinations using Mapzen’s optimize route tool will only work for relatively small numbers before exceeding API service limits. For my purposes right now, this is just fine. A better approach would perhaps be to address this problem more “up-stream”, using the Mapzen generated Time-Distance Matrix rather than the results from a tool that uses that matrix behind the scenes anyway. However, my approach provides a relatively simple way to get an answer without having to develop my own traveling salesman optimization algorithm.
Geocoding Locations with Mapzen’s Search Tool
I first write a function that wraps Mapzen’s Search tool in order to geocode (obtain the lat / long coordinates for) an address, landmark, or business. The function returns a dictionary containing the raw result output from Mapzen (including information such as data source, geocoding confidence, neighborhood data, etc), as well as the crucial piece of information for me: a formatted set of coordinates. By default, I return only 1 result (the best match), but if I were to use this tool in a different context, I would perhaps be interested in returning a set of results. Below I use this function to geocode two famous San Francisco landmarks - the Transamerica Pyramid and Sutro Tower.
geocode_address_venue('Transamerica Pyramid, San Francisco, CA')['coords'] geocode_address_venue('Sutro Tower, San Francisco, CA')['coords']
(-122.40303, 37.79465) (-122.45285, 37.755246)
Displaying Points on Leaflet Maps
Now let’s put these points on a map. I wrote a plotting function that uses folium to plot a set of points and polylines with labels on leaflet.js maps. By default, I use the Stamen Watercolor tiles because I think they look really cool, but if something like OpenStreetMap is more useful, that can also be specified. The full set of available tile layers can be found here. The function also takes an optional set of colors and point labels, a zoom-level, and a center location. My function calculates default values if not specified.
The function is written in a way that it accepts either a list of point coordinates, or a list of lists of point coordinates. If the latter is specified, each sublist is treated as a group of coordinates and will be symbolized in the same color. The function was ultimately written this way to distinguish the routes of each salesman in the optimization problem coming later. Additionally, I allow for the specification of polyline coordinates, which can be used to plot the optimized route between the points.
Below I map my two previously geocoded San Francisco landmarks along with the “as the crow flies” polyline that connects them. Note that on the ‘live’ map if you click on the points you will see that they are labeled appropriately.
sf_landmarks = plot_stops([transamerica, sutro], labels = ['Transamerica Pyramid','Sutro Tower'], zoom_level = 13, tiles = 'Stamen Toner', point_colors = ['red','blue'], line_coords = [transamerica, sutro], line_colors = 'purple') sf_landmarks
Determining Stop Order using Mapzen’s Optimize Route Service
Now that I have tools to geocode and visualize points, I’m ready to write a function that calls the Mapzen Optimize route service to calculate the optimal route that one person should use to visit a set of stops, starting and ending at a “home” location. Later on, I will build off of this function to optimize stops for multiple people, but this is an initial building-block. The key pieces of information that this function returns are trip length, trip distance, optimize stop order, an ordered set of stop coordinates, and the polyline that connects these stops.
Note that Mapzen uses the Google Maps encoded polyline format to store a series of latitude, longitude coordinates as a single string (this is done to reduce the size of the route response). Mapzen provides code that can be used to decode the string, which I use in order to get a set of lat / long coordinates that I can plot.
The decoding function is applied below to a sample coded string below.
[[-122.464158, 37.779106], [-122.464264, 37.780529], [-122.464295, 37.780967], [-122.465363, 37.780918], [-122.465714, 37.780902], [-122.466439, 37.780868], [-122.46679, 37.780853], [-122.467507, 37.780822], [-122.468575, 37.780773], [-122.469651, 37.780723], [-122.469658, 37.780845], [-122.469704, 37.78115]]
My function also wraps the geocoding function I wrote above, so that the user can specify any combination of addresses, venues, and coordinates. It defaults to pedestrian mode of transportation, but the user can also specify driving or biking optimization. I also allow the user to specify a set of stop labels, which will be returned as the ordered list of stops. Otherwise, the function defaults to the raw input that was passed to the function
Before extending this function to multiple salesman, I’ll demonstrate the output of this initial function. Below I define my optimization parameters:
- A list of 6 stops in my neighborhood that Celeste and I will need to visit as part of our Sunday morning errands (note that I use a combination of business names, addresses, and coordinates)
- A set of names that correspond to these stops that will make the labeling clearer
- A home location that will be the start and end point.
As much as I wish it were the case, our Sunday errands generally do not involve ice cream, pizza, bagels, and art museums. Nonetheless, for this example I’ve used some of our favorite neighborhood destinations despite the fact that they are somewhat unrealistic “errand” destinations.
stops = ['Toy Boat Dessert Cafe, San Francisco, CA', 'Pizzetta 211, San Francisco, CA', 'Arguello Super Market, San Francisco, CA', '4700 Geary Blvd, San Francisco, CA', (-122.465613, 37.770016), '3519 California St, San Francisco, CA 94118'] stop_labels = ['Toy Boat', 'Pizzetta', 'Arguello Market', 'Lamps Plus', 'de Young Museum', "Noah's Bagels"] home = (-122.464186, 37.779111)
I then apply this function to my set of stops and return the optimized stop order, the route time, and the route distance. I first optimize the route as a pedestrian and then as a bicyclist. As you can see, the stop order is slightly different for these two modes of transit, likely keeping the bike route on roads and paths that are better for biking.
walk_opt = optimize_stops(home, stops, stop_labels = stop_labels) print walk_opt['order'] print str(walk_opt['time']) + ' minutes' print str(walk_opt['length']) + ' km'
['Home', 'Lamps Plus', 'Pizzetta', 'Toy Boat', "Noah's Bagels", 'Arguello Market', 'de Young Museum', 'Home'] 116 minutes 9.625 km
walk_1_map=plot_stops(walk_opt['points'][:-1], zoom_level = 14, tiles = 'Stamen Watercolor', \ labels = walk_opt['order'][:-1], line_coords = walk_opt['line'], point_colors=['black']+len(stops)*['red']) walk_1_map
bike_opt = optimize_stops(home, stops, stop_labels = stop_labels,costing = 'bicycle') print bike_opt['order'] print str(bike_opt['time']) + ' minutes' print str(bike_opt['length']) + ' km'
['Home', 'Pizzetta', 'Lamps Plus', 'Toy Boat', "Noah's Bagels", 'Arguello Market', 'de Young Museum', 'Home'] 29 minutes 11.097 km
bike_1_map=plot_stops(bike_opt['points'][:-1], zoom_level = 14, tiles = 'Stamen Watercolor', \ labels = bike_opt['order'][:-1], line_coords = bike_opt['line'], point_colors=['black']+len(stops)*['red']) bike_1_map
Determining Stops and Order with Multiple People
Now that I have a working function that wraps Mapzen’s optimize route service, I am ready to extend it to work with multiple people. The general approach I take is to first find the unique combinations that a list of stops can be split among a given number of people, and then determine which of these combinations minimizes the maximum time of any one person.
Note that there are many different minimization criteria that can be used when optimizing a set of routes between multiple people. In this case, I’m assuming that Celeste and I start our errands at the same time and that they are completed when the last person is done. The length of time until all the errands are done (the max time of any person) is what I am trying to minimize. However, a more typical usage would perhaps be to minimize the sum or cumulative amount of time that it takes for all of the salesman to visit the stops. Nonetheless, the function can pretty easily be tweaked for any optimization criteria.
Unique ways that stops can be split
I use a function adapted from here to find the unique ways in that a list of N elements can be partitioned into K groups. This function is written so that the order of groups or of elements within a group does not matter, as this is what the optimize route tool is going to determine! By this I mean that in terms of the input,
[['A','B'],['C','D']] is considered identical to
[['C','D'],['A','B']] as well as to
For example, here the 7 unique ways that 4 stops can be split among 2 salesmen:
for c in sorted_k_partitions(['A', 'B', 'C', 'D'], 2): print c
[('A',), ('B', 'C', 'D')] [('B',), ('A', 'C', 'D')] [('C',), ('A', 'B', 'D')] [('D',), ('A', 'B', 'C')] [('A', 'B'), ('C', 'D')] [('A', 'C'), ('B', 'D')] [('A', 'D'), ('B', 'C')]
Optimizing Stops Among K People
I then write a function that wraps my single-person route optimization function. The function applies the original function to each unique way that the stops can be partitioned, and determines the partition that minimizes the maximum time of the travelers. The input is identical to the original function except for the ability to specify the number of salesmen. The output too, is nearly identical to the previous function returning trip length, optimized stop order, etc. However, now for each of these pieces of information, the function returns a list of values, where the length of the list is equal to the number of salesmen.
I use the same set of 6 stops in the Inner Richmond that I used earlier, but now specify that there will be 2 travelers. I extract the ordered set of stops for each of the 2 travelers and the travel time for each, and then plot the stops and routes on a leaflet map.
As you can see, the stop order is returned as a list of two ordered sublists, indicating that the first person should go to Pizzetta and Lamps Plus and the second person should go to Toy Boat, Noah’s Bagels, Arguello, Market, and the de Young Museum (in that order). One person is going to less stops than the other, but some of the stops are also much further from home. The person that makes the two stop trip will spend 53 minutes and the person that makes the four stop trip will spend 68 minutes. 68 minutes is the minimum amount of time that the person with the longer trip takes in any combination ways that these stops can be partitioned.
In the map, the home location is shown in black, and each person’s stops are shown in a different color. Stops are labeled with the stop name as well as the stop number (showing the order that the stops should be visited by each person).
opt_2_people_walk = optimize_stops_mult(home, stops, num_travelers = 2, stop_labels = stop_labels)
for i in opt_2_people_walk['order']: print i
['Lamps Plus', 'Pizzetta'] ['Toy Boat', "Noah's Bagels", 'Arguello Market', 'de Young Museum']
points = [[opt_2_people_walk['home_point']]] + opt_2_people_walk['points'] labels = [['Home']] + [[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_2_people_walk['order']] m1 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels, line_coords = opt_2_people_walk['lines']) m1
I also run the function on the same set of stops twice more, specifying bike transportation and then car transportation. As you can see the routes are slightly different, favoring roads that are better for biking or for driving. The times are also much shorter although biking and driving times are actually quite similar. Given how difficult it is to find parking in SF, biking is definitely the way to go here!
opt_2_people_bike = optimize_stops_mult(home, stops, num_travelers = 2, stop_labels = stop_labels, costing = 'bicycle') print opt_2_people_bike['time']
points = [[opt_2_people_bike['home_point']]] + opt_2_people_bike['points'] labels = [['Home']]+[[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_2_people_bike['order']] m2 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels, line_coords = opt_2_people_bike['lines']) m2
opt_2_people_drive = optimize_stops_mult(home, stops, num_travelers = 2, stop_labels = stop_labels, costing = 'auto') print opt_2_people_drive['time']
points = [[opt_2_people_drive['home_point']]] + opt_2_people_drive['points'] labels = [['Home']]+[[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_2_people_drive['order']] m3 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels, line_coords = opt_2_people_drive['lines']) m3
Now let’s say that Celeste and I have a friend who’s willing to help us with our errands. I run the optimization function again, now specifying 3 travelers. The results from this indicated that one person should go to Pizzetta, one person should go to Toy Boat and Noah’s Bagels, and one person should go to Lamp’s Plus, the de Young, and Arguello Market. In this case, the person with the fewest amount of stops has the longest travel time (52 minutes), and the full set of errands will be made 15 minutes faster than it was with only 2 people.
opt_3_people = optimize_stops_mult(home, stops, 3, stop_labels = stop_labels)
for i in opt_3_people['order']: print i
["Noah's Bagels"] ['Lamps Plus', 'Pizzetta'] ['de Young Museum', 'Arguello Market', 'Toy Boat']
[44, 53, 48]
points = [[opt_3_people['home_point']]] + opt_3_people['points'] labels = [['Home']] + [[' - '.join((str(i), x)) for i, x in enumerate(l, 1)] for l in opt_3_people['order']] m4 = plot_stops(points, zoom_level = 14, tiles = 'Stamen Watercolor', labels = labels,line_coords = opt_3_people['lines']) m4
I found working with these Mapzen services in Python to be really interesting and enjoyable. As I mentioned, this exercise was done mainly as a way to become familiar with some of their routing and geocoding tools, as well as to develop a way to map the geospatial outputs from these tools. My “extension” to the service was done as a quick and dirty way to obtain and visualize an answer for the vehicle routing problem that works for a small number of stops. A next step would be to look into the optimization algorithm that Mapzen uses on the time-distance matrix, so that it can be adapted to work more generally for multiple salesman. This would address the problem further upstream and require many fewer calls to the API.