Route Optimization – Improving Logistics and Operations
This year we had a long-standing client reach out to us about an exciting challenge. They told us that they were aiming to improve upon their existing travel logistics processes and that they wanted to branch out into the world of route optimization. They have a large fleet of service technicians who drive around every day, visiting and servicing many different locations for their customers. The creation of each technician’s daily schedule was a very manual process, where a person would sit down and assign service locations to every technician’s schedule at the beginning of the month. This process relied heavily on the scheduler’s geographic knowledge, was prone to human error, and required a lot of effort only to yield suboptimal results.
Needless to say, we were eager to dive into problem-solving mode to better understand our client’s needs and figure out what we could offer them as potential solutions.
At fjorge, we never want to reinvent the wheel. The best way to ensure this is by understanding our client’s needs inside and out, and determining if there are any readily available out-of-the-box solutions that may be viable candidates. If none exist, then it’s straight to the drawing board! The challenge that our client was facing was hardly a unique problem – there are entire industries built around improving logistics and travel efficiency. This made finding existing services very easy; we found solutions like GraphHopper, OptaPlanner, OptimoRoute, and Routific, all of which looked fairly performant and reasonably priced.
But would these services accommodate our client’s business needs? In addition to the obvious goal of maximizing the number of customers their technicians service on a daily basis, our client wanted their minimum viable product to be able to:
- Specify the starting and ending locations of each technician for each day.
- Specify the starting and ending time of each technician’s workday for each day.
- Specify the time window in which each service location would be available.
- Specify the estimated service time required for each service location.
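To make those requirements concrete, here is a minimal sketch of what a request to such a scheduling service might carry. Every field name below is hypothetical, chosen purely for illustration; the real schema would depend on the tool selected:

```python
# Hypothetical scheduling request covering the four MVP constraints:
# per-technician start/end locations, per-day shift times, per-stop
# availability windows, and estimated service durations.
request = {
    "technicians": [
        {
            "id": "tech-1",
            "start_location": {"lat": 44.9778, "lng": -93.2650},
            "end_location": {"lat": 44.9778, "lng": -93.2650},
            "shift": {"start": "08:00", "end": "17:00"},
        }
    ],
    "stops": [
        {
            "id": "stop-17",
            "location": {"lat": 44.9537, "lng": -93.0900},
            "time_window": {"start": "09:00", "end": "12:00"},
            "service_minutes": 15,
        }
    ],
}
```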
There was also the consideration that our client needed this service to compete with other service management applications in the market. In order to retain a competitive advantage long-term, they needed a highly flexible and customizable solution that could scale well and would cost minimal time and budget to modify as market opportunities arise or forces shift.
We had a sneaking suspicion that we might need to look into the feasibility of a more custom solution. A rather terrifying idea, to say the least. How could we expect to break into a space as highly specialized as route optimization, where even titans like Amazon struggle to maximize efficiency? So we hit the books and began to familiarize ourselves with the actual problem we were facing, which in computer science is widely known as the vehicle routing problem (VRP).
Vehicle Routing Problem
The VRP is a generalization of one of the most classic problems in combinatorial optimization, theoretical computer science, and operations research – the traveling salesperson problem (TSP). The TSP represents the computational problem of creating the best route for someone (a salesperson) who needs to visit a given set of locations, spending as little time as possible in transit. Building on the TSP, the VRP can most succinctly be described as the problem of creating an optimal set of routes for multiple vehicles attempting to visit a set of locations. The goal is to visit every location in a way that minimizes the overall cost of the routes while satisfying any additional constraints.
Figure 1. Locations in a Vehicle Routing Problem.
Figure 2. Solution for a Vehicle Routing Problem with four vehicles.
The problem has long been proven to be NP-hard in its computational complexity, meaning there is no known efficient way to guarantee the single best solution. For a single route with a fixed starting point, there are (n-1)! possible orderings, where n is the number of locations, and guaranteeing optimality essentially amounts to a brute-force search of every possible solution. Doing this for 4 locations involves comparing 6 routes to find the one with the shortest path; doing it for 500 locations involves comparing roughly 10^1131 routes. To put that into perspective, there are only about 10^21 stars in the observable universe. Even for a computer, that’s too much math. So, instead of trying to determine the provably optimal solution, researchers frequently apply metaheuristics when searching for solutions. At this point, we were all properly terrified, and wondering how the heck we were supposed to model our problem and apply these metaheuristics to find near-optimal solutions for our routes.
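A few lines of Python make the factorial blow-up concrete. Under the simplifying assumption of one route with a fixed starting point, the number of possible orderings of n locations is (n-1)!:

```python
import math

def route_permutations(n):
    """Number of possible orderings for a single route over n locations,
    assuming a fixed starting point: (n - 1)!"""
    return math.factorial(n - 1)

print(route_permutations(4))  # 6 candidate routes for 4 locations
# For 500 locations the count has well over 1,100 digits:
print(len(str(route_permutations(500))))
```

Checking each candidate takes time too, which is why exhaustive search is hopeless beyond a few dozen stops.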
If you were to go online and ask around about how to begin solving your own route optimization problems, you’d be hard-pressed not to find a mention of Google’s Operations Research tools (OR-Tools). OR-Tools is an open-source software library designed for optimization – specifically equipped for solving a wide variety of vehicle routing problems. We spent time combing through the documentation that Google’s developer team had provided, and we started to toy around with modeling sample routing problems. The developers of the library had already implemented a handful of algorithms that we could use, saving us many months’ worth of time had we needed to write them ourselves. With some help from the open-source community, we were able to apply all of the business constraints that our client had requested (variable number of vehicles, time windows, required locations, etc.). All we would have to do now was process some sample locations for route optimization!
In solving a vehicle routing problem, preparing the location data itself poses a non-trivial obstacle. In Google OR-Tools, locations are expressed as distance matrices; these are essentially two-dimensional arrays that represent the cost required for a vehicle to travel between any pair of locations. This cost could be a function of time, measured in seconds or minutes, or a function of distance, measured in miles or kilometers. If you have a list of 100 locations, the matrix would be 100 x 100.
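For illustration, here is one crude way to build such a matrix: straight-line (haversine) distance divided by an assumed average speed. This is only a stand-in to show the matrix’s shape; real transit costs need a routing service that knows about actual roads:

```python
import math

def haversine_minutes(a, b, speed_kmh=50):
    """Rough travel-time estimate between two (lat, lon) points:
    great-circle distance divided by an assumed average speed.
    A crude stand-in for real road-network travel times."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    km = 2 * 6371 * math.asin(math.sqrt(h))  # Earth radius ~6371 km
    return km / speed_kmh * 60

def build_matrix(locations):
    # An n x n grid of costs: entry [i][j] is the cost to go from i to j.
    return [[round(haversine_minutes(a, b)) for b in locations] for a in locations]

stops = [(44.9778, -93.2650), (44.9537, -93.0900), (45.0000, -93.3000)]
matrix = build_matrix(stops)
print(matrix)
```

Note that the matrix grows quadratically: 100 locations already means 10,000 entries to compute.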
Figure 3. Distance matrix for the 16 locations shown in Figure 1.
We had a list of locations, but we had no way to determine the cost of transit between each of these locations; there are an incredible number of factors to account for when estimating these transit costs, such as different speed limits on different roads, traffic at different times of the day, and the availability of roads due to construction. We knew that Google offered a Distance Matrix API web service, but its pricing was per matrix element, so our bill would grow quadratically with the number of locations we sent to the web service. However, it was reliable and served its purpose well, so we stuck with it through the majority of development, all while knowing that the simple slip of an API call with too many locations would mean a bill easily in the ballpark of several thousand dollars. Would Google be forgiving if we begged and explained our mistake? Probably! But we were careful not to find out the answer. We explored other similar options with HERE Technologies and GraphHopper – the story was more or less the same. We were stuck trying to figure out how we could optimize routes for large lists of locations without incurring astronomical costs… until we got lucky with a Google search!
The Open Source Routing Machine (OSRM) is an open-source, high-performance routing engine written in C++ that is designed for use with data from the OpenStreetMap project. It has a table service, accessible via an HTTP API, that computes the durations or distances of the fastest routes between all supplied locations – a distance matrix! This gave us a way to calculate all of our distance matrices at a comically cheaper rate, with the only caveat being that we wouldn’t have up-to-date traffic or construction data reflected in our results; this was a compromise we were happy to make. So we provisioned an EC2 instance from Amazon Web Services (AWS), set up the OSRM backend Docker container, and loaded the most recent available extract of the OpenStreetMap dataset.
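As a sketch, a request to a self-hosted OSRM table service looks like the following. The host is a placeholder for wherever your OSRM instance runs, and the coordinates are arbitrary; note that OSRM expects longitude-first pairs:

```python
def osrm_table_url(host, coords):
    """Build an OSRM /table request URL.
    `coords` is a list of (lon, lat) tuples -- longitude comes first.
    The JSON response contains a `durations` matrix in seconds."""
    pairs = ";".join(f"{lon},{lat}" for lon, lat in coords)
    return f"{host}/table/v1/driving/{pairs}?annotations=duration"

url = osrm_table_url(
    "http://localhost:5000",  # placeholder for an EC2-hosted OSRM backend
    [(-93.265, 44.9778), (-93.09, 44.9537)],
)
print(url)
```

Fetching that URL and reading the `durations` field of the response yields a ready-made matrix to hand to the solver.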
With that piece of the puzzle in place, it was finally time to test our solution. We had to whip up some real-world data to feed our route optimization service, so we built a list of locations for 60 McDonald’s restaurants in the metropolitan area surrounding Minneapolis and began preparing for scheduling!
For the sake of creating optimized routes that would be well represented on a map, we minimized the complexity of the constraints in our model. We modeled the problem for a single day, where three vehicles would all start at the same time from the same location (our downtown office) and end the day at the same time, back at the original starting location. We set the time window for each McDonald’s location to span the entire day, required a service time of 15 minutes at each location, and specified that no locations were to be favored by or excluded from any vehicle. Just to spice things up a little, we also scheduled each vehicle to take a 30-minute lunch break at a Costco in St. Louis Park at the same time.
Figure 4. Map of McDonald’s locations in the Greater Minneapolis area.
Figure 5. Map of McDonald’s locations in the Greater Minneapolis area with routes.
We clicked run, waited a few moments, and… voilà! We had a solution. We did a little data transformation and imported the results into Google Maps to see how the three routes compared. Since we were optimizing the time the vehicles spent in the field, it was quickly apparent that the solution had assigned the first two vehicles many locations close to one another, while the third vehicle worked through far fewer locations that were much more spread out. And when you think about it, this makes a lot of sense!
With the finishing touches in place, it was time to get the web service into the hands of our client so they could begin using it. For the bulk of development, we had planned on deploying to AWS by leveraging the API Gateway and Lambda services, while relying on our OSRM backend running constantly on an EC2 instance. Serverless architecture is all the rage, after all. We set it up, and it worked beautifully… until we started sending much larger requests with more locations. More locations in routes meant more time required for OSRM to create distance matrices and more time required for OR-Tools to find favorable routes. We were soon receiving timeout errors from API Gateway, where the maximum timeout limit is 30 seconds.
We realized this architecture was no longer going to suit our needs, so we spun up a quick lightweight EC2 instance to run our web service using Express.js in Node; this way we could configure the timeout to be whatever we wanted. We high-fived, took a step back, and examined what we had running: a small EC2 instance, a large EC2 instance, and a Lambda function. The small EC2 instance would run our web service, the large EC2 instance would run OSRM so that we could create distance matrices, and the Lambda function would receive all that data and use OR-Tools to model each vehicle routing problem and produce a near-optimal solution.
This was one of the most complex systems we’ve built at fjorge. There were a lot of components we ended up including that we weren’t anticipating. In the end, this project was completed under budget by a handful of hours of development time. At fjorge, part of our process involves reflecting on work we’ve completed to see what we learned. Here’s what we came up with:
- Plans are useless, but planning is everything. – Dwight D. Eisenhower
Did the product contain the components we researched and recommended during planning? 100%, it did not. But because we spent so much time in preparation and digging in to understand the client’s needs and expectations, we could pivot more fluidly during development because we were deeply aware of the domain of the product.
- Prototyping works. We knew this was a complicated problem. If you’ve ever written something, you know that nothing cures the writer’s block of a month-long timeline like a deadline of “tomorrow.” To get ourselves into problem-solving mode, we started the build with a little prototyping project, which gave our team the space to learn what was available, try out a lot of options, and see which tools were going to be easier or harder to work with. This approach rewarded us and our client with an on-time, on-budget delivery.
- We are part of a community. It’s impossible to overstate the degree to which people in the open-source community got us out of a pickle. The number of brilliant people in the developer community that are willing to offer their time to help others is inspiring, and for that we are truly thankful.