Evolution of the traveling salesperson

The traveling salesperson problem is a relatively famous math/geometry/computer science problem. The version I’ll be talking about in this post is the one where the salesperson has to visit several points on a map while minimizing the total travel distance. Repeating a point isn’t allowed, though I’m pretty sure that would only increase the distance anyways. I got to thinking about this problem thanks to this great post showing how to solve it using simulated annealing.

My approach to solving this problem involves my go-to tool for lots of hard problems: a genetic algorithm. Basically I start with a random set of solutions, rank them by the distance each one takes, and randomly choose candidates (with weights determined from their “fitness,” or distance) to make babies by mutating them (switching two points in the visit order). I make lots of babies and then choose the best ones to populate the next generation. This tool is quite versatile: you just have to define a fitness function that returns a rankable value, and you have to carefully decide what the genome structure is and what a mutation is. Then you’re off and running. Note that I didn’t do any sexual reproduction, as I didn’t have any good ideas for how to do that in this situation (though my friend Jon Gaffney did send me some great resources for that).
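
Here’s a rough Mathematica sketch of the kind of loop I’m describing (this is not my exact code; the names, population size, mutation scheme, and the choice to treat the route as a closed loop are all just illustrative):

    (* sketch of the genetic algorithm; all names and numbers here are illustrative *)
    n = 20;                               (* number of points to visit *)
    pts = RandomReal[{0, 10}, {n, 2}];    (* random locations on a 10 x 10 map *)

    (* fitness = total length of the closed path visiting the points in the given order *)
    pathLength[order_] := Total[EuclideanDistance @@@ Partition[pts[[order]], 2, 1, 1]];

    (* mutation = swap two randomly chosen positions in the visit order *)
    mutate[order_] := Permute[order, Cycles[{RandomSample[Range[Length[order]], 2]}]];

    pop = Table[RandomSample[Range[n]], {50}];    (* initial random population of 50 *)
    Do[
      weights = 1./(pathLength /@ pop);           (* shorter path -> bigger weight *)
      parents = RandomChoice[weights -> pop, 200];
      kids = mutate /@ parents;                   (* each kid is a mutated copy of a parent *)
      pop = SortBy[Join[pop, kids], pathLength][[;; 50]],   (* keep the best 50 *)
      {200}];                                     (* 200 generations *)
    best = First[SortBy[pop, pathLength]];

The only problem-specific pieces are pathLength (the fitness) and mutate (the genome tweak); everything else stays the same no matter what problem you throw at it.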

I also decided to throw a wrench in. For fun I had the locations on the map slowly move around, basically doing a random walk for each point. I did that so I could show this to my First Year Seminar students to talk about how evolution can handle a changing landscape (their summer reading book was Paleofantasy and their assignment was to find a connection between that book and engineering; most are either agreeing or disagreeing with the notion that evolution is an example of engineering).
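
The wandering points are just a little random walk dropped inside the generation loop from the sketch above; something like this, where the step size is an arbitrary pick:

    (* each generation, every point takes a small random step (a 2D random walk) *)
    stepSize = 0.1;    (* arbitrary choice for this sketch *)
    pts = pts + RandomReal[{-stepSize, stepSize}, {n, 2}];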

Here’s a gif showing the best candidate at the end of each generation (one frame for each generation).

Each frame shows the best candidate after each generation. The number is the current length of the path

And here’s a plot showing the fitnesses of every candidate in every generation:

fitness (path distance) versus generation. For each generation, all the distances are shown as blue dots

As you can see, it converges rather nicely, though the spread in candidates never goes to zero because the dots are constantly moving. To show that it can handle an abrupt change to the environment, here’s a gif showing a situation where the points are randomly shifted to new locations halfway through:

The points randomly shift halfway through

and here’s the plot of the fitness versus generation (note the spike when the points move):

the points shifted halfway through
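
The abrupt shift itself is just one more line inside the generation loop; a sketch, assuming the loop from the first sketch is written with an explicit index as Do[..., {gen, 200}]:

    (* at the halfway generation, teleport every point to a brand new random location *)
    If[gen == 100, pts = RandomReal[{0, 10}, {n, 2}]];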

Finally, I thought I’d have some fun with river crossings. I decided to see what would happen if I put in some rivers (line segments) and charged a huge cost (10 meters) for crossing them. Here’s an example:

this time there’s a huge cost (10 meters) to cross any blue line segments
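
In code, the river penalty just gets folded into the fitness function. Here’s a sketch; the river endpoints and the little crossing test are made up for illustration (and the test ignores degenerate cases where a point lands exactly on a river):

    (* rivers as line segments; these endpoints are invented for the sketch *)
    rivers = {{{2, 0}, {2, 10}}, {{0, 6}, {10, 6}}};

    (* do segments p1-p2 and q1-q2 cross? standard orientation test, ignoring collinear cases *)
    cross2[{ax_, ay_}, {bx_, by_}] := ax by - ay bx;
    segmentsCross[{p1_, p2_}, {q1_, q2_}] :=
      Sign[cross2[p2 - p1, q1 - p1]] != Sign[cross2[p2 - p1, q2 - p1]] &&
       Sign[cross2[q2 - q1, p1 - q1]] != Sign[cross2[q2 - q1, p2 - q1]];

    (* fitness = path length plus 10 for every leg that crosses a river *)
    penalizedLength[order_] := pathLength[order] +
       10*Count[Tuples[{Partition[pts[[order]], 2, 1, 1], rivers}],
         pair_ /; segmentsCross @@ pair];

The generation loop then just ranks candidates with penalizedLength instead of pathLength.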

Here the plot of fitness versus generation really just shows when extra crossings get added:

fitness versus generation for the river crossing version

Fun huh?

So, how can we quantify the value of this approach? Here’s one way: compare it to a blind hunt. I could just randomly create paths (really just randomizing a list of the integers from 1 to the number of points) and check the fitness of each one. Here’s what a histogram looks like for a set of points with a “best path” fitness around 9:

histogram of 10,000 random guesses for a set of points with a “best distance” of roughly 9.

Note how gaussian it looks. It has a mean of 22 and a standard deviation of about 2. That means the “best” is more than six standard deviations away from the mean. What are the chances of getting that randomly (assuming the normal distribution holds, which it surely doesn’t, but whatever)? That’s a relatively straightforward statistics calculation, and it leads to an answer of 37 billion to one against. Another way to say it: you’d have to run billions of random tests to have a decent chance of getting a “close-to-best” path.
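
For what it’s worth, both the blind hunt and that odds calculation are only a couple of lines in Mathematica. A sketch, plugging in the rounded numbers quoted above (mean 22, standard deviation 2, best around 9), so the result lands in the same tens-of-billions ballpark rather than exactly on the figure I quoted:

    (* blind hunt: score 10,000 completely random visit orders *)
    randomDistances = Table[pathLength[RandomSample[Range[n]]], {10000}];
    Histogram[randomDistances]
    {Mean[randomDistances], StandardDeviation[randomDistances]}

    (* odds against randomly landing at or below the best distance, IF the normal fit held *)
    1/CDF[NormalDistribution[22, 2], 9] // N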

Now, that last paragraph suffers from a huge assumption about the distribution staying normal (or gaussian), so here’s another way to think about it. My genetic algorithm runs 200 generations with 50 kids per generation. That’s 10,000 different tests. What percentage of all the possible paths is that? Well, that’s easy: 10,000 divided by 20 factorial (because I’m currently using 20 points). That gives a value of 0.0000000000004% (also known as basically zero). So I think the genetic algorithm is doing a very efficient job for this problem.
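
That fraction is a one-liner to check:

    (* percentage of all 20! possible orderings covered by 10,000 tests *)
    100.*10000/20!    (* roughly 4*10^-13 percent *)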

Thoughts? Here’s some starters for you:

  1. I’m in your fysem (first year seminar) and I think this is really cool. I’m going to change my paper to . . .
  2. I’m in your fysem and this is dumb. Just another example of why I should have dropped this class.
  3. How did you do this in Mathematica? Can you share your code?
  4. If your friend Jon has already done this, why are you trying to steal his thunder?
  5. If another friend has done a ton of work on this too, why haven’t you at least linked to his tweets?
  6. Your assumption that the distribution stays gaussian is stupid. Instead you should . . .
  7. How do you know you’re not in a local minimum? I can see in a few of the gifs that there are still some crossed paths and that means you’re not at a minimum.
  8. You speak of evolution as if it were more than a theory. Why do you do that?
  9. You’re such a fan of brute force approaches, why don’t you just do all 20! versions?
  10. Why didn’t you have the rivers moving too? Or have curved rivers? Or wide rivers?
  11. Why a cost of 10 meters for a crossing? That’s just dumb. Instead I would . . .
  12. In the traveling salesperson problem the visit points should not be moving. This is an abomination.

11 Responses to Evolution of the traveling salesperson

  1. You can lift the starting-point ambiguity by always starting at the Nth point [now it’s (N-1)! permutations to try] and then remove the directional ambiguity by always starting off toward the neighboring vertex with the smaller index number [now I’m down to 0.5*(N-1)! permutations to try], but that is still way too many for my laptop if N>9 (>100,000 permutations). In summary, genetic algorithms for the win!

    PS
    Thanks for giving me something cool to toy with over the weekend and for the tweet links.

  2. Pablo says:

    How did you do this in Mathematica? Can you share your code?

    • Andy "SuperFly" Rundquist says:

      There’s a link in the comment starters. Let me know if that doesn’t work.

  3. Do you have any idea why Mathematica takes so long on simple calculations? I mean, in the post you mentioned first, they do 40000 iterations of the salesman problem in seconds, while Mathematica takes over 3 hours to do the same. I use ParallelTable, floating-point numbers, and compiled functions, but the time difference is simply absurd.

    Any ideas?

    • Andy "SuperFly" Rundquist says:

      It sounds like you’re doing the right things. I have at times found it to be slow for these kinds of things, but mathematica.stackexchange.com usually has some good tips for me. So you coded the simulated annealing part? I guess I’m not sure how fast that should run. Oddly, I would think that 40000 iterations of simulated annealing should be akin to mine doing 800 generations (800*50 individuals per generation) and that doesn’t really take that long (certainly not 3 hours).

      • Yah, that was my guess… but the reality is simply ridiculous. I’m running an i790 with 12 GB of DDR3 and an NVIDIA Fermi… so the timing is really absurd. If you have some time (sorry for the bother), here is my code, adapted to the names you use in your own code (so it’s easier to read):

        https://www.dropbox.com/s/3rz8lvdnnyu8kmd/Salesman%20%28ane%29.nb?dl=0

        This code is NOT parallel, nor compiled (to take those things out of the mix).

        I’m really sorry for the “please read this code” thing, but this is the nth repetition of the same problem in Mathematica (slow calculations), and I really want to know what you think.

        Thank you very much!

      • Andy "SuperFly" Rundquist says:

        I’m running v9 and got an error on how you’re using RandomSample. Do you get that (running your first cell)?

      • pablogsalgado says:

        Nope, I’m on v10; maybe that matters.

      • pablogsalgado says:

        Also notice the cool progress bar using the “Monitor” feature.

        🙂

  4. pablogsalgado says:

    But this should not be a problem, since RandomSample[] only shuffles the previous list of points (which were generated using CountryData[]). Maybe this is a v10 feature, like GeoDistance and GeoGraphics.

    😦

    • pablogsalgado says:

      Yep, I’m pretty sure that’s the problem, since the Geo stuff was introduced in v10 and in this RandomSample I’m using _GeoPosition for pattern matching. As I’m also using GeoDistance and GeoGraphics, I’m afraid this will not work in v9.

      I’m so sorry. 😦
