Brachistochrone: The shape of the fastest slide between two points.
The Brachistochrone problem is one of the first and most important examples of the calculus of variations. It’s nearly required in any Theoretical or Classical Mechanics class for physics majors. It’s a great physics problem, and possibly an even greater math problem. This post is a placeholder for some notes on how I’ve learned to teach it.
As with most things related to my teaching, I put my energy into things that made me crabby last time around. The first time I taught this course, I hearkened back to my own undergraduate and graduate training in the calculus of variations and rubbed my hands together. I fondly remembered how cool it was to solve mechanics problems using the Euler-Lagrange equation ($\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0$) and I was excited to get to the point when my students could get to work.
Then it dawned on me that I had never internalized the calculus of variations that was behind the E-L equation. So, like I’ve done so many other times (regretfully), I turned to the text about two weeks in advance of needing to teach it, and I “re”-learned it.
I was mostly fine with the notion of finding curves that minimize path integrals, but I found myself frustrated with the examples and the “out-of-nowhere” special tricks. Here are a couple of examples:
- An example in the book considers the problem of finding a curve between two points that minimizes the path integral of the square of the slope of the curve. What it does is assume that a straight line accomplishes the minimization, and works out how a particular deviation from straight definitely increases the integral. I have a problem with that because, 1, you aren’t shown why the straight line minimizes the integral, and, 2, you’ve only shown that the particular deviation investigated doesn’t help. What we need to do for the global problem is investigate all possible curves, find a candidate, and then investigate all possible nearby deviations. This example only does a tiny subset of this, but it doesn’t make that point clear.
- The typical treatment of the brachistochrone problem involves first an odd choice for variables (y as horizontal and x as vertical) and then a fancy variable substitution that makes the differential equation “look familiar” leading somewhat directly to the result of the cycloid curve. This is one of those moments when I’m reminded that we often introduce students to concepts that were first tackled by people with much stronger math skills. The whole “look familiar” part of this just isn’t true for most of my students.
But, since this was the old days where I did homework and tests, I figured I probably wouldn’t assess their understanding of the derivation, but rather their ability to use the equation. So, basically, I handed it to them on a silver platter.
Since then I’ve taught this course a couple more times, and I think I’ve found some better ways to teach this.
First, I have the students engage enough to understand the problem that the Euler equation is solving. Take this weekend, for example. I asked my students to think about two things in advance of Monday’s class that’s introducing the calculus of variations. First: How do you prove that the shortest distance between two points is a straight line? Second: What would be the shape of the fastest slide between two points? I expect quite a bit from them on the first question, and I’m hoping some of that rubs off to help them at least be prepared to attack the second problem. Note that, while I usually run a “flipped class,” for this time around I didn’t want to recommend any reading or screencasts that might lead too quickly to the Euler approach. I want them to think about these two problems with whatever tools they already have.
Next, in class, I have them engage with a Mathematica notebook I’ve built that lets them shape slides between two fixed points. Each student comes up one at a time and “draws” their slide. It’s really a matter of adjusting an arbitrary (and controllable) number of control points for an interpolation algorithm, but all the while it displays the time for the journey of the bead so they can do some quick minimization on their own. When one is satisfied, she’ll then save the result to a growing list. When they’re all saved, I run the code that runs the race. It plots all their slides on top of each other and has a bead on each run its course. They’re all usually pretty good curves, thanks to the minimization they’ve already done, but they’re definitely distinguishable, usually due to the initial thoughts they had before starting to do some fine tuning. Certainly they all beat a straight slide, though I usually enter that one in for my own “guess” so that they can see that.
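The notebook itself isn’t reproduced here, but the number it displays (the travel time of the bead) can be sketched in a few lines. This is only a rough Python stand-in, assuming a frictionless bead released from rest, straight segments between control points (the notebook interpolates smoothly, which this does not), and no speed lost at the kinks. The function name `slide_time` and the sample control points are made up for illustration.

```python
import math

def slide_time(points, g=9.81):
    """Travel time of a bead released from rest on a frictionless,
    piecewise-linear slide through the given (x, y) control points."""
    y_start = points[0][1]
    total = 0.0
    for (xa, ya), (xb, yb) in zip(points, points[1:]):
        if ya > y_start or yb > y_start:
            return math.inf  # the bead cannot climb above its release height
        # Energy conservation gives the speed at each end of the segment.
        va = math.sqrt(2 * g * (y_start - ya))
        vb = math.sqrt(2 * g * (y_start - yb))
        if va + vb == 0.0:
            return math.inf  # a flat segment at release height: the bead never moves
        # Constant acceleration along a straight segment: time = length / average speed.
        total += 2 * math.hypot(xb - xa, yb - ya) / (va + vb)
    return total

straight = [(0.0, 0.0), (1.0, -1.0)]
bent = [(0.0, 0.0), (0.3, -0.7), (1.0, -1.0)]  # drops quickly, then levels off
print("straight:", slide_time(straight))
print("bent:    ", slide_time(bent))
```

Even a single well-placed control point beats the straight slide, which is the same thing the students discover by dragging points around.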
Then, this time at least, we’re off and running talking about the shortest distance between two points. I suppose I might do that first before the simulation above, but for the moment I like the simulation right away to remind them of the work they were supposed to do over the weekend. We’ll see where I land Monday morning.
One of my goals in these early stages is to impress upon the students the difficulty of the task at hand. How many curves are there between the two points? How can we find the needle in the haystack that minimizes the function we care about (length in the first example, time in the second)? If someone suggests a candidate “best path,” how do we check that? Are we sure some minute deviation somewhere won’t improve it? How do we check all the deviations? If I can get them suitably depressed about all of this, I feel like they might be willing to join me down the rabbit hole to the Euler derivation.
My approach to the derivation (skip to next section if you want)
Ok, we’ve got a functional (a fancy name for a function whose input is a whole curve rather than a single number) that involves the height of a curve above the axis and the slope of that curve at every point. As we walk from x1 to x2, we stop at every point, note how high the curve is, note its slope, and do whatever math we care about for this journey on those two numbers. We then add that result to all the previous ones until we get to x2. That’ll be our integral; let’s call it J just for the heck of it. What that amounts to is this:
$$J = \int_{x_1}^{x_2} f\big(y(x),\, y'(x)\big)\,dx$$
Now, how do we try to think about the potential best path? What we’re doing here is trying to see if we can learn what governs its shape. Maybe we can determine more about it than what it integrates to. Knowing the value of a definite integral tells you very little about the function that was integrated. Consider knowing that $\int_{x_1}^{x_2} f(x)\,dx$ was, say, 3.42. What could you figure out about f(x) except broad strokes? Nothing, it would seem. Ok then. I guess we’re hoping to learn a little more about the best curve by plowing ahead. Buckle up.
I said before that we know we’ve found the best path if we can show that any deviations from that path lead to a worse result. Now, right away we should point out that there are two deviations that aren’t allowed. The path has to start at (x1, y1) and end at (x2, y2). We can’t give on that, because we need to know the boundaries. But any other deviation’ll be fine. Ok, here’s a suggestion for how we study arbitrary deviations:
$$y(\alpha, x) = y(0, x) + \alpha\,\eta(x)$$
Here we’re claiming $y(0,x)$ is the best path (the thing we’re hoping we’ll learn something about). Then we’re claiming that the arbitrary function $\eta(x)$ represents all kinds of shapes of deviations. Then $\alpha$ represents how big that deviation might be. Because the endpoints are pinned, the deviation has to vanish there: $\eta(x_1) = \eta(x_2) = 0$.
Let’s be careful here. It might at first seem that having both $\alpha$ and $\eta(x)$ is overkill. Since $\eta(x)$ is arbitrary, larger versions of a particularly shaped deviation could just be considered another $\eta(x)$. So why do we need to allow for a deviation shape that can grow? Well, honestly, because for any deviation shape that we can dream up, we need to show that having the path start to deviate in that direction from the best path causes a worsening of the integral. We can do that by showing that the derivative of the whole (definite) integral with respect to $\alpha$ is zero at $\alpha = 0$. That’s how we always start hunting for the minimum of a function, right? (Strictly, a zero derivative only says the path is stationary; whether it’s a minimum takes a separate check.) Now, be careful here, because, for the moment, we have the insane problem of showing that derivative is zero for all the different shapes ($\eta$’s) we can think of. The key to the derivation below is to find a way to get $\eta(x)$ to factor out, because then we’ll have a result that’ll work for all of them in one stroke!
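So now J is a function of $\alpha$:
$$J(\alpha) = \int_{x_1}^{x_2} f\big(y(\alpha, x),\, y'(\alpha, x)\big)\,dx$$
where the prime indicates a derivative with respect to x. Note that
$$y'(\alpha, x) = y'(0, x) + \alpha\,\eta'(x)$$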
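So, can we take $dJ/d\alpha$? Sure, why not:
$$\frac{dJ}{d\alpha} = \int_{x_1}^{x_2} \left( \frac{\partial f}{\partial y}\frac{\partial y}{\partial \alpha} + \frac{\partial f}{\partial y'}\frac{\partial y'}{\partial \alpha} \right) dx$$
Here I’ve used the chain rule, recognizing that f is a function of both y(x) and y’(x). Now we can realize that both y and y’ are simple functions of $\alpha$:
$$\frac{\partial y}{\partial \alpha} = \eta(x), \qquad \frac{\partial y'}{\partial \alpha} = \eta'(x) \qquad\Rightarrow\qquad \frac{dJ}{d\alpha} = \int_{x_1}^{x_2} \left( \frac{\partial f}{\partial y}\,\eta(x) + \frac{\partial f}{\partial y'}\,\eta'(x) \right) dx$$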
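The problem now is that both $\eta(x)$ and $\eta'(x)$ are present, meaning we can’t really cancel either one. What we want is to factor out $\eta(x)$ so that, since the derivative needs to be zero to indicate a minimum, we can set whatever’s left to zero and make the whole thing zero for all $\eta$’s at once. To do that, we use integration by parts on the second term. Ah yes, that old tried-and-true inverse of the product rule that does one thing and does it well: it reduces the level of derivative on a portion of an integrand at the cost of an additional derivative on a different part. In this case, we’ll set u to $\partial f/\partial y'$ and dv to $\eta'(x)\,dx$. We also have to be careful with du:
$$du = \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) dx, \qquad v = \eta(x)$$
That leads to:
$$\frac{dJ}{d\alpha} = \left[\frac{\partial f}{\partial y'}\,\eta(x)\right]_{x_1}^{x_2} + \int_{x_1}^{x_2} \left\{ \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} \right\} \eta(x)\,dx$$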
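At first it seems we’ve isolated $\eta(x)$ at the cost of a whole new nasty term. But wait! $\eta(x_1) = \eta(x_2) = 0$! (Not zero factorial, that would be one.) The deviation has to vanish at both endpoints, so that gets rid of the weird boundary term and we’re left with an integral that has $\eta(x)$ factored out. What that means is that if we can set the stuff in the braces equal to zero, then $dJ/d\alpha$ will be zero for every deviation we can think of all at once. That condition is the Euler equation:
$$\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0$$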
Ok, so what does that get us? Well, remember that f is a collection of ys and y’s that we care about. The derivatives involved in the braces above are basically asking f how it depends on both y and y’. We should know that, since we fashioned the problem in the first place. But, and here’s one of the cool things about this, we’ve constructed this derivation such that the braces are zero only for the best solution. So the differential equation we’ll get will be true only for the best solution! So, we do what the braces tell us, and then solve that differential equation for the solution we’ve wanted to get all along. Awesome.
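A quick numerical sanity check can make the “zero for every deviation at once” claim feel less abstract. The sketch below is only an illustration under stated assumptions: it uses the arc-length integrand $\sqrt{1 + y'^2}$ on the interval from 0 to 1, deviations of the form $\sin(k\pi x)$ (which vanish at both ends), a midpoint-rule integral, and a central difference for the derivative with respect to $\alpha$. All function names are hypothetical.

```python
import math

def path_length(y_slope, eta_slope, alpha, n=2000):
    """Midpoint-rule estimate of J(alpha), the integral over [0, 1]
    of sqrt(1 + (y' + alpha * eta')^2)."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        slope = y_slope(x) + alpha * eta_slope(x)
        total += math.sqrt(1.0 + slope * slope) * dx
    return total

def dJ_dalpha(y_slope, eta_slope, eps=1e-4):
    """Central-difference estimate of dJ/dalpha at alpha = 0."""
    plus = path_length(y_slope, eta_slope, eps)
    minus = path_length(y_slope, eta_slope, -eps)
    return (plus - minus) / (2 * eps)

def bump_slope(k):
    """Slope of the deviation eta(x) = sin(k*pi*x), which is zero at x = 0 and x = 1."""
    return lambda x: k * math.pi * math.cos(k * math.pi * x)

straight = lambda x: 1.0       # slope of y = x, running from (0, 0) to (1, 1)
parabola = lambda x: 2.0 * x   # slope of y = x^2, same endpoints

for k in (1, 2, 3):
    print(k, dJ_dalpha(straight, bump_slope(k)), dJ_dalpha(parabola, bump_slope(k)))
```

For the straight line the derivative comes out at essentially zero for each deviation shape tried, while the parabola gives clearly nonzero values, so some deviation improves it.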
Some trivial examples
Ok, hopefully this’ll start to make sense as we apply it to some situations. Consider this simple question: Go from (x1, y1) to (x2, y2) such that the integral of y(x) along the path is minimized. (Pause and think about what that might look like.) What we’re asking about is:
$$J = \int_{x_1}^{x_2} y(x)\,dx$$
Ok, what does the Euler equation tell us? Well, obviously f=y(x). So the Euler equation is:
$$\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 1 - 0 = 0$$
Hmm, so the solution to 1=0 is the curve we’re looking for. That’s weird. But (and here I hope you paused earlier), it seems that there can’t be a solution to this problem. I can just draw a curve that drops precipitously straight down as far as I want on both end points. There’s no limit to that, and that’s why the equation is nonsensical.
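Ok, what about this: Go from (x1, y1) to (x2, y2) and calculate the path integral of y’(x). Find the curve that minimizes that. Ok, this time f=y’(x) and the Euler equation gives us
$$\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0 - \frac{d}{dx}(1) = 0$$
Hmmm. It seems to be true for all curves! Think about that, and you’ll likely realize that we’ve just run into the fundamental theorem of calculus: $\int_{x_1}^{x_2} y'(x)\,dx = y_2 - y_1$ no matter which curve connects the two points, so every curve ties for best.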
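One more example closes the loop on the shortest-distance question posed to the students earlier. This is a sketch, assuming the usual arc-length integrand $f = \sqrt{1 + y'^2}$ (the length of a small step along the curve, per unit dx):

```latex
\begin{align*}
f &= \sqrt{1 + y'^2}, \qquad
\frac{\partial f}{\partial y} = 0, \qquad
\frac{\partial f}{\partial y'} = \frac{y'}{\sqrt{1 + y'^2}} \\[4pt]
0 - \frac{d}{dx}\left(\frac{y'}{\sqrt{1 + y'^2}}\right) &= 0
\quad\Rightarrow\quad \frac{y'}{\sqrt{1 + y'^2}} = \text{const}
\quad\Rightarrow\quad y' = m \\[4pt]
y(x) &= y_1 + m\,(x - x_1), \qquad m = \frac{y_2 - y_1}{x_2 - x_1}
\end{align*}
```

The only curve with constant slope through both endpoints is the straight line, and this time no candidate had to be guessed in advance.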
Ok, now let’s consider the original problem of this post. How do we determine the time it takes something to go down the slide? Well, like I often tell my students, jump to the middle, figure out the physics of a single step, and integrate to figure out the big picture. So, let’s assume you’re at some x between x1 and x2. The slide’s height at that point is y(x). How long will it take you to go to x+dx and y(x+dx)? Well, we need to know how far that is, and how fast we’re going. For how far, we have
$$ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + y'(x)^2}\,dx$$
where for the last equality I used that dy/dx is the definition of y’(x). As far as the speed, we get that from the kinetic energy, which should be the amount of potential energy (PE) we’ve cashed in so far (assuming we start from rest at (x1, y1)):
$$\frac{1}{2} m v^2 = m g \big(y_1 - y(x)\big) \qquad\Rightarrow\qquad v = \sqrt{2g\big(y_1 - y(x)\big)}$$
So, the time it takes to take that next step would be distance/speed:
$$dt = \frac{ds}{v} = \frac{\sqrt{1 + y'^2}}{\sqrt{2g\,(y_1 - y)}}\,dx$$
Great! Now we can get the total time $T = \int_{x_1}^{x_2} \frac{\sqrt{1 + y'^2}}{\sqrt{2g\,(y_1 - y)}}\,dx$. But wait! That’s the exact form for using the Euler approach, with f equal to the integrand (that is, dt/dx). It’s a function of y and y’ so we should be in business.
Before I talk about where that leads, I want to complain again about one of the approaches I’ve seen to this. The text I most recently used immediately suggests switching y and x. That way the denominator no longer has y or y’ in it, to make the derivatives easier. I’m put off by that, because I want my students to feel that the straightforward approach will always work for them, even if the result looks daunting. The “switch y and x” trick here is something that students might think is a magic wand to help you out. Using my favorite tool (NDSolve in Mathematica) you don’t need to do this trick, and x stays horizontal and y stays vertical.
The other complaint I have about the treatment I’ve seen is that, after switching y and x, the resulting differential equation “looks familiar” if you switch to a different parametrization. Oh, look! It’s a cycloid! Ugh.
What Mathematica taught me
So, what happens if you plug the f above (for the Brachistochrone problem) into the Euler equation and send the result to NDSolve in Mathematica? Well, lots of cool things, but it took me a while to get a handle on a couple of issues.
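For readers who want to see what that plug-in step produces before handing it to a solver, here is a sketch of the algebra. It assumes the bead is released from rest at (x1, y1), so that $f = \sqrt{1 + y'^2}\,/\sqrt{2g\,(y_1 - y)}$; this is a reconstruction and not necessarily the form the notebook uses.

```latex
\begin{align*}
\frac{\partial f}{\partial y} &= \frac{\sqrt{1 + y'^2}}{2\sqrt{2g}\,(y_1 - y)^{3/2}}, \qquad
\frac{\partial f}{\partial y'} = \frac{y'}{\sqrt{2g\,(y_1 - y)}\,\sqrt{1 + y'^2}} \\[4pt]
\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0
\quad&\Longrightarrow\quad
y'' = \frac{1 + y'^2}{2\,(y_1 - y)}
\end{align*}
```

Three things stand out in this form: the equation is second order, so it needs two conditions; g has dropped out, so the shape does not depend on the strength of gravity; and the right-hand side blows up at $y = y_1$, which is exactly the starting point.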
First, Mathematica got mad at me about initial conditions. I gave it a second order differential equation and told it to start at (x1, y1). It said “not enough equations to match the order of the differential equation” or something like that. Hmm. I guess I could give it an initial slope as well (i.e., give it both y(x1) and y’(x1)). That did the trick, but it made me feel that I was telling it information that I didn’t think I had.
Second, it doesn’t like that denominator very much. It equals zero right at (x1, y1), so I nudged the initial conditions off it just a little. That’s a trick I do a lot, and I keep waiting for a math professor to take my head off for it. Oh well, it works for me.
Ok, after those tweaks, I got a curve! But it didn’t go through (x2, y2). Shoot. You know why? Because my guess for the slope was wrong. If you adjust that, you can keep tweaking until you go through the point of interest. So, that’s what you have to do. What’s cool is that any time you get a curve, you can trust that it represents the fastest slide that hits those points. Useful, sort of.
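The tweak-the-slope loop described above is usually called a shooting method, and it can be automated. What follows is a minimal sketch in Python rather than Mathematica, under several assumptions: that the Euler equation for the brachistochrone integrand reduces to y'' = (1 + y'^2) / (2 (y1 - y)) for a bead released from rest at (x1, y1); that a fixed-step RK4 integrator is an acceptable stand-in for NDSolve; and that bisection on the initial slope is good enough. The names `shoot` and `find_slope`, the nudge size, and the slope bracket are illustrative choices, not anything from the original notebook.

```python
import math

def shoot(slope, x1, y1, x2, nudge=0.01, steps=5000):
    """Integrate y'' = (1 + y'^2) / (2 (y1 - y)) with RK4, starting just below
    (x1, y1) with the given slope. Returns (y, y') at x2, or (inf, inf) if the
    curve climbs back up to the release height before it gets there."""
    h_min = 1e-12  # floor on the drop y1 - y, to avoid dividing by zero

    def accel(y, p):
        return (1.0 + p * p) / (2.0 * max(y1 - y, h_min))

    dx = (x2 - x1) / steps
    y, p = y1 - nudge, slope  # nudged off the singular starting point
    for _ in range(steps):
        if not math.isfinite(y) or y1 - y <= h_min:
            return math.inf, math.inf
        k1y, k1p = p, accel(y, p)
        k2y, k2p = p + 0.5 * dx * k1p, accel(y + 0.5 * dx * k1y, p + 0.5 * dx * k1p)
        k3y, k3p = p + 0.5 * dx * k2p, accel(y + 0.5 * dx * k2y, p + 0.5 * dx * k2p)
        k4y, k4p = p + dx * k3p, accel(y + dx * k3y, p + dx * k3p)
        y += dx * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        p += dx * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    if not math.isfinite(y):
        return math.inf, math.inf
    return y, p

def find_slope(x1, y1, x2, y2, steep=-50.0, shallow=-0.1, iterations=40):
    """Bisect on the initial slope until the curve passes through (x2, y2).
    Assumes the steep guess lands below y2 and the shallow guess lands above it."""
    for _ in range(iterations):
        mid = 0.5 * (steep + shallow)
        y_end, _slope_end = shoot(mid, x1, y1, x2)
        if y_end > y2:
            shallow = mid  # landed too high: the slide needs a steeper start
        else:
            steep = mid    # landed too low: ease off
    return 0.5 * (steep + shallow)
```

Calling `find_slope(0.0, 0.0, 1.0, -1.0)` and passing the result back to `shoot` gives a curve that ends at the target height. The nudge means the curve found is the best slide from the nudged starting point, not from (x1, y1) itself, so shrinking `nudge` (and raising `steps`) is the way to check how much that matters.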
So, that’s what I have so far. I’ll post this tonight and try to either update it or write a new post later after teaching this stuff this week. Thanks for hanging in. Please let me know in the comments below what you think about all of this. Here are some stock questions you should feel free to use:
- I think the “switch y and x” trick is simple, why are you scared of it?
- I think students need to know that it’s a cycloid. Why are you avoiding that?
- Why do you have Mathematica do all the work for you?
- Why don’t you do this in Python instead? (Ok, that’s really for two of my loyal readers.)
- What happens when your students read this?