Cold live coding in class

I tried an experiment yesterday in my Introduction to Computational Data Science course. We have been working on doing analysis of Kaggle data sets, with each student having picked what they want to work on that will also lead to some web page scraping later in the semester. They have to analyze the data in multiple ways and eventually tell a story with it. We’ve been learning lots of python and pandas tricks to do all this work, but I wanted to help them deal with the sorts of questions they constantly have to think about. So I decided that together we’d all tackle a brand new (to us) data set in class and see how far we could get. I didn’t know what to expect, but I knew we’d hit a bunch of tough points and I planned to have some meta conversations about them when we did. It went pretty well and I thought I’d get some notes down here about it.

We started with each small group (which, of course, I set up using my “Would you rather” approach) proposing a search term we could use on Kaggle. Then they voted on the proposed terms and I typed the winner (“wealth”) into the search box. Then we voted on which search result we should use. (Note that I’m not going to give the details about which data set we used. See below for why.)

Next we loaded the data set into python (using Google’s Colab, if you must know) and started looking at it. A quick shout-out to the pandas library: the data had 1,000,000 rows and we were able to work with it with no problems at all (imagine opening it in google sheets or that upstart other spreadsheet software – I think it’s called excel or something like that).

We hit our first snag when we were trying to figure out what types of things were in a particular column. Students suggested just printing that column but they saw that we’d still have to scroll through a million rows to really see what’s in there. They asked me to run the unique() function on that column but that still had 1000 items in it (still tough to scroll through and get a good sense of what’s going on). We settled on value_counts() to see the most popular items in that column, but then we hit another snag.
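
For anyone playing along at home, here’s roughly that progression in pandas. I’m making up the file and column names (remember, I’m not sharing the real data set), so treat this as a sketch:

import pandas as pd

df = pd.read_csv("mystery_kaggle_data.csv")   # made-up file name

# A million raw rows is unreadable, and unique() can still hand back ~1000 items...
print(df["category"].unique())

# ...so value_counts() won: it sorts the items by how often they show up.
print(df["category"].value_counts().head(20))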

We couldn’t tell if the unique values in that column told the whole story. We were trying to see, for example, if a single row might have two things in that column, kind of like “car, truck” when describing accidents. Does “car” show up in one row and “truck” later in its own row, or, if an accident involves both, are they together in one row? Looking at the unique elements we saw “car” but we couldn’t be sure that “car, truck” might also be somewhere in that 1000 items. That’s when a student said “just use df.column.str.contains(“.*car.*”) and we’ll be in business!” Excellent, just what we’d learned in the last two weeks – a combo of pandas and regular expression jujitsu! But, alas, it didn’t work.

You see, I knew as soon as I saw the result of value_counts() that we were in trouble. I know all you pandas ninjas out there are laughing at me right now, but neither I nor my students knew how to filter the index values of a pandas series. Every suggestion they had got slapped down because it would only work on regular columns, not index columns.

I’m super happy to report that I wasn’t faking it. I literally didn’t know how to do it. However, and this I reported to the class, I knew it could be done. That’s one of my learning outcomes: I want my students to have confidence in what’s possible, even if they don’t know how to do it. So I asked them what they wanted to do. Did they want to google how to do it in pandas or follow another student’s suggestion and just do it straight in python using a loop? We had a great conversation about what they might put into a google search to help out and it was clear that they’d always add “pandas” to the search terms. We tried a few but then I had a brainstorm. I said “This will seem stupid and overkill, but I know it will work. Watch this.” And then I typed pd.Series(df.column.value_counts().index.values).str.contains(“.*car.*”). Yep, I took the unique results from a pandas data frame and recast them as a new and different pandas series just so that the index column would be a normal column. Super overkill. But it worked. The students groaned and said it was likely a really dumb way to do it.
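
For the record, here’s roughly what that overkill move looks like, along with the more direct route we didn’t know in the moment (it turns out a pandas index has its own .str accessor). Column names are still made up:

counts = df["category"].value_counts()

# The in-class overkill: rebuild the index as a brand new Series just to get at .str
pd.Series(counts.index.values).str.contains(".*car.*")

# The more direct route: filter the Series using the index's own .str accessor
print(counts[counts.index.str.contains("car")])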

So we stopped to talk about it. And I think it was my favorite part of the day.

I said “come on, it works, who cares?” Some responded saying that it can’t possibly be the elegant solution that surely exists. They talked about how they hate it when they do something dumb like this and later learn a much better way. For me this is one of the key things about algorithmic thinking. Helping students see and discuss issues like this is what I love. Yep it was overkill. Yep it’s not elegant. But it works. Is that the end of the story? It doesn’t seem like my students think so.

By the way, after all that we learned that nothing like “car, truck” exists so we were in business! Next we wanted to get a visual on the data for the most popular item in that column. I’ll call it “cars” for now. Basically we wanted to know how another numeric column behaves for the rows about “cars”. We decided on a histogram and we were surprised by the result. Essentially it only had one bar way on the left and then a bunch of empty space all the way to the right. I reminded them that if it showed that much empty space it must be that there were some really small bars that just were hard to see. What the heck? That’s when I showed them that while df.column.hist() and plt.hist(df.column) both show the same graph, the latter also prints the raw data for the histogram bins. That’s super useful when you’re trying to see what’s going on with weird data. Sure enough the first bar had a count of 60,000 and the next 8 bars had counts of zero and the last bar had a count of 9 (I’d say 9! but that’s actually much bigger than 60,000).
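
Here’s the gist of that move, with made-up column names again. The pandas one-liner draws the picture but swallows the numbers; matplotlib’s version hands the bin counts and edges back to you (and a notebook happily prints them):

import matplotlib.pyplot as plt

cars = df[df["category"] == "cars"]    # the most popular item from value_counts()

cars["amount"].hist()                  # pretty picture, no numbers

counts, bin_edges, patches = plt.hist(cars["amount"])
print(counts)       # this is how we saw 60,000, then eight zeros, then 9
print(bin_edges)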

Looking at the value of the bins one student shouted “typo!” meaning that those 9 must be due to data entry problems. They had good reason to say that (sorry, still not going to give away the details, see below). We did some quick calculations to see if there could possibly be 9 counts that far away from the rest of the data and we’re pretty convinced that they’re typos.

But now time was running out and we wanted to see much more detail from the 60,000. I said we could try to get rid of the 9 or just zoom in on the graph. It was interesting to see that no one had immediate ideas for doing either of those, though I’m sure they could see how to filter out the 9. Instead I just ran the histograms with 100 bins instead of 10 and that first bar split up a little. I again told them it was a dumb move but at least we knew there was some cool structure to the data.
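
If we’d had another five minutes, either of those ideas is only a line of code, at least in sketch form (still using the made-up column names from above, and the cutoff value is invented too):

# Zoom by brute force: more bins splits up that giant first bar
plt.hist(cars["amount"], bins=100)

# Or filter out the 9 suspected typos before plotting
cutoff = 1_000_000   # made-up threshold
plt.hist(cars[cars["amount"] < cutoff]["amount"], bins=100)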

Since time had effectively run out, I gave them a choice. I could either do the usual and go back to my office to make a screencast that finished the work, giving them all the proper syntax to use. Or I could do nothing and we could pick it back up on Monday, including the meta conversations. They really liked the second option, so I’m going for it.

Because of that choice, I am forcing myself to not dig into the data set. I know they want to eventually be able to put the data into a really cool visual that I don’t know how to do, but I’m making sure I don’t cheat and look all that up right now. It’s also why I’m not telling you, dear reader, the details of what we’re up to. If I did I’m afraid one of you would tell us all how to do what we want to do. But that would take the fun out of it!

Your thoughts

I’d love to hear your thoughts about this. Here are some starters for you:

  • I’m in this class and I really think the meta conversations helped me a lot. In particular . . .
  • I’m in this class and you keep describing our boring work as “interesting discussions.” Please stop.
  • I stopped reading when you said you weren’t going to give any details. This is just clickbait.
  • What do you have against Excel?
  • I think you should have just scrolled through the million rows. Surely their eyes would catch all the cool patterns right away!
  • I like this live coding. Were you worried that it would go off the rails?
  • I think if I did live coding I’d do a lot of practicing first. Did you do that?
  • What is your deal with that dumb factorial joke?
  • Do the students know when you shift to meta discussion? Is there a signal or are you explicit about it?
  • I think this was possibly a cool class but maybe their vote for more indicates their enjoyment instead of their learning. Are those decoupled in your class?
  • I’m in this class and my enjoyment and learning are the same!
  • I’m in this class and my enjoyment and learning are completely uncorrelated.
  • I can’t believe you think I’d drop everything and just do your work for you.
  • I can’t believe you’re not giving me the details. I want to do all your work for you.

Computational Data Science early semester thoughts

I’m back in the classroom! At least for a semester anyways. In the dean’s office I teach one class per year and last year was a fully online course, so this is a fun adventure (so far at least). This is just a post capturing some of the things that have happened that have piqued my interest. Here’s a quick (linked) list:

  • Grouping students with “Would you rather . . .” questions
  • Major projects (Twitter mining, web scraping, Digital footprint)
  • Weekly work
  • Describing data analysis to a third grader
  • Video coding assignments
  • Coding for themselves and not others
  • Google learning outcomes

A quick description of the class: This is Introduction to Computational Data Science, a course that comes at the beginning of our new CDS major/minor program (though our python programming course is a prerequisite). The entire program aims to help students find and gather data, analyze it, and tell a story or make a decision with it. This course has them take their python skills and focus them on data. After this class students should be able to use tools like pandas, web scraping, and APIs to collect, analyze, and tell stories about data.

Grouping students with “Would you rather . . .” questions

I’ve used this before, but I’d forgotten how fun it can be. Nearly daily I’ll put students into work groups for things like brainstorming ideas for analyzing a particular Kaggle data set. There are only 16 students in the class, but I know from experience that they don’t always get to know each other even in such a small class. Even though I’m using Canvas as the LMS for this course, I’ve decided to load up my old Rundquist-hates-blackboard-so-he-wrote-his-own-LMS LMS with the class roster so I could use my old group randomizer.

It shows all the students with checkboxes next to them. I check who’s present, then indicate the max group size I’m interested in. It randomizes them and ensures that no group has more than the max and no group has fewer than the max minus one. But the part that’s fun is that it also displays a randomized “Would you rather . . .” question from this site. I encourage the students to find their group, introduce themselves, and then answer the wyr question. Then I tell them what the group is supposed to do for the class.
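
The grouping logic itself is nothing fancy. Here’s a minimal python sketch of just the randomizer part (leaving out the checkboxes and the “Would you rather” lookup):

import random
from math import ceil

def make_groups(present, max_size):
    """Random groups of at most max_size, none smaller than max_size - 1."""
    students = present[:]
    random.shuffle(students)
    n_groups = ceil(len(students) / max_size)
    # deal students out round-robin so group sizes differ by at most one
    return [students[i::n_groups] for i in range(n_groups)]

roster = ["Student {}".format(n) for n in range(1, 17)]   # 16 students
for group in make_groups(roster, 3):
    print(group)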

I find it to be an easy way to build community, and it seems to be working pretty well.

Major projects (Twitter mining, web scraping, Digital footprint)

We have three major projects for this course.

Twitter

Students will identify a topic they’d like to use twitter data to look at. It could be a hashtag, a topic, a famous user, whatever. They need to craft their research question and learn and use tools that will allow them to analyze hundreds of tweets and thousands of users.

Twitter data is largely open, and the site has a robust API that the python tweepy library makes easy to access. The free development accounts have some data limitations but should provide plenty of data for my students.
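
The basic pattern looks something like this. Tweepy’s class and method names have shifted around between versions, so take this as a rough sketch rather than copy-and-paste code, and the keys are obviously placeholders:

import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")   # placeholder credentials
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Pull a few hundred tweets on a topic and keep just the bits we care about
rows = []
for tweet in tweepy.Cursor(api.search_tweets, q="#physics", tweet_mode="extended").items(300):
    rows.append({"user": tweet.user.screen_name, "text": tweet.full_text})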

Web Scraping

Students need to find a topic that has both a Kaggle data set and web pages that contain data that could extend the data set. Kaggle is great because it has data on tons of topics. So far in class we’ve explored the olympic medals data set pretty extensively. The reason we’re not stopping there is that for this program students need to know more than just how to deal with well-formatted data. Scraping data from web pages is a really useful skill in those times when such well-formatted data doesn’t exist. Of course it’s interesting that even the clean Kaggle data often needs some more work to refine the format, so it’s really nice to start with that. The example we’ve done in class is to look at the most popular first names among olympic medal winners. First name is not a column in the data set so we had to learn how to extract it from the full name column. Pretty straightforward stuff, but it’s already been fun to brainstorm different things to do with the data. We’ll get to the web scraping part later in class.
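
That extraction is a nice little pandas exercise. The athlete column in that data set stores names as “LASTNAME, Firstname”, so something like this does the trick (the file and column names here are from memory, so double check them):

import pandas as pd

medals = pd.read_csv("olympic_medals.csv")

# split "LASTNAME, Firstname" at the comma and keep the second piece
medals["first_name"] = medals["Athlete"].str.split(",").str[1].str.strip()

print(medals["first_name"].value_counts().head(10))   # most medal-winning first names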

Digital Footprint

This course satisfies the “Diversity” requirement of our general education requirements. It does so by having students look at their own digital footprint from a privilege perspective. They’re going to seek out their own digital presence and compare and contrast it with those who are different from themselves. At first I thought it might be interesting to compare with each other, but I was smartly warned off from that. Instead they’ll write a report about their research into themselves and other cultures/countries/etc.

Weekly work

When I was setting up the calendar for the course I was trying to think about the best ways to infuse the major projects. I settled on Fridays. I figured we’d spend Mondays and Wednesdays working on tools and skills and then find ways to apply them in class on Fridays. Of course the bulk of the work they should do on these projects will be outside of class, but I want to make sure I’m modeling some approaches they should be taking.

I think I’m happiest with the Mondays and Wednesdays right now, as the focus on tools and skills is pretty straightforward. Fridays can feel a little at loose ends but I’m still working on it.

Take this past week: On Monday I asked the students to brainstorm (in groups) things to search for on Kaggle. Each group came up with a suggestion and then we voted. Olympics won and we landed on a data set that lists all the medals won from 1896 to 2012. It lists the sport, the athlete, the year, the location, and the medal. It has 31,000 rows (which I immediately asked the students to gut check). They then worked in their groups to brainstorm interesting questions to ask of the data and by the end of the class we had a great set of questions.

On Wednesday we took that set of questions and voted on the top three. I sampled only 15 rows from the data set so that they could be viewed without scrolling and asked the three groups to manually do the task they were assigned to. The topics were:

  • Which first names have won the most medals?
  • What is the connection between length of name (character count) and medals won?
  • Which repeating initials have won medals?

You can see that these are all pretty similar, but each group had slightly different things to do.

You’ll see below that I modified my instructions a little in interesting ways, but ultimately the groups were able to think and talk about how to go from manually dealing with a small data set and getting a computer to do it on a larger scale.

Class came to a close and I asked them a question about the type of resources I should provide to help them out. Should I abstract the skills they were talking about and make some screencasts that show how to do those tasks in pandas/python or should I just go do the three projects for them? I warned them that to fully do the projects would involve some things they haven’t learned yet (namely regular expressions) but I had a suspicion that it might be more helpful to them since they’d already invested some time thinking about these problems. They voted for the latter and we had a brief discussion about how I’ve only ever really learned how to use software tools when I really wanted to get something done. I think I’ll keep that in mind when producing resources for them in the future.

Finally on Friday they worked in pairs to brainstorm their own webscraping/kaggle project and I did some live coding for them that went a little sideways.

Describing data analysis to a third grader

Above I talked about how I asked students in their groups to manually determine how to analyze a small set of data. Specifically I asked them to carefully determine what they the humans are doing and write down the steps. I warned them that depending on the verbs they chose it might be easy or hard to later translate those steps into an algorithm for a computer to do the work. Easy verbs include “read”, “scan”, and “count” and a hard verb example is “figure out.”

They got to work and I was meandering among the groups. I noticed that the “first name” group’s first instruction was “split the string at the comma.” There’s nothing wrong with that from an algorithmic perspective, but I was worried it might be too computer-centric for all the students in class (not all have actually taken the programming class because we are trying to find ways to grow the program).

That’s when I had a great idea. I encouraged them instead to write instructions that third graders would have to follow. I picked that age/grade quickly and seemingly randomly but I think it was a decent choice. We talked about making assumptions about what they’d know and realized that we could assume some things, like how a lastname, firstname list would likely be recognizable to third graders even though they likely almost never write their name that way. It also helped remove phrases like “split the string at the comma” from their instructions. The ultimate idea was to get the students to understand that the types of things they’re interested in can often be explained at a simple level, and then it’s their task to find out how to translate that for a computer. I think I’ll keep going with that approach with other similar skill development days in the future.

Video coding assignments

As I’ve done with so many of my physics classes, I’m grading students describing their work instead of their work product. I’m finding that in a coding class that’s super interesting. I see a lot of videos with code that looks quite similar (I don’t mind at all if they work with each other or find code online) but I never get two identical submissions. The students walk me through their code and it provides me an opportunity to send vids back to them asking clarifying questions. I’m doing Standards-Based Grading with this class so that feedback process continues through the semester.

I really like hearing the students describe their code. You can tell what they came up with themselves versus what they found elsewhere. You can tell what they really understand and what they’re just copying by rote from other work. You can tell when they haven’t thought about a particular case of inputs and when they’re thinking about how to extend the code. You can also occasionally hear their joy when it works!

Quick SBG note: I’m using my one week rule (that I used to call my 2-week rule) where if they let a standard sit for a week the score solidifies. I think that will work well in a skills-based class like this.

Coding for themselves and not others

There’s one aspect of the text we’re using that I really don’t like. It’s constantly asking students to use input() and print() commands when doing things. At first I thought I just didn’t like it because I like jupyter’s notebook approach better (just do myfunction(4) instead of input(“hello dear user please input an integer”)) but I realized there’s something more subtle: For data analysis coding you’re often coding just for yourself. That’s one of the big things that distinguishes this class from the previous programming course. There you might be learning how to write code for others to interact with. In this class you’re using a tool to solve a problem. Often for yourself. Your audience comes in later when you give them a report.

Also, you do work with others during the coding, but nearly always that means you’re writing code that they’re going to (re)use. Hence a function that returns a list is almost always going to be more useful than a function that has a loop with a print statement in it.
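
Here’s the kind of contrast I mean, as a tiny sketch:

# Version you can only look at:
def show_big_values(values, cutoff):
    for v in values:
        if v > cutoff:
            print(v)

# Version you (or a groupmate) can keep computing with:
def big_values(values, cutoff):
    return [v for v in values if v > cutoff]

kept = big_values([3, 14, 7, 22], 10)   # the result feeds the next step
print(sum(kept))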

I’ll be curious to hear your thoughts on this.

Google learning outcomes

On Friday we had a really interesting conversation about things I want students to take away from this class. I had just finished what I knew was going to be an unfulfilling (for all of us) live python coding session to show them how to investigate a kaggle data set about deaths from disease. I screwed up the syntax and ended up using the wrong functions a bunch of times. Once I was trying to add up deaths due to cancer and ended up just counting how many years were in the data set. Yep, super wrong, but it led to this cool conversation:

I pointed out that a really important thing in a class like this is to realize that you can’t possibly memorize all the various python/pandas/etc commands we’ll be learning. They’ll need to figure out what system they’ll use to ensure that they can always figure that sort of thing out. I gave the example of keeping good notes somewhere but then admitted that I just don’t really do that myself. I asked what they thought I did instead and they nailed it: trust google.

What they (and I) mean by that is that you can quite easily find good syntax help by just doing something like googling “pandas sum column filter”.
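
To make that concrete, here’s roughly the difference between what I meant to type and what I actually typed that day (file and column names invented):

import pandas as pd

deaths = pd.read_csv("disease_deaths.csv")

# What I meant: filter to cancer, then add up the deaths column
cancer_deaths = deaths[deaths["cause"] == "Cancer"]["deaths"].sum()

# What I actually did was the count() cousin, which just tells you how many
# rows (years, in that data set) survived the filter
cancer_rows = deaths[deaths["cause"] == "Cancer"]["deaths"].count()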

But then I told them about another major thing I need them to take away from the class: confidence that things are possible. When I reflect on times when I’ve done a crappy job of teaching a software tool to someone, it’s when I fail to get the person to buy into that mantra: it’s possible! I think a class like this can succeed if it puts students in situations where they don’t know how to do something but they develop confidence that they can figure it out. This is so similar to physics teaching that I’m feeling dumb for not articulating that much earlier in my career. In physics we have tools (conservation of momentum, conservation of that human-invented, not strictly necessary idea – energy, etc) and we want students to think of how they can put those to use in solving problems. In coding, we have tools (libraries, software, apis etc) and we want students to think of how they can put those to use in solving problems!

Your thoughts?

Ok, that was a lot but I’m happy I got it down. It’ll help me as I continue to reflect on how to improve this class. Any thoughts/comments/questions? Here are some starters for you:

  • I’m in this class and I think it’s going pretty well. Here’s what has helped me . . .
  • I’m in this class and I honestly think it’s crap. Here’s why . . .
  • Why in the world did you let students in who haven’t taken a programming class?
  • My 3rd grader writes their name lastname, firstname all the time now thanks to this dumb post.
  • I think the one week rule is dumb and here’s why . . .
  • Another super long boring post from you. But at least you figured out how to do anchors so I could jump around – thanks!
  • I think all deans should teach at least one class, that’s a great idea!
  • I don’t think deans should be allowed near a classroom. This is dumb.
  • Why are you giving Canvas a chance?
  • I think if you want 3 person groups and you have 16 people you should have 5 3-person groups and a random person who’s screwed.
  • That “would you rather” site has some weird ones. Do you use them?
  • Because you sometimes capitalized Kaggle and sometimes not I assume you mean that there are two sites with similar capabilities. You only linked to one, though, so I’m totally confused
  • I don’t understand how this class can satisfy a diversity requirement. Can you say more?
  • What do you mean by “gut checking” data?
  • Here are some more hard verb examples for you . . .
  • What do you mean when you say energy isn’t strictly necessary?

One die to rule them all

For a number of years I’ve been working on finding ways to turn what looks like an unfair die to a fair one (see these posts). Recently I’ve made a lot of progress. This post shows how I’ve turned a 36-sided unfair die into a fair 3-, 4-, 5-, 6-, 7-, 8-, 9-, or 11-sided die.

The two big accomplishments since the last post were to 1) roll an unfair 36-sided die 100,000 times, and 2) figure out how to optimize contiguous groupings of sides to make the fair dice I alluded to above.

The original die

I’m not quite sure why I decided to settle on a 36-sided die, but I did I guess. I knew that each roll would take a few seconds and I knew I needed some good statistics. I have access to a VDI machine at work that can run all night long so I finally got around to leveraging that and getting some decent statistics on the rolls for the die (100,000 rolls that took 5 days of continuous calculations!):

Comparison of actual rolls (red dots), area of sides (orange line), solid angle that side subtends (green), and volume of the die that the side projected to the center adds up to (blue). Note how the y-axis doesn’t go down to zero.

A number of interesting things are represented in that graph. First, you can tell that the die is unfair because the red dots aren’t flat. The minimum probability is ~0.021 and the max is 0.035, or 67% bigger. The three other curves are all pretty similar, and certainly all 4 curves are correlated. But it’s interesting that in my conversations with folks over the last few years I’ve run into people (and web sites) that would claim that side area, side solid angle, or side volume (really the volume of the chunk from the center projected out to a side which is also proportional to the mass of that chunk) should accurately predict the probability. It’s interesting that none do!

Evolving dice

Ok, next came the challenge of finding contiguous groupings of sides that would yield (nearly) identical probabilities. I thought I had figured out how to find random groupings as shown in this image for a random die I used in my last post:

9 different ways to break up the same random die into contiguous regions

The trick to do that was the Mathematica command “FindGraphPartition” where the graph in question has the faces as the nodes and connections exist between faces that touch. That command finds regions that are connected, trying to keep regions with strong connections together. It does that by looking at the edge weight between them (higher number means they’re “more connected”). So I just fed that function the graph of my polyhedron with random (positive) numbers for the edge weights (for a 36-sided die there are 54 edges).

So I could run that random weighting over and over again to try to find regions that just happened to be fair. This is hit-or-miss, of course, so I thought I’d try to make a genetic algorithm work.

A genetic algorithm, or really any of the evolutionary programming types, works really well when you have a huge parameter space (54 different parameters, in this case, that can each be any real positive number) and lots of potential local minima. What I wanted to do was to use the 54 parameters as continuous variables and to let the genetic algorithm test random “parents”, rank them, throw away the bottom half of the population, and then repopulate that bottom half with “children” made from the remaining parents. I pair two parents up, take the first, say, 20 parameters from one and the last 34 parameters from the other and vice versa to make two kids. Then I “mutate” some of the kids’ “genes” by adding a random number to one or more of the parameters. Then I run them through the fitness test and the next generation repeats.

In this case the fitness test I used was the max probability (of one of the contiguous groupings) minus the min probability. I got the probabilities by adding the side probabilities involved in each contiguous region. Those I got from the 100,000 rolls that I did earlier. If the max-min goes to zero, then all the probabilities are equal and I’ve got a perfectly fair die. That never seems to happen, but after 1000 generations I tended to get decent results.
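
For the curious, here’s the shape of that loop as a generic python sketch. My actual implementation is in Mathematica, and partition_faces below is just a stand-in for feeding the weighted face-adjacency graph to FindGraphPartition:

import random

N_EDGES = 54        # one weight per edge of the 36-sided die's face-adjacency graph
POP_SIZE = 100
GENERATIONS = 1000

def fitness(weights, face_probs, partition_faces, n_groups):
    """Max group probability minus min group probability (0 means perfectly fair)."""
    groups = partition_faces(weights, n_groups)            # stand-in for FindGraphPartition
    totals = [sum(face_probs[f] for f in group) for group in groups]
    return max(totals) - min(totals)

def evolve(face_probs, partition_faces, n_groups):
    pop = [[random.random() for _ in range(N_EDGES)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=lambda w: fitness(w, face_probs, partition_faces, n_groups))
        survivors = pop[:POP_SIZE // 2]                     # keep the fitter half
        kids = []
        while len(survivors) + len(kids) < POP_SIZE:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_EDGES)              # single-point crossover
            kid = a[:cut] + b[cut:]
            if random.random() < 0.3:                       # occasionally mutate a gene
                spot = random.randrange(N_EDGES)
                kid[spot] = abs(kid[spot] + random.uniform(-0.5, 0.5))
            kids.append(kid)
        pop = survivors + kids
    return pop[0]                                           # fittest set of edge weights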

Before talking about what I mean by “decent,” here’s a pic of my fairest 5-sided die followed by a panorama of all my dice so far (again, they’re all made up of the same 36-sided die, just with different groupings of sides painted).

5 “sided” die made from a 36-sided unfair die
top row: 3-, 4-, 5-, and 6-sided fair dice
bottom row: 7-, 8-, 9-, and 11-sided fair dice
For each, the die is shown on the bottom and the sides are broken out above.

What is “decent” or close-enough to fair?

Since none of the genetic algorithm runs ever ended with a fitness of zero, none of the dice in the image above are strictly fair. But what’s fair enough? My kids and I decided that no one would really notice if after a few hundred rolls it seemed like the sides were roughly fair. That can be quantified a lot better, of course, but that’s the gist of what I did here.

Let’s say you rolled a fair 7-sided die a bunch of times. What would the histogram of the rolls look like? If you rolled it an infinite number of times every side would come up 1/7th of the time. But you’re not going to roll it that often. If you only roll it 100 times, you might expect each side to be rolled 14 times with two of them being rolled 15 times. But you don’t usually find that. Instead you get more variability than you expect (or at least than some of us would expect). Counting statistics (or Poisson distributions if you like) would suggest that the typical variation after 100 rolls for each side would be the square root of 100 divided by 7 or roughly 3.8. In other words, most of the time the sides would be off from the expected 14 by 3 or 4 (either high or low – obviously the sum of the sides would be 100 still).

Ok, so what if you are suspicious that it’s not fair? Well, you can roll it a bunch of times and check the result against what the fair statistics would suggest. If you do it 100 times and you get a bunch of results within 3 or 4 of 14 you’d have to admit that it still seems fair. Of course if you’re patient you could roll it 1000 times. Then you’d expect each side to come up 142 or 143 times with a typical spread of 12. Don’t freak out that 12 is bigger than 3 or 4. What really matters is the relative spread. 12 divided by 142 is smaller than 3 or 4 divided by 14.
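
You don’t have to trust the formulas, either; a quick python simulation of a truly fair 7-sided die shows the same behavior (the numbers wiggle a bit from run to run, of course):

import random
from statistics import pstdev

def typical_spread(n_rolls, n_sides=7, n_trials=1000):
    """Standard deviation of one side's count across many repeated experiments."""
    counts = []
    for _ in range(n_trials):
        rolls = [random.randrange(n_sides) for _ in range(n_rolls)]
        counts.append(rolls.count(0))   # track how often side 0 came up
    return pstdev(counts)

print(typical_spread(100))    # ~3.5, in the same ballpark as sqrt(100/7) ≈ 3.8
print(typical_spread(1000))   # ~11, compared to sqrt(1000/7) ≈ 12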

So what I did was look to see how many rolls you’d have to roll my not-quite-fair dice to see results that start to look suspicious. The very worst of my dice would need over 500 rolls for that to happen. I guess I think that’s “good enough.”

Of course there are much more formal ways to do this. Mathematica provides DistributionFitTest for just this purpose. You can use it by providing a set of rolls of a die and asking how consistent those rolls are with a perfect die. It returns a p-value, which is roughly the probability of seeing rolls at least that lopsided if the die really were fair. Of course every 1000 rolls is different, so if you rerun the command with a different set of rolls you get a different p-value. That’s why what I’ve got below are histograms for each die where I rolled 1000 rolls 1000 times each. The x-axis is the p-value it found and the y-axis is the probability.

p-value histograms for each die with 1000 rolls

Note that doing the same thing with perfect rolls yields a graph similar to the 7-sided curve. Really the only one that’s not great on this scale is the 11-sided die. Note also that if I do this for an un-optimized contiguous side set you nearly always get a p-value less than 5%. This shows that I really needed that genetic algorithm.

Next steps

I’d love to 3D print my 36-sided die a few times and paint the sides. I bet I’d find some D&D folks interested in buying them.

I’d also like to figure out why I can’t do 10-sided. Mathematica crashes every time I run the genetic algorithm. I really can’t figure out what’s going on.

I’d love to figure out if you could predict the side probabilities from some physical measure of the die. Obviously side area, side volume, and side solid angle don’t do it, but maybe some other measure does. One thing I’ll try checking is looking at the effective depth of the potential energy dip for each side. What I mean is: to topple from one side to a neighbor, you have to make the die go up on an edge. This has a gravitational potential energy cost. Each side is a triangle and so you could get three measures of that for each side. Wouldn’t it be cool if the actual rolling probabilities tracked with that measure! Then I wouldn’t have to spend a week calculating those probabilities and I could really do some fun stuff.

Your thoughts?

Here are some starters for you

  • I think this is really cool. What I especially liked was . . .
  • This is really dumb. Here’s why . . .
  • Why do you call it a “fitness function” when you’re clearly minimizing? Try calling it a cost function, jerk.
  • I ran FindGraphPartition and occasionally got partitions that aren’t contiguous! What sort of magic did you do to fix that? (answer: updated the fitness function to make sure those got huge costs – used ConnectedGraphQ on a graph subset)
  • If you can’t make perfect dice, I’d never pay for them. You should say that more clearly at the top of this dumb post
  • Why do you bother with contiguous groupings? Couldn’t you just use random groupings and print them with the appropriate number in each triangle? (answer: I do get better p-value results this way, but my kids think contiguous is cooler)
  • I think you’ve succumbed to p-value abuse. You clearly run it over and over again until you get what you want. Hence the histograms.
  • I think your idea about the potential energy measure of a side has merit. Here’s what I’d do . . .
  • I think I know a different physical measure that will predict the probabilities. Here’s what it is . . .
  • There is no physical measure that will work. You’ve got to really roll them. I think you should pick a random shape, 3D print it, roll it 100,000 times yourself, look at the probabilities, then make a tiny change and repeat.
  • I don’t think a genetic algorithm was the best choice here. Instead I would . . .
  • All the Mathematica advertising gets old. I’m pretty sure I could do all this with my TI-84.

Lifelong computational skills

I’m frantically putting together my syllabus for our brand new Computational Data Science intro course (this comes after a programming course) and I realized that I’m not using one of my favorite syllabus planning tools: this blog!

This course was proposed and is listed in the bulletin thusly:

Title: CDS 1020 Introduction to Computational Data Science

Goals: To continue the study of computational techniques using Python, with an emphasis on applications in data science and analysis.

Content: This is a continuation of CDS 1010, applying algorithmic thinking to applications in data analysis. Topics include data mining, data visualization, web-scraping.

Prerequisite: CDS 1010

I’m really excited to teach this course, especially as it’s been a year and a half since teaching an in-person class (I teach one class per year in the dean’s office and last year I taught a fully-online course). However, I’m feeling the pressure to make sure this is a strong course and I have some things I’m grappling with right now. This post is trying to put my thoughts and questions down around the idea of skills/approaches/ways of thinking that I want my students to really own after this class.

Cool now versus later

As I’m looking at all kinds of cool ways to show students the power of computational approaches to data collection, analysis, communication, and use in decision making or story telling, I’m trying to think about what it takes for students to use those tools beyond my class. For example, while there are lots of tutorials (like this cool one about Natural Language Processing in python) that I’m sure I could help my students get running while they’re in the course with me, I’m not sure they’ll feel like they could really use those tools without my scaffolding handy. Instead, possibly I should focus on skills or approaches that would better empower my students, even if they’re not as powerful.

The nltk library for python is very powerful and doing some work with it during my course would likely cause students to appreciate its power as it helps them do cool projects. However, I’m nervous that the learning curve associated with it might make them not want to reuse it on their own time after my course. Of course the students that really dive in to our new Computational Data Science major might, but I’m not sure they’re my target audience for this first time through.

In my physics teaching, this reminds me of our work trying to get students to do automated data collection and experimental control using first LabVIEW (yes, that’s how it’s spelled) and later arduino. I was a huge LabVIEW user all through grad school and we had a mini-site license when I first started here. In our Modern Physics lab we taught the students how to use it and got them to do some interesting things in that course. However, we started to notice that students were not reaching for that particular tool the next year in our Advanced Lab. In that lab they design and execute their own team-based year-long projects, often based on ideas they’d find in the American Journal of Physics. We would hear things like “oh we’ll just manually record that data because it’s too hard to get the LabVIEW stuff working” or “I don’t remember how to install all the right things to get LabVIEW working so we’re not going to bother.” Later we switched the modern physics lab over to arduino, in the process reducing the complexity of the things they were interfacing with. Suddenly nearly all the projects in Advanced Lab were at least brainstorming ways they could get the arduino ecosystem to help them. So my lesson from that was that a slightly inferior tool set with fewer logistical on-ramps led to students using it more in the places we were hoping for.

Types of things I’m considering

Here’s a short list of the types of things I’m talking about and that I’m trying to make decisions about:

Tough on-ramp → Easy on-ramp

  • nltk → regex (possibly starting with simple spreadsheet commands)
  • twitter api → copying and pasting from twitter search (and then some sort of analysis)
  • list comprehensions → for loops
  • setting up local databases and using python to manage and analyze → using simple spreadsheets and perhaps google apps script to manage and analyze
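
To make the first row concrete: here’s the easy on-ramp version of “count word frequencies,” using nothing but a regex and a dictionary. The nltk route (something like nltk.word_tokenize) gets you the same tokens, plus a lot more power, but only after installing the library and its tokenizer data, which is exactly the kind of on-ramp I’m worried about.

import re

text = "The quick brown fox jumps over the lazy dog. The dog sleeps."

# regex on-ramp: pull out word tokens, then tally them
words = re.findall(r"[a-z']+", text.lower())
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(sorted(counts.items(), key=lambda item: -item[1])[:5])
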
Things I’m thinking about

Certainly I would choose the tough on-ramps if I knew for sure my students would be majors and would have someone like me around to both help them use the tools and cajole them to consider them when they’re doing complex projects.

For students who might not be majors and who I would hope would use computational approaches to decision making and story telling in the future, I might choose the easier on-ramps, even though in nearly every case above it’ll be limiting.

My guess is I’ll oscillate between those columns as the semester goes along.

Your thoughts? Here’s some starters for you:

  • Glad to have you back in the blog-o-sphere, where have you been?
  • This sounds like a fun class, can I sit in?
  • This sounds like a dumb class, can I lobby to have it cancelled?
  • I like the _____ on ramp things and here’s why . . .
  • The LabVIEW/arduino example is great, here’s a similar example from my work . . .
  • The LabVIEW/arduino example is dumb and doesn’t apply to this at all, here’s why . . .
  • As usual you couldn’t even bother to google some clear answers to these problems. Here’s several articles you should have read before even writing this drivel: . . .
  • Here’s some things I’d add to your table . . .
  • Wait, are you going to actually teach a python class? No Mathematica? I don’t believe it.

Virtual Physics Conference

I’m part of a grant team right now brainstorming a new project, and a part of it is potentially hosting a conference. We kicked around some ideas about it, and as usual in situations like this, we casually talked about what a virtual conference might look like. That got my brain going so I thought I’d get some thoughts down here.

My goal: A virtual conference for physics teachers to be held potentially in the summer of 2020.

Whenever I’m a part of conversations like these, the typical pros and cons list look like this:

  • Pros
    • Cheap (I almost stopped this list here)
    • Flexible
    • Comfortable
    • Wider reaching
  • Cons
    • Not as immersive
    • Missing “hallway conversations”
    • Less connections
    • Less commitment from participants

I’ve been thinking about all of those and I think I’ve thought of at least a beginning of a plan that addresses all of them. Certainly the pros will still be there, but hopefully it’ll be an experiment worth doing if we can address the cons at least to some degree.

Technology

I’ve used a ton of different technology for doing meetings like these. Back in the glory days of the Global Physics Department we used both Elluminate Live and later Blackboard Collaborate (really the same software, just bought out by Blackboard). Since then I’ve used WebEx, Google Hangouts, and Zoom a ton and I’ve occasionally used others as well. For this experiment, I would mostly want a reliable technology, and the one that I’ve had the most luck with there is Zoom. But below I’ll lay out what I think the needs would be.

Participants at a minimum would need a computer/phone with decent internet speed and speakers. A microphone would be great and a camera would be too, but I think I’d be open to where we’d draw the “necessary” bar.

Speakers would need audio and video and screen sharing capability. It’s possible we could ramp up to something like dual monitors or something but I’m definitely open to suggestions.

Rough outline

My vision is something like this:

  • Parallel sessions
  • ~5 speakers per session
  • 4 session blocks in a day
  • A single day

Immersion/Commitment

This is the toughest nut to crack, I think. The longest online conferences I’ve been in were 8 hours long and it was hard to stay focused. So what would it take to get people to stick?

Taking the outline elements from above: Parallel sessions allow people some choice. Certainly at in-person conferences people really appreciate that, especially when a session doesn’t have what you’d thought it was going to have. ~5 speakers per session makes it seem like you could potentially hold all that info in your head at a time and really have a great conversation going. Four session blocks in a day just seems reasonable and one day is a great start for this experiment, at least I think that’s true.

Addressing issues like “my favorite part of conferences are the impromptu conversations that happen between sessions” is something I’ve been thinking about a lot. I think it would be great if we had technology that allowed the following:

  • Every session has a Zoom room (I’ll just use zoom vocabulary here to simplify) with a main speaker at any given time but a running commentary that people can participate in.
  • Questions will be submitted and voted on during each talk so that speakers can answer them in a crowd-prioritized way.
  • Discussion will use software like my “my-turn-now” software that allows for equitable discussions.
  • [This one I don’t know about existing solutions] This one is what I’ve been thinking would help the most with some of the cons above. I call it “hallway conversations.” I want any two-or-more groups to be able to spontaneously spawn a new zoom room. They would get video conferencing, a chat board, and a white board. They could welcome anyone else in who “knocks” and they could choose to be either public “Andy and Super-Cool-Person’s room” or private.
  • Drop in rooms for common topics
  • You’d get a personal recap record of every room you were in along with whatever contact info people in that room were willing to share. You’d also get a chat transcript and any whiteboards.

Imagine sitting in your pajamas with a beer and seeing that people you are excited to meet are in a public room. You knock and they let you in! You then can meet them and either hang at the periphery to just listen or jump right in. Kind of sounds like an in-person conference, doesn’t it? The originators could leave and the room would still exist until there’s not at least two people in it. The personal recap record would really help you maintain any contacts you’ve developed.

My other big idea is meals, specifically lunch. I envision partnering with something like Door Dash to get everyone a meal at the same time. They’d pick their meal at registration (possibly even same day, I suppose) and then it would be delivered to everyone at the same time (yes, I know, there’d be some time zone problems but I think it might be cool enough to convince west coast people to eat at 10). There’d be Zoom rooms for every type of food. You’d be in a video conference with anyone else eating “at the same restaurant” and you could hopefully be involved in some fun conversations (and of course you could still launch a “hallway conversation” if you wanted to).

Cost

This couldn’t be free, as the Zoom cost won’t be zero. But it would surely be cheaper than gas/plane + hotel that a normal conference would have. If we had 5 parallel sessions and 5 speakers in each session and 4 session blocks that’s 100 people. If we charged $100 per person that would be $10,000 which might be enough for the Zoom ideas above. I plan to research this a lot more.

Flipped?

A collaborator of mine shared this white paper from the University of California system that talks about an approach to virtual conferences that sounds a lot like a flipped conference. Speakers record their talks ahead of time and each talk has a discussion board associated with it. I think that’s a cool idea, but I’ve always been unable to get my cognitive energy focused like that ahead of a meeting. The plan above allows you to come in cold (with the exception of your own talk of course) and just let it flow over you dynamically. I’m curious what others think, though.

Your thoughts?

So that’s where I’m at with my brainstorming. Your thoughts? Here are some starters for you:

  • I love this idea, where can I sign up? I just had a couple of thoughts to make it better . . .
  • Um, ever heard of google? This exists and is called . . .
  • If I can’t shake someone’s hand I don’t think it’s a real relationship. How are you going to do that?
  • Love the “hallway conversations” but I think you’d also have to think about . . .
  • $100?! Way too _____. Instead you should . . .
  • I would love to facilitate a session. Can I shoot you some ideas? Who’s on the committee?
  • Could we do a poster session too? I have some ideas about how that could work
  • Door Dash exploits their delivery people. Instead think about partnering with . . .
  • Here’s an interesting way to mix your ideas with the flipped conference ideas . . .

Google Apps Script Physics Problem Database

I tweeted out the other day an opinion about using google apps script (GAS from now on) as a web framework.

That led to some follow up from my awesome tweeps, including a nudge to write this blog post, so here you go.

This post will be mostly about how to use GAS as a data-driven, responsive website, with the Physics Problem Database really just the example I put together to show things.

Why GAS?

A data-driven website needs to store and retrieve data. Most of my other projects tend to use mysql databases for that (and PHP (yes, stop laughing and look up Laravel) for the html/interfacing) but that approach can have a pretty big startup cost (mental energy and time, not necessarily money). I certainly know how to spin up a new Laravel site and set up a new mysql database, but I know that’s a huge barrier for folks who want to just build something small-ish.

I’ve been using GAS for a long time now to help automate certain tasks (and you’ll note at that first link that I’ve thought about GAS as a website driver before – the difference in this post is that I don’t bother with the sketchiest part of that post in this new work – namely using the query command in a new sheet all the time). The way it can interact with a spreadsheet is what’s really driving this post. Basically I’m exploring how you might use a spreadsheet instead of a database to really get something up and running.

Benefits?

  • You don’t need a server! Or even a coding environment. I did nearly all of this coding on a chromebook because all you need is a google account and they provide the IDE (integrated development environment), the storage of the “database,” and the hosting of the pages
  • The “database” is a very user friendly environment. What sql would call tables, I just call different sheets. It’s very easy to see, edit, and delete the data in the “database”.
  • Both the server-side and client-side code is javascript. I’m not necessarily praising the language here, though it is fun to code in, but rather mostly praising the fact that you only have to know one thing (plus html, of course).
  • Authentication is basically built in. See below for more on that
  • AJAX (or the ability to update or query the “database” without reloading the whole page) is particularly easy

Drawbacks?

  • It’s not super fast. You’ll see that the physics problem database takes ~5 seconds to load.
  • The spreadsheet can only get so big. I believe the relevant quota is 5,000,000 cells. I would guess that you could do fine with 1,000 – 10,000 main records.
  • You have to build your own unique ids, whereas sql will normally just do that automatically. You have to do this rather than just finding the row things are on to protect against someone changing the order of the cells in the spreadsheet (deletions, adds, sorting, etc). I suppose if you make it so that you’re the only one who can access the spreadsheet and make a promise to yourself never to change the record order, then you could skip this. This is especially important if you do some one-to-many or many-to-many relationships among the sheets.

Now I’ll shift over to using the Physics Problem Database as context to explain how you can do this stuff.

Physics Problem Database

Years ago the Global Physics Department put a fair amount of effort into a physics problem database. We thought it would be fun to both build such a thing for teachers to use, especially those doing Standards-Based Grading (who often have to give students new and different problems to try) *and* to help our members learn how to code. While a ton of people were interested, the barriers of learning how to get a database-driven webpage running were tough. So I thought I’d use that idea as context to really push this GAS approach.

For those of you who don’t care about what I have to say below about what I learned in doing this, here’s the direct link to the GAS Physics Problem Database

Goals:

  • Display physics problems that people could use
  • Allow only authenticated users to be able to add problems
  • Develop a tagging system with a limited number of approved tags

The first thing I did was decide the data structure. After minimal thought, here’s what I came up with:

  • Problems
    • unique id
    • problem
    • user id
    • date
  • Tags
    • unique id
    • tag
    • user (didn’t end up using this)
    • date (didn’t end up using this)
  • Users
    • unique id
    • email
    • name
    • date
  • Problem_tag (this is the Laravel naming convention – it’s what some call a pivot table since this facilitates the many-to-many relationship between problems and tags)
    • unique id (not sure this is necessary)
    • tag id
    • problem id

Next I started by making the page that would just display all the problems. I wanted the display to show the problem and any tags that go with it. I think I meant to show who wrote the problem too, but I don’t think I coded that yet (though it would be super easy to do).

Ok, so how to manage the data? What I decided to do was to just load all the data in all the sheets into a massive javascript object. I actually do this a lot with other GAS projects that I work with. It seems that several hundred rows of data works just fine, so I think this is at least somewhat scaleable (which google insists is spelled wrong, by the way). Here’s the code that does that:

function loadData() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheets = ss.getSheets();
  var whole = {};
  for (var i = 0; i < sheets.length; i++) {
    var data = sheets[i].getDataRange().getValues();
    // creates object with column headers as keys and column numbers as values:
    var c = grabheaders(data);
    var list = data[0];   // the header row itself
    var ob = {};          // one entry per data row, keyed by that row's unique id
    for (var j = 1; j < data.length; j++) {
      ob[data[j][c["unique"]]] = {};
      // copy every column of the row in, keyed by its column header
      for (var l = 0; l < list.length; l++) {
        ob[data[j][c["unique"]]][list[l]] = data[j][c[list[l]]];
      }
    }
    whole[sheets[i].getName()] = ob;   // key the whole sheet by its tab name
  }
  return whole;
}

That produces and returns an object called “whole.” It has a key for every tab in the spreadsheet. The value for each key is an object with keys set to the unique ids. The values of those are objects whose keys are the column headers in that tab. Say you wanted to find the problem associated with a particular problem_tag relationship. You’d get it with whole[“problems”][whole[“problem_tag”][unique-number-you-care-about][“problem id”]]. I know, it’s hard to read, but you can navigate all relationships this way.

How do you send that to be parsed in the html document? First note that all GAS projects can be made up of javascript documents and html documents. They’re all actually stored in the single script document. I use templated html where you can intersperse <? useful server-side javascript ?> into your html. So the table for the problems is done with this code (ugh, the html syntax highlighter is failing on all of the “>” characters, replacing them with &gt – sorry about that):

<table class="table table-striped">
  <thead>
    <tr>
      <th>Problem</th>
      <th>tags</th>
    </tr>
  </thead>
  <tbody>
    <? Object.keys(data["problems"]).forEach(function(key) { ?>
      <tr>
        <td><?= data["problems"][key]["problem"] ?></td>
        <td><?!= findTags2(key,data) ?></td>
      </tr>
    <? }) ?>
  </tbody>
</table>

The “forEach” part is going through all of the problems in the problems object (also note that I’m passing “whole” as “data” – don’t ask why.) Then each one adds a row to the table, displaying the text of the problem with data[“problems”][key][“problem”]. Then it runs a function (on the server, before the page is rendered) called findTags2 that accepts the key (unique id for the problem) and the full data object and then returns a list of hyperlinked tags that, when clicked, show a page with just problems with that tag. That page does that filter by doing the “loadData” above and then deleting any elements that aren’t connected to that tag before sending data to a very similar html page. Note that to add in the creator of the problem I would just add something like <td><?= data[“users”][data[“problems”][key][“user id”]][“name”] ?></td> (the <?= ?> form is the one that actually prints the value).

The only other thing the page does right now is allow authenticated users to add problems. That page is given all the tags and grabs the user’s email (you have to be logged into a google account to use the page). There’s a simple text entry box for the problem and the user can then select any appropriate tags. When they hit submit there’s an AJAX call to update the spreadsheet. All that means is that the page doesn’t have to reload to do it. The data is sent to the server, it updates the spreadsheet (in several ways – see below) and then returns a note saying it was successful. That updates some text on the page. It all takes about a second. The spreadsheet updates are:

  • Put the problem into the “problems” tab. For that you can use the “append row” method in Google’s SpreadsheetApp. For the unique id I just use the number of milliseconds since January 1, 1970, making the assumption that I won’t run the script twice in the same millisecond.
  • Then the “problem_tag” tab is updated, with a new row for every tag that was checked by the user. This is where I use the unique id for the problem (the unique id for each tag is embedded in the form to allow them to be passed to the server correctly).

The authentication is super easy if you’re doing this in a google-school domain. Basically you set the script to run as you (the developer) and use the users tab to check whether the user email (which google provides for any user visiting the page) is in your approved list. That way you’re letting google do all the authentication (they have to be in your domain) and only the people in your users tab can even access the page.
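In code, that gatekeeping can live right in doGet(). In this sketch the “email” column in the users tab and the “index” html file name are my placeholders:

// A sketch of the gatekeeping in the web app entry point (details are placeholders)
function doGet(e) {
  var email = Session.getActiveUser().getEmail();   // google supplies this for domain visitors
  var data = loadData();                            // the "whole" object from above
  var approved = Object.keys(data["users"]).some(function(id) {
    return data["users"][id]["email"] == email;     // assumes an "email" column in the users tab
  });
  if (!approved) {
    return HtmlService.createHtmlOutput("Sorry, you're not on the list.");
  }
  var template = HtmlService.createTemplateFromFile("index");  // the html file name is a guess
  template.data = data;                             // this becomes the "data" the templated html uses
  return template.evaluate();
}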

The authentication is a little harder for normal consumer google accounts, but still doable. Unfortunately the command that returns the visitor’s email only works if you set the script to run as the person visiting the page. That means they need access to the spreadsheet, something you don’t have to grant in the domain version. What’s cool, though, is that you can just give the whole world “view” access and this script will still work. What you have to do, in addition to updating the “users” tab, is give those approved people “edit” access to the spreadsheet. Then everything works!

When users visit the page for the first time they have to go through a permissions check. Basically google inspects your script to see what it does and makes sure the user is ok with that. The first time I did what’s described above with my consumer google account, I noticed that the permission warning said the script would have the ability to view, edit, and delete any of my drive files. Now I know I’m a trustworthy guy, but I figured even my friends would have a problem with that. Luckily I found this page that made it clear you can limit the access to just the associated spreadsheet, something I was already doing by giving them “view” access! Problem solved.
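For what it’s worth, one way to narrow the scope for a script bound to a spreadsheet is the @OnlyCurrentDoc annotation (which may or may not be what that page described):

/**
 * @OnlyCurrentDoc
 *
 * With this annotation at the top of a container-bound script, Apps Script asks
 * only for access to the spreadsheet the script is attached to, instead of the
 * scary "view, edit, and delete all your Drive files" scope.
 */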

So, I think I’ve got a roughly-working beta version up and running. Please let me know your thoughts. Here are some starters for you:

  • I like this, especially how I could develop a web page by typing a new line on every computer I stumble onto without having to load a full development environment.
  • I hate this: I need to do all my coding on my own local machine before even thinking about putting it up on the web. It’s too bad there’s no way to do that with GAS.
  • I like this but I’m nervous about whether it would scale. Why haven’t you just pasted in a bunch of nonsense problems to see when it breaks?
  • I got a “you’re not authorized” message when I tried to hack in and load a bunch of crap into your crappy database. Can you please give me access?
  • Your tag choices are dumb. Instead I think you should use . . .
  • I think you didn’t need to bother with restricting the permissions scope to just the one spreadsheet. I trust you!
  • If I’m at a google school can I build something that people outside of my domain could use?
  • Can users select problems and save them? Print them? LaTeX them?
  • What happens if you share the script with someone? Can they collaboratively edit? At the same time?
  • I’ve been laughing so hard at the fact that you sometimes code in PHP that I haven’t been able to digest the rest. Can you make it a flash video and put it on your myspace page?
  • I think it’s dumb to load the whole spreadsheet into memory. Just load in all the unique numbers and the rows they’re on and load stuff when you need it!
  • I just tried to email you about this at your job and got an away message saying you’re on vacation. You do this crap for fun?!
  • I see you have LaTeX in one of the problems. Are you just using MathJax?
Posted in HUWebApps, physics problem db, programming | 6 Comments

Shooting circuits

I’ve posted before about how I struggle with teaching complex circuits (really just circuits that contain batteries and resistors in ways that can’t be analyzed with parallel and series tricks). There you’ll read about how giving my students one of the unknowns for free lets them show me how well they understand the basic principles of circuits without getting bogged down in the math of, say, five equations and five unknowns.

I’ve shared the ideas from that post a bunch, and occasionally I get feedback that it robs students of the ability to actually solve circuits from scratch, since I’m giving them one of the unknowns for free. This post is about my thoughts on that, including some more substance for the ideas at the end of that post about guessing and checking.

Bridge circuit

The gateway drug that demonstrates the need for tools beyond series and parallel tricks is the bridge circuit:

[Figure: a typical bridge circuit]

The problem with this circuit is that you can’t model the resistors as a combination of series and parallel elements. Go ahead, try, I’ll wait!

… nope, R1 and R2 are not a parallel pair

… nope, R1 and R4 are not a series pair

… etc

Ok, now that you’re on board with that, the question is how to analyze such a circuit using the basic principles that went into developing the series and parallel tricks, namely:

  • current flowing into a node flows back out again (conservation of charge, or “no piling up!”),
  • batteries raise the voltage from one side to the other by the EMF of the battery, and
  • resistors reduce the voltage from one side to the other in a way that’s proportional to the current flowing through them (with the proportionality constant conveniently named “resistance”).

Other answers to that question include:

  • Kirchhoff’s laws (do a bunch of loops and a bunch of nodes and hope you have the right mix that enables a successful linear algebra solution)
  • Mesh approaches that are really the same thing, with just a little different focus
  • Go in the lab and measure everything

My answer, as noted in that last post (it was 5 years ago!), is to make a guess for one of the currents and then follow through the ramifications of that guess until you reach a discrepancy. For the circuit above, for example, I would (note that when I say “voltage” I actually mean the voltage difference between that point and the bottom of the battery):

  • Start by making a guess for the current through R1
  • That enables me to calculate the voltage at the left node
  • That enables me to calculate the current through R4
  • Those two currents enable me to calculate the current through R3.
  • That enables me to calculate the voltage at the right node
  • That enables me to calculate the current through R2 (because I know the voltage drop across it)
  • HERE COMES THE COOL PART
  • That gives me two ways to calculate the current through R5:
    • One way is to consider the voltage drop across it (which we know) and then determine the current
    • The other is to use the current flowing into the right node and make sure nothing piles up

Unless you make a lucky guess, those two calculations will not be the same. I’m calling their difference a “discrepancy”.

So what now? Well, as I stated in the last post, do all that again with a different guess and find out how the discrepancy changes. Since it’s a linear circuit, you then “just” need to extrapolate from those two data points to find out what guess would yield a zero discrepancy.

When I wrote about this 5 years ago, I gave a nod to the fact that it’s a lot of work to do all that. But now that I’ve actually tried it a few times, it really isn’t! The first pass is when you establish the relationships, and the second is easy if you use a tool like a spreadsheet. It also turns out that if your first guess is zero and your second guess is one the extrapolation is really easy as well.

What I mean by that last point is that if d0 and d1 represent the two discrepancies for a guess of zero and one respectively, the correct current is simply d0/(d0-d1).
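To spell out where that comes from (a one-line derivation of my own): because the circuit is linear, the discrepancy is a linear function of the guess g, so

$$d(g) = d_0 + (d_1 - d_0)\,g,$$

and setting d(g) = 0 gives g = d0/(d0-d1).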

Here’s an example. Let’s say that R1 = 1 ohm, R2 = 2 ohms, and so on up through R5 = 5 ohms, and that V = 10 volts. Here’s the first pass assuming the current through R1 (labeled I1) = 0:

  • I1=0
  • Vleft=10
  • I4=10/4=2.5 down
  • I3= 2.5 left
  • Vright= 10+2.5*3 = 17.5
  • I2=(17.5-10)/2=3.75 up
  • I5a=17.5/5=3.5 down
  • I5b=6.25 up

So d0=6.25 – (-3.5)=9.75 (also note that Vright gives you a clue this is a bad guess, since you wouldn’t expect any part of the circuit to have a voltage higher than the battery)

Here’s the second pass with I1=1:

  • I1=1
  • Vleft=10-1*1=9
  • I4=9/4=2.25 down
  • I3=2.25-1=1.25 left
  • Vright=9+1.25*3=12.75
  • I2=(12.75-10)/2=1.375 up
  • I5a=12.75/5=2.55 down
  • I5b=1.25+1.375=2.625 up

So d1=2.625-(-2.55)=5.175. Getting better.

That means that the correct current through R1 is d0/(d0-d1) = 9.75/(9.75-5.175) = 2.13 amps.

Yes, that seems to have gotten ugly, I admit. But repeating identical calculations is what spreadsheets are built for. Here’s one I built for this problem (note that I decided down or right would be considered positive):

[Figure: screenshot of the spreadsheet. Cells B2:D9 have the formulas indicated in column A; the yellow cell has the formula B9/(B9-C9).]

Note the zero discrepancy in cell D9! Hmm, I wonder if certain people in my life will read that last sentence and let me have it.
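If you’d rather script the two passes than spreadsheet them, here’s a quick sketch in javascript (mine, not a transcription of the spreadsheet) that reproduces those numbers:

// Two-pass "shooting" calculation for the bridge circuit above
// (V = 10, R1 = 1, R2 = 2, R3 = 3, R4 = 4, R5 = 5)
function bridgeDiscrepancy(I1) {
  var V = 10, R1 = 1, R2 = 2, R3 = 3, R4 = 4, R5 = 5;
  var Vleft = V - I1 * R1;        // voltage at the left node
  var I4 = Vleft / R4;            // down through R4
  var I3 = I4 - I1;               // flowing left through R3 (node law at the left node)
  var Vright = Vleft + I3 * R3;   // the right node is higher because current flows right-to-left
  var I2 = (Vright - V) / R2;     // up through R2
  var I5a = Vright / R5;          // down through R5, from Ohm's law
  var I5b = I3 + I2;              // up through R5, from the node law at the right node
  return I5b + I5a;               // the discrepancy: one calculation says up, the other says down
}

var d0 = bridgeDiscrepancy(0);    // 9.75
var d1 = bridgeDiscrepancy(1);    // 5.175
var I1 = d0 / (d0 - d1);          // 2.13 amps through R1

The two function calls play the role of columns B and C in the spreadsheet, and the last line is the yellow cell.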

So now I’m starting to think this method has some merit. We’re always talking about the value of spreadsheets in physics teaching (usually lab, but still) and now with this approach you’ve really only got to see what students do for the formulas in column B to see if they get the physics!

Not only would you look at their formulas, but the order they go through the circuit matters too, which makes me feel that this approach is closer to “problem solving” than “exercise,” a distinction Ken Heller is always pestering me about. What I mean is that in the usual Kirchhoff procedure students are handed an excellent algorithm with only simple choices involved: which loops and nodes should I do? Which direction should I go around the loops? Contrast that with this method, where at each step you have to carefully figure out what new piece of information the previous step lets you discern. I think it involves more decision making. It also has some great teachable moments, like above when I pointed out a voltage that was higher than physically possible.

Why I call it shooting circuits

I was sharing this approach with a colleague yesterday and she said it reminded her of the shooting method for solving second-order differential equations. Here’s an example of how I use that to solve for the quantum states of a hydrogen atom. In that method you start with a guess for the wavefunction at one side of a quantum well and then look to see how it screws up on the other side. Then you adjust your guess and extrapolate the results so that it doesn’t screw up on the other side.

So this is a lot like that. What the heck, we’ll call it shooting circuits!

Series/parallel comparison

Consider this incredibly common circuit:

[Figure: a typical circuit used to apply series and parallel tricks]

First let’s consider the work necessary to calculate the current through all the resistors:

  • Combine R2 and R3 into Req1
  • Combine R1 and Req1 into Req2
  • Determine the current through Req2
  • Recognize that the current through R1 and Req1 is that same current
  • Determine the voltage at the node by finding the voltage drop across R1
  • Determine the current through R2 and R3 similarly using the now known voltage drops across them.

It’s interesting that many students think they’re done at step 3 (or possibly 2). They groan when you tell them that they still have to reconstruct the circuit to find all the currents.

Now let’s do it the new way, again using R1=1, R2=2, R3=3 and V = 10:

[Figure: a spreadsheet similar to the previous one, but for the series/parallel circuit]

So it’s 4 different statements of physics (A2:A5) and we’re done! All four of those statements demonstrate the student’s mastery of either Ohm’s law for a resistor or the node law. But remember that the order is interesting too! Can you do it in a different order? Does it work if you choose to make your guess for one of the other currents? Give it a try!
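Here’s the same kind of javascript sketch for this circuit (again mine), with the guess made for the current through R1:

// Shooting the series/parallel circuit (V = 10, R1 = 1, R2 = 2, R3 = 3),
// guessing the current through R1
function seriesParallelDiscrepancy(I1) {
  var V = 10, R1 = 1, R2 = 2, R3 = 3;
  var Vnode = V - I1 * R1;        // voltage at the node between R1 and the parallel pair
  var I2 = Vnode / R2;            // down through R2
  var I3 = Vnode / R3;            // down through R3
  return I2 + I3 - I1;            // node law: what flows out should match what flows in
}

var d0 = seriesParallelDiscrepancy(0);  // 8.33
var d1 = seriesParallelDiscrepancy(1);  // 6.5
var I1 = d0 / (d0 - d1);                // 4.55 amps, matching 10/(1 + 2*3/(2+3))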

Pitfalls

I’ve been playing with this quite a bit and haven’t really found many pitfalls. One minor one involves the most basic parallel circuit (one battery, 2 parallel resistors). If you guess the current through one of the resistors you immediately get a discrepancy regarding the voltage drop across that resistor. That’s cool, as then you can apply the method, but you learn nothing about the other resistor! So then you’d have to repeat for that one, I guess. I think that means that a complex circuit that basically has two parallel parts might suffer from that problem.

Your thoughts?

Here are some starters for you:

  • I like this method, but would it work for . . . ?
  • I think this method sucks and here’s why . . .
  • What’s wrong with ending a sentence with a number and then an exclamation point?
  • What circuit drawing software do you use? They really look great!
  • Mathematicians would call this method . . .
  • Can you tell me more about how you can assume the discrepancy is a linear function of the original guess?
  • Of course you can describe a bridge circuit as series and parallel! Here’s how . . .
  • Seriously, you want students to do their homework with a spreadsheet!? You’re an idiot
  • I’m not sure I understand your pitfall situation. Can you describe it better?
  • Here are 7 more pitfalls I thought of within 10 seconds of reading this:
  • I clicked through to the old post (which you oddly called your last post – what the heck?) and gave up on you when I saw you hate Kirchhoff’s loop law!
  • Would this work for the “you have a cube made of resistors . . .” problem? I hate that problem.
Posted in general physics, physics | 8 Comments