Meeting styles

I, like most people, go to a lot of meetings. I’ve developed a style that I like to use when running meetings but I realize that I can always get better. I thought I’d put down some of the things that I like and some things that I struggle with and see what you wonderful folks think about them.

Check ins

Years ago I learned from some 3M folks that a social check in at the beginning of a meeting can help the team develop and can help people look forward to the meeting. I always really liked them and so if I run a meeting I always start with one. Typical prompts include:

  • Favorite meal
  • Favorite way to jump into a swimming pool
  • If you were a boat, what kind of boat would you be?
  • A skill you would like to develop

If you’ve got a large group, it’s usually best to go with a binary choice like “raking vs shoveling” or “0 degrees or 100 degrees”. If it’s a smaller group, you can do longer things, but this is where I hit my first stumbling block: some people really think these are a waste of time. In my typical meetings these take about 5 minutes, so I agree it’s a big chunk of time. I enjoy them so much, though, that I always schedule them. My question: how can I be more sensitive to the people who don’t like taking the time?

Agenda changes

I like to make sure that there are opportunities to make changes to the agenda. This starts with posting the agenda early enough to let people think about it. I tend to try for 2 days but I’m not great at it. Then in the meeting I make sure people can propose changes to the agenda early, using some democratic approach.

One thing I’ve noticed is that often folks want to just do their new agenda item right then, as opposed to finding room for it in the normal agenda. I’m not sure why this bothers me so much, though I would guess it’s because I’m worried about time.

Action item check in

When I write my agendas I try to go back to my notes to see what action items were assigned at the last meeting. I then put them in the agenda, even if it’s clear to everyone that they’re done. My thinking is that the accountability is always clear and that we can celebrate the things that got done. Of course I’m also making sure things don’t get lost. I try not to do any shaming when someone hasn’t done something, but I like everyone to know what still needs to get done.

The pitfall here is that the mini reports can take some unexpected time, though they’re often topical and timely. I also think these check-ins can sometimes feel like calling people out, so I’d love some thoughts on how to soften that.

If you pair this with sending the agenda out a couple days in advance, I’ve noticed that a lot gets done during those two days. Certainly that’s true for my action items!


Moments

This started a while ago for me and now I’m hooked. Before getting to the meat of the agenda, I do the “moments” section. For now I use 4 different moments:

  • Oh Shit
  • How do I
  • Upcoming deadlines
  • This is great

Really I just started with “Oh shit” because that particular group would often have some emergencies come up that the whole group could help with. I’ve found that if people know those are there, they know they can bring up their quick things without having to necessarily add something to the agenda.

Too profane? I had to do “Oh shoot” for one group I was in.

One big drawback is that these can take up some serious time, but my opinion is that if they’re that pressing, they likely need the time. Your thoughts?

Agenda items

Then I get to the meat of the meeting. When I remember, I try to put a time estimate for each. That tends to help the group stay on track, but I know sometimes folks get crabby if the time estimate is clearly too low to get anything decent done.

Some things that bother me about normal agenda items are ones that get the group doing things that aren’t efficient. My favorite pet peeve is group wordsmithing. I used to also dislike group editing, but to me that’s much preferable to wordsmithing. I think it’s better to make the goals of the passage clear and then assign someone to write it. I assume that those who like wordsmithing just want to get it done, but it’s rare that I enjoy the experience. It’s also interesting to see what happens when people with very different typing speeds work on a collaborative document.

Action item round up

I’m terrible at this (though I tend to take decent notes) but I want to try to do a better job at the end of meetings making it clear what has been decided about next steps. Using the “assign to” feature in google docs works great when I’m taking minutes, but I think it’s probably good for everyone to hear what they’ve committed to before the meeting ends.

Set the next agenda

I’m terrible at this. I almost never do it. But I think I’d like to try getting better at it.

Your thoughts?

I’ll admit it: I mostly wanted to try out the wordpress app on my phone now that I have my nexdock so I can treat my phone like a laptop. But this is a topic I’ve wanted to get down for a while, so it was a good excuse.

So, some thoughts? Here are some starters for you:

  • I love going to meetings with you. I just wish that you . . .
  • When I see you’re going to be there, I make up excuses not to go.
  • Here’s some ideas for check ins . . .
  • Here’s some things to avoid with check ins . . .
  • Wait, your phone is powering a laptop?
  • This all feels way too rigid. You need to relax and just let things flow!
  • My meetings are all dominated by “oh shit”. Why do you even bother scheduling anything else?
  • You should have crowdsourced the wordsmithing of this post.
  • I hate action item roundups. I know what I’m supposed to do and I don’t like getting called out.
  • Do your current online meetings change any of this?
Posted in Uncategorized | Tagged | 1 Comment

Fast Quantum Tunneling Method

This post describes a way to calculate tunneling probabilities for one-dimensional quantum barriers. The method is easy to code up and very fast.

Consider the following barrier. If your energy is less than 3 eV, you’ll just reflect off. But above 3, weird things happen. How do you calculate the reflection and transmission coefficients?

The barrier considered as the example in this post

Quantum tunneling is a favorite conceptual topic for students. It describes behavior very different from what is expected classically, yet it can be introduced easily by invoking memories of throwing balls at walls. Students are encouraged to find connections with frustrated total internal reflection in order to further cement their understanding of matter as waves. Both in optics and in quantum mechanics, instructors can note how the continuity of the wave function and its derivative across boundaries implies that the wave cannot abruptly go to zero. This lets students see why a (typically small) portion of the wave can tunnel through a barrier.

On the other hand, the quantitative aspects of tunneling are a different story for students. As usual, very simple situations like the square barrier can be calculated by hand, but more typical barriers found in the lab require the use of a computer and an algorithm that can apply the conceptual physics students have learned (the boundary conditions) in an iterative fashion.

In this post I’ll talk about a different algorithm to calculate the tunneling probability of a particle with known energy through an arbitrary one-dimensional potential barrier. It is both fast and accurate, and it uses tools that most students are familiar with from their studies of the Schrödinger equation. Specifically it involves the direct integration of the Schrödinger equation in a manner very similar to the shooting method employed to find the eigenstates of an arbitrary potential well. However, instead of needing to adjust parameters to find a particular eigenstate, students can directly inspect the results for any given particle energy and determine both the tunneling probability and the shape and nature of the wavefunction inside the barrier.

Other methods

The most common approach to calculating tunneling probabilities is to consider the barrier to be a collection of square barriers. In the WKB approach, only the exponentially decaying portion of the wavefunction is kept and integrated through all the slices (Simmons 2007). In the matrix transfer method, the boundary conditions among all the slices are carefully calculated (Alexpoulos 2007, Mendez 1994, Morelhao 2007 (pdf), Probst 2002, Zhang 2000). Specifically, at every boundary between the square slices the wavefunction and its slope are continuous. In each slice the wavefunction is composed of two components: either a right- and left-traveling wave with a wavelength determined from the kinetic energy (the difference between the total energy and the barrier height); or a growing and decaying exponential whose growth rate is determined from the (negative) kinetic energy. Often these boundary condition equations are described in a matrix formalism as they are simple linear equations relating the incoming and outgoing wavefunctions along with the barrier heights of the slices. The effect of the barrier on the incoming wave is then modeled by a single matrix that can be used to solve for the tunneling probability.
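
To make the slicing concrete, here is a minimal Python sketch of that boundary-condition sweep (my own illustration, not code from those references), in units where 2m/ℏ² = 1 so k = √(E − V); the complex square root automatically turns classically forbidden slices into growing and decaying exponentials:

```python
import numpy as np

def transfer_matrix_T(V, E, L, n_slices=200):
    """Transmission through the barrier V(x) on [0, L] by slicing it into
    square barriers and matching psi and psi' at every slice boundary.

    Illustrative sketch in units where 2m/hbar^2 = 1, so k = sqrt(E - V);
    the complex square root makes classically forbidden slices come out as
    growing/decaying exponentials automatically.  V must accept a numpy
    array of positions; the potential is taken to be zero outside [0, L].
    """
    edges = np.linspace(0.0, L, n_slices + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # wavevectors: [Region I] + one per slice + [Region III]
    ks = np.concatenate(([np.sqrt(E + 0j)],
                         np.sqrt(E - V(mids) + 0j),
                         [np.sqrt(E + 0j)]))
    # start in Region III with a pure right-traveling wave (F = 1)
    A, B = 1.0 + 0j, 0.0 + 0j
    # sweep the boundaries right to left, matching psi and psi' at each
    for j in range(n_slices, -1, -1):
        x, kl, kr = edges[j], ks[j], ks[j + 1]
        psi = A * np.exp(1j * kr * x) + B * np.exp(-1j * kr * x)
        dpsi = 1j * kr * (A * np.exp(1j * kr * x) - B * np.exp(-1j * kr * x))
        A = 0.5 * (psi + dpsi / (1j * kl)) * np.exp(-1j * kl * x)
        B = 0.5 * (psi - dpsi / (1j * kl)) * np.exp(1j * kl * x)
    # T = (k_III / k_I) |F/A|^2, and here k_III = k_I
    return float(1.0 / abs(A) ** 2)
```

For a single slice this reproduces the analytic square-barrier result exactly; for a smooth barrier the number of slices controls the accuracy, which is the trade-off examined in the comparisons later in this post.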

There are also some approaches in the literature that have more directly integrated the Schrödinger equation, but all do a forward propagation as opposed to the backward one described below (Ban 2000, Yunpeng 1996). These approaches use both numeric and analytic methods to determine the phase of the incoming wave that yields solely a right-traveling wave in the transmission region. The method below does not require such adjustments and simply gives both the wavefunction in the tunneling region and the tunneling probability after a single direct integration.

My method

Consider a tunneling situation as laid out in the figure at the top of this post. The first and third regions have a constant potential while the middle region can have any form, including discontinuities and regions where the particle is classically allowed. Region I can have right- and left-traveling waves

\psi_\text{I}=Ae^{i k_\text{I} x}+Be^{-i k_\text{I} x} (1)

while Region III only has a right traveling wave:

\psi_\text{III}=Fe^{i k_\text{III} x} (2)

where

k_\text{I}=\sqrt{(2m/\hbar^2)E} (3)

and

k_\text{III}=\sqrt{(2m/\hbar^2)\left(E-V_\text{III}\right)}. (4)

Using a fourth-order Runge-Kutta technique, I numerically integrate the real and imaginary parts of the Schrödinger equation from the right edge of Region II (x=L) to the left edge (x=0). Note that since the Schrödinger equation does not have any first-derivative terms, a Numerov approach can also be used. Note also that in Mathematica you can integrate complex numbers with just one call to the Runge-Kutta solver (NDSolve). Since both the wavefunction and its slope will be the same on both sides of the boundary between Regions II and III, the initial conditions are determined by arbitrarily setting F=1 and using the form from Region III:

\psi(L)=e^{i k_{\text{III}}L}\quad \text{and}\quad \psi'(L)=i k_{\text{III}}e^{i k_{\text{III}}L} (5)

To determine the transmission probability, T, we need to find the value of A:

T=\frac{k_{\text{III}}}{k_{\text{I}}}\left|\frac{F}{A}\right|^{2}=\frac{k_{\text{III}}}{k_{\text{I}}}\frac{1}{\left|A\right|^{2}}. (6)

This is done by investigating the value of the wavefunction and its slope at x=0 where, according to Eq. (1):

\psi(0)=A+B\quad \text{and}\quad \psi'(0)=ik_{\text{I}}(A-B) (7)

Once again we have used the equality of the value and slope of the wavefunction across a boundary.

Combining Eqs. (6) and (7) yields

T=\frac{k_{\text{III}}}{k_{\text{I}}} \frac{4}{\left| \psi(0)-i\psi'(0)/k_{\text{I}}\right|^{2}}. (8)

Once the Schrödinger equation has been numerically integrated, the transmission probability is easily calculated.
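
As a concrete sketch (in Python rather than the Mathematica mentioned above, and in units where 2m/ℏ² = 1 so the equation being integrated is simply \psi'' = (V(x)-E)\psi), the whole recipe is only a few lines:

```python
import numpy as np

def tunneling_T(V, E, L, V3=0.0, n_steps=1000):
    """Transmission probability through the barrier V(x) on [0, L].

    Sketch in units where 2m/hbar^2 = 1, so psi'' = (V(x) - E) psi.
    Region I (x < 0) has V = 0; Region III (x > L) has constant V = V3.
    Integrates backward from x = L to x = 0 with classical RK4, starting
    from the pure right-traveling wave of Eq. (5), then applies Eq. (8).
    """
    k1 = np.sqrt(E)          # Region I wavevector, Eq. (3)
    k3 = np.sqrt(E - V3)     # Region III wavevector, Eq. (4); assumes E > V3

    def f(x, y):             # y = (psi, psi'); Schrodinger eq. as 1st-order system
        return np.array([y[1], (V(x) - E) * y[0]])

    h = -L / n_steps         # negative step: integrate right to left
    x = L
    y = np.array([np.exp(1j * k3 * L),
                  1j * k3 * np.exp(1j * k3 * L)])   # Eq. (5), with F = 1
    for _ in range(n_steps):
        s1 = f(x, y)
        s2 = f(x + h / 2, y + h / 2 * s1)
        s3 = f(x + h / 2, y + h / 2 * s2)
        s4 = f(x + h, y + h * s3)
        y = y + (h / 6) * (s1 + 2 * s2 + 2 * s3 + s4)
        x += h
    psi0, dpsi0 = y
    return float((k3 / k1) * 4.0 / abs(psi0 - 1j * dpsi0 / k1) ** 2)  # Eq. (8)
```

Any Python callable can be passed in as V, including one with discontinuities or classically allowed dips; for a square barrier of height 2 and width 1 with E = 1 (same units) this reproduces the textbook result T = 1/cosh²(1).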

This method employs many techniques used when teaching the numerical solution of eigenstates for arbitrary wells. In those situations students are taught to employ the shooting method to find energies that produce physically allowable wavefunctions (like I’ve posted about before). The major difference in this new application is that both the real and imaginary parts need to be integrated, as is illustrated in the figure below. If you only track the real part (as is often done in the shooting method application) you are unable to calculate the transmission coefficient via Eq. (8) above.

The real (solid) and imaginary (dashed) parts of the tunneling wavefunction for the barrier shown in Figure (1) for a particle energy of 7.5 eV. This yields a tunneling probability of 93%.

Examples and Comparisons

The transmission coefficient (T) as a function of particle energy for the potential shown in Figure (1) is given in Figure (3) below. The top curve is the result of the current method while the lower lines use the transfer matrix method with varying numbers of slices of Region II. Ultimately both approaches converge to the same result at every energy.

The transmission probability, T, as a function of particle energy for the potential barrier shown in Figure (1). This compares the current method with the matrix method with the number of slices set to 10, 20, 50, 100, 200, 500, 1000. The current method is the top curve. In the inset the current method is limited to a total number of steps ranging from five through twelve.

It is interesting to compare the number of slices in the transfer matrix method with the number of steps the Runge-Kutta method employs. The transmission probability versus energy for the arbitrary barrier shown in Figure (1) is given for total step numbers ranging from 5 to 12 in the inset of Figure (3) above. The curve with 300 steps is also shown. It is clear that the Runge-Kutta method needs far fewer steps than the transfer matrix method needs slices to achieve the same accuracy. Note, however, that one should really compare the number of calculations involved: a fairer comparison would multiply the number of Runge-Kutta steps by four (one derivative evaluation per stage), though even then the current approach compares favorably to the transfer matrix method.

Resonant Tunneling

As an example to show the pedagogical uses of the current method, I consider resonant tunneling. Specifically I compare the wavefunction in the barrier region to the eigenstates expected for a simply-shaped barrier.

Consider the potential shown below.

Potential energy barrier for resonant tunneling

This potential barrier is a parabola centered at x=1 but chopped off at x=0 and 2. The analogous untruncated potential well has eigenenergies at

E_n=\left(n+\frac{1}{2}\right)\hbar \omega = \left(n+\frac{1}{2}\right) 1.232 \text{ eV}\quad \text{where }n=0,1,2,\ldots (9)
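
As a quick sanity check, Eq. (9) is trivial to evaluate (a throwaway Python sketch):

```python
# Eq. (9): eigenenergies of the untruncated parabolic well,
# with hbar*omega = 1.232 eV as given above.
hbar_omega = 1.232  # eV

def eigen_energy(n):
    return (n + 0.5) * hbar_omega

# first few levels, in eV: 0.616, 1.848, 3.080, 4.312
levels = [round(eigen_energy(n), 3) for n in range(4)]
```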

The transmission probability as a function of energy is shown here:

The tunneling probability versus energy for the parabolic potential shown in the previous figure

The resonance peaks shown correspond very nearly with the eigenenergies of a parabolic well. At lower energies, where the resonance peaks are very sharp, the energies are the same as the eigenenergies. As the peaks become broader, the resonant energies become larger than the eigenenergies, by as much as 20% for n=11.

The reason the resonant energies grow larger than the eigenenergies as the energy increases is the boundary conditions that the wavefunction has to match at x=0 and x=2. This can be seen by comparing the resulting wavefunctions (both the tunneling wavefunction and the eigenfunction for the parabola) as seen here:

Comparison of the resonant tunneling wavefunction with the eigenfunction for a parabola for n=5 (top) and n=9 (bottom). The probability density (\psi^*\psi) is plotted as a function of position inside the potential barrier. The curves are all normalized to a max of unity. The resonant wavefunction has higher energy so that the curve can bend to better meet the boundary conditions.

Close inspection of the wavefunctions near the boundaries shows the differences between the tunneling wavefunction and the eigenstate. While the eigenstate decays to zero in all cases, the tunneling wavefunction is forced to match the boundary condition at the right edge. At low energies there is little difference because the exponential rise is very steep, but at higher energies a sharper bend is needed, which explains the rise in energy compared to the analogous eigenenergy.


Conclusion

I have discussed a new method for calculating both transmission probabilities and wavefunctions for a particle tunneling through an arbitrary one-dimensional barrier. The approach is applicable at the undergraduate level as it uses common tools related to the shooting method for finding potential well eigenstates. It is fast and accurate and enables the study of complex phenomena like resonant tunneling.

Your thoughts?

Here are some starters for you:

  • This is really useful. I plan to use it in . . .
  • This is dumb, I’d never use it and here’s why . . .
  • This reads like an article you wrote for the American Journal of Physics that got denied with a reviewer saying since it was so easy to code up it wasn’t worth publishing.
  • Wait, so you integrate from right to left, I didn’t think that was allowed!
  • So you just assume that something makes it through and work back to see what could have caused it? Weird.
  • You lost me at Mathematica
  • Can you help me implement this in . . .
Posted in mathematica, physics, research, teaching | Leave a comment

Privileging screen space in virtual meetings

I’ve been thinking a lot about ways to make virtual classes and meetings as useful as possible. Certainly that’s what’s behind all my work with my synchronous dashboard (see here and here). These days I’m part of a team helping people prepare their fall classes (in person, hybrid, and online types) and I’m on a team planning a fully virtual New Faculty Workshop for Physics and Astronomy this fall. I’m also super excited to be meeting with an informal team put together by Stephanie Chasteen looking at virtual professional development. In this post I want to try to organize my thoughts around what I’ve been calling “privileging video” in virtual meetings.

I’ve been hearing and learning about a lot of really cool digital tools people can use in virtual meetings. But there’s always the thought that creeps into those conversations about how hard it is to both see people and interact with those tools. Certainly some people have multiple monitors and don’t have issues, but that’s not the majority of people that I interact with. That’s what I mean by “privileging video.”

Seeing people is great! You can tell if they’re really engaged and you can see the normal unspoken signs of confusion, amusement, frustration, etc. It’s the main reason colleagues of mine don’t like using my dashboard tool (especially when I force them to). It’s also the easiest way to take true attendance (as opposed to just seeing someone has logged in).

But the problem is that video takes up so much space on your screen that it crowds other tools out. Certainly most video meeting software vendors allow for tools other than video (chat, hand raising, polling, etc.) but it’s very clear that all of them privilege video. Just look at how chatting in Zoom or Google Meet takes an active click from the user. Or how the chat window can so easily be lost or covered, whereas they take great pains to ensure that the videos have primacy, or at least a clearly protected region of the screen.

Compare that to audio: If you’re using an online audio tool (or just using the audio of a video meeting) that tab doesn’t even have to be front and center. It doesn’t take up any screen space.

Not sure if an audio-only conference can be productive? Spend five minutes sometime watching teenagers using Discord to solve problems in a video game. You’ll see that they are both engaging with each other and solving problems in the stuff that is taking over their screen. Discord provides easy audio, chat, and emoticons. And that’s it! It assumes you’re using your screen for something else; that’s why it got built in the first place. I happen to live with three of my own children who do this all the time. In fact, as I was developing my dashboard they kept telling me to just use Discord. They were probably right.

Sometimes folks will share their screen, shoving the vids of the participants to the side. That’s a better use of screen space, but it still severely limits collaboration. The rest of the folks can only watch and hope to occasionally interrupt the presenter. It’s interesting to look at the difference between when a presenter shares their screen showing a google doc and when, instead, everyone just logs into the google doc. Depending on what the group is trying to accomplish, each approach has its merits. The latter, however, gives much more agency to all the participants.

When thinking about teaching, it’s interesting to note that while teachers are used to seeing everyone’s face, students really aren’t. They see the teacher and perhaps their small group members (I’m talking about in person here) but they don’t normally have the ability to stare at the faces of all their classmates.

So I think I’m a little down on “privileging video” but I wanted to get my thoughts out there so, as usual, I can refine my thinking by bouncing some ideas off you.

Here are some starters for you:

  • I think I’m down on privileging video too. My biggest issue is . . .
  • I love privileging video. What you’ve forgotten about is . . .
  • Why do you sometimes not capitalize google?
  • I love your dashboard tool. Can you help me with it?
  • I hate your dashboard tool. When are you out of the dean’s office so I won’t have to use it any more?
  • Wait, your school is going to have in person classes?
  • Wait, your school is going to have online classes?
  • I think sharing my screen does give the rest of the participants agency. Here’s how . . .
  • If it weren’t for cool Zoom backgrounds I’d stop doing video meetings right now
  • Wait you mentioned Zoom, so can we use it?
  • I’ve used Discord and you’re right about . . .
  • I’ve used Discord and you’re way off base. What you don’t seem to realize is . . .
Posted in online class, teaching, technology | 4 Comments

Web App Dev with GAS course

Next month (July 2020) I’m teaching a course called Web App Development with Google Apps Script. I’m excited about it but I realized that I’ve never really described what will happen in the class, especially for those outside of Hamline who might be interested in auditing.

This is what we call an experimental class in that it’s not in the bulletin. You can teach those three times before you have to get it in the bulletin. This will be the first time I teach it. It’s in our new Computational Data Science program and it doesn’t have any prerequisites.

If you’re interested in auditing, the fee is $250 and you can get the form here. The details of the class (that are needed on that form) are here. If you want the 4 undergraduate credits for it, the cost is something like $2,600 (hence the plug for auditing if this is just skill development for you).


I do a ton of Google Apps Script programming in my job in the Dean’s office. Here’s just a few examples:

  • Major Declaration form (students submit their major choice and information about their advisor choice. Chairs are informed and eventually fill in the advisor using a specialized dashboard)
  • Scheduling notes (the fall 2020 schedule is in a spreadsheet that is used to make a web page for Administrative Heads to put in notes about the classes. Notes are emailed to the appropriate audience with a link to a page where more notes can be added)
  • Synchronous Dashboard (Participants in a meeting or a class can interact using emotion buttons, chat, hand raising, and an interactive whiteboard)
  • Dashboard for phone queues for summer registration (Students are in groups to ask general questions but FERPA-related questions are sent to a queue system where students request a phone call from the appropriate offices)
  • Tool to spawn Google Meets breakout rooms

I find GAS to be great for a lot of reasons. See the “Why GAS” chapter of the book I’m writing for the course.

Goals for the course

I’m hoping that my students will be able to:

  1. Make a web page that displays useful but protected information
  2. Protect their pages using the built-in authentication (really that’s just for GSuite customers)
  3. Allow and leverage communication with a back-end spreadsheet that acts as a database
  4. [most importantly] produce an App that addresses a concern of theirs.

Structure of the course

We’ll meet on Tuesdays and Thursdays synchronously for two hours (9-11 CDT). Those sessions will involve me doing some live coding but will mostly encourage the students to work on their projects in ways that let them help each other (like I talk about in this post under “sharing screens in lab”). Most sessions will likely have a few pauses where we’ll identify the tools they need to develop. I’m hoping for conversations like “I wish I could filter the rows to only get the ones that need attention” that would lead to a discussion about the filter tool in JavaScript.

There will likely be daily assignments that address:

  • JavaScript skills
  • GAS tools
  • App brainstorming
    • including helping each other

along with weekly goals to reach. It’s possible I’ll have them work in teams, but that depends on the number of students. Right now there are only 3 registered and only a couple more that I know are thinking of auditing. With that number I’ll likely just have them work alone. If there are enough to work together, though, I think it might be fun to have them use at least some aspects of the Agile approach.

Is this a programming course?

No, not really. I’m hoping students can think of a problem that a simple web app could solve and that they’re willing to learn how GAS can help them achieve it. I want them to start by explaining in English what they’d like to accomplish and break it down into small steps. Then for each I can work with them to determine the minimal amount of programming skills they’d need to develop to do it. This is very similar to what I talk about in this post about the CDS course I just finished teaching.

I’m sure some will really take issue with this, but I think I’m ok with students “just getting it to work” as opposed to really understanding some of the syntax they’ll find via google. Certainly that’s what I’m going through as I’m working with a student right now to create a tool that music directors can use to have a fruitful remote rehearsal using the Web Audio API.

Is this for me?


Seriously, though, if you’re inspired by some of the projects I’ve talked about or see a need to fix some unaddressed problems in your workplace, GAS and specifically Web App Development using GAS might be for you. You don’t have to know how to program, I’ll help you with that!

One of the beauties of GAS is that all you need is a google account. From there you can do all your programming in a browser. You don’t need any other software. I do most of my programming on a Chromebook!

Your thoughts?

I’d love to hear your thoughts about this. Here are some starters for you:

  • I really think this is great. How would it work if . . .
  • I think this is dumb. Why don’t you instead . . .
  • I thought you loooovvvve PHP. Why aren’t you teaching about that and your beloved Laravel?
  • I’m interested, but I know I can’t make all those synchronous times. Can I still audit? (yes)
  • $250? That’s highway robbery. This should be free.
  • I’m a staff member at HU, how much for me to audit? ($0)
  • I’m a student at HU, how much for me to audit? (possibly $0, still working on it)
  • Wait, what are you building for virtual music rehearsals?
Posted in dean, programming, teaching | 1 Comment

Synchronous meeting dashboard

A while ago I made a site (and blogged about it) that I dubbed “my turn now” (think of a young kid begging to play when you say that aloud) that facilitated moderating a discussion. At the time I made it for in-person meetings and classes but this week I’ve found just how powerful it can be in synchronous online meetings. In this post I’ll describe how I’ve updated it and why I think it’ll be useful to me (and others?) in this time of remote learning.

Recognizing that not all of our students have fantastic internet connections at home, I’ve been thinking a lot about how to reduce the bandwidth of productive synchronous class meetings. The big culprit is the video streams. Audio is nothing compared to video from that perspective, so I started to think about what video provides.

When I talk to folks who love Zoom or Google Meets + gridview they talk about feeling stronger connections with their colleagues. They talk about facial expressions, body language, and hand gestures. Basically they’re saying that video augments audio and chat in ways that are hard to quantify and that are close to necessary for strong interactions with each other. Certainly I can attest. The big project I’ve been working on for years gave up on other online meeting platforms when we realized that Zoom lets all participants see each other. We knew it was vital and so we shelled out for it.

So what can we do if video becomes a liability, either due to bandwidth or privacy or whatever? That’s what I set out to work on over the last several days. I knew my “my turn now” app had something going for it, but it needed more.

As I reflected on positive experiences I’ve had in online meetings, I remembered the good old days of the Global Physics Department. We used Elluminate Live (later Blackboard Collaborate) and hardly ever used video (mostly a bandwidth problem back then). One thing we loved was the emotion emojis we could use (mostly “thumbs up” but also “clapping”, etc.). I thought perhaps expanding that a little could help with low-bandwidth community building.

So I set out to expand “my turn now” to add a few things:

  • “emotion” choices that participants could choose and display to others. For now I’ve settled on “confused”, “excited”, “clapping”, “agree”, “disagree”, and “cat on my computer”
  • Chat that isn’t lost if you have to log back in. First of all, this is a very low-footprint site, so folks likely won’t have to log back in, but even if you join a meeting late all the prior chats will be there for you. That’s true of the hand-raise queues too.
  • Access control (don’t want to face the emoticon equivalent of a zoom-bomber)
  • “my turn now”-like hand raising facilitation. If you didn’t click through before, here’s the quick version: there are two queues, one for follow-up questions on the current topic and one for new-topic questions. Participants “raise their hand” and everyone can see where everyone is in the queues. Participants can also transfer their “hand” to the other queue, which is then re-sorted to make sure whoever raised their hand earliest gets called on first.
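
For the curious, the queue logic can be sketched in a few lines. This is a toy Python version of my description above (the actual tool is Google Apps Script, and the detail of follow-up questions taking priority over new topics is an assumption of the sketch):

```python
import itertools

class HandQueues:
    """Toy sketch of the two hand-raise queues: 'followup' for the current
    topic and 'new' for new topics.  Each raised hand keeps its original
    arrival stamp, so transferring it to the other queue re-sorts that
    queue by who raised a hand earliest."""

    def __init__(self):
        self._stamp = itertools.count()          # global arrival order
        self.queues = {"followup": [], "new": []}

    def raise_hand(self, name, queue):
        self.queues[queue].append((next(self._stamp), name))

    def transfer(self, name, src, dst):
        entry = next(e for e in self.queues[src] if e[1] == name)
        self.queues[src].remove(entry)
        self.queues[dst].append(entry)
        self.queues[dst].sort()                  # earliest stamp first

    def next_speaker(self):
        # assumption: follow-up questions get priority over new topics
        for q in ("followup", "new"):
            if self.queues[q]:
                return self.queues[q].pop(0)[1]
        return None
```

The arrival stamp is the whole trick: a transferred hand lands in the other queue at the spot its owner would have had if they’d raised it there in the first place.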

Today I ran a meeting with ~30 participants and tested it out. It worked really well! Things that I noticed:

  • We never had a microphone collision. Never did two people try to talk over each other. The “raise hand” queues worked really well and everyone knew who would talk next.
  • We didn’t have topic ping-pong. That’s when every other speaker wants to go back to the topic two speakers ago. The two queues really help with that.
  • Colleagues were forgiving of each other when people had to go from my site back to the google meets tab to turn on their mic
  • We almost never had more than one live mic
  • No one complained of bad internet connections (a few turned their video on but most didn’t)
  • The emotions were indicated with color (their name in the roster went to white text on a solid color background) and my eye tracked that pretty well with just my peripheral vision.
  • Questions like “does that sound like a good plan?” yielded rapid “agree” and “disagree” colors showing up in the roster.
  • The chat seemed more vibrant because it was front and center. Both Zoom and Google Meets make it too easy to ignore the chat in my opinion.

Overall I was very pleased (hence this blog post!).

A quick set of notes on technology and scalability:

  • “my turn now” was PHP-Laravel based and required me to program on a windows machine, post to github, pull from github to a Hamline server, save data in a mysql database, and connect to Pusher for the real-time notifications
  • This uses Google Apps Script. I program on whatever device is handy (my chromebook works great), both the client and server are in javascript, and the data is stored in a simple to access and simple to read google sheets document. Once again I’m using Pusher for the real-time notifications.
  • “my turn now” was hard to share. Because of the free-version of Pusher limiting me to 100 simultaneous connections (think 100 people in a meeting) I never let anyone else use it. Sure they could sign up for their own Pusher but they’d either have to share their Pusher account with me or do all the github/PHP/server crap themselves.
  • This is super easy to share. Sign up for a free Pusher account, contact me to get a clean copy of my google sheet, paste a few things in and you’re good to go! At least one colleague is interested in that so hopefully in a few days I can see just how hard that is (I’m really not worried about it).
  • Access control is built into google apps script, especially if you’re a google school
  • The chart you can see on my old post is totally doable in this version too, since all the relevant info is saved in the spreadsheet. I just haven’t coded it yet.
  • I think I’ll post all my code on github soon so people can just do it themselves

So, all in all I think this is an exciting development. I really think you can have dynamic, interactive online meetings/classes without the huge bandwidth load of video. Of course you still need audio, but a dashboard like what I’ve made can really make a difference.

Your thoughts? Here are some starters for you:

  • I was in today’s meeting and I thought it went great. My favorite part was . . .
  • I was in today’s meeting and I couldn’t wait for it to end. What really sucked was . . .
  • Once again you’ve outed yourself as a PHP user. I’ve said it before but I really mean it this time: this is the last post of yours I’m ever going to read
  • Why don’t you code this in Meteor?
  • You really like bullet point lists, don’t you?
  • What do you mean when you say “google school?”
  • Why didn’t you write that “google school”??
  • I think you should add these “emotion” buttons . . .
  • I love it when multiple people are talking over each other. It’s like a battle royale and I can’t wait to see who wins. Why are you trying to take all my fun away?
  • Here’s some other great things you get if you can literally see your colleagues . . .
Posted in programming, teaching | 8 Comments

Cold live coding in class

I tried an experiment yesterday in my Introduction to Computational Data Science course. We have been working on doing analysis of Kaggle data sets, with each student having picked what they want to work on that will also lead to some web page scraping later in the semester. They have to analyze the data in multiple ways and eventually tell a story with it. We’ve been learning lots of python and pandas tricks to do all this work, but I wanted to help them deal with the sorts of questions they constantly have to think about. So I decided that together we’d all tackle a brand new (to us) data set in class and see how far we could get. I didn’t know what to expect, but I knew we’d hit a bunch of tough points and I planned to have some meta conversations about them when we did. It went pretty well and I thought I’d get some notes down here about it.

We started with each small group (which, of course, I set up using my “Would you rather” approach) proposing a search term we could use on Kaggle. Then they voted on the proposed terms and I typed the winner (“wealth”) into the search box. Then we voted on which search return we should use. (Note that I’m not going to give the details about which data set we used. See below for why.)

Next we loaded the data set into python (using Google’s Colab, if you must know) and started looking at it. A quick shout-out to the pandas library: the data had 1,000,000 rows and we were able to work with it with no problems at all (imagine opening it in google sheets or that upstart other spreadsheet software – I think it’s called excel or something like that).

We hit our first snag when we were trying to figure out what types of things were in a particular column. Students suggested just printing that column but they saw that we’d still have to scroll through a million rows to really see what’s in there. They asked me to run the unique() function on that column but that still had 1000 items in it (still tough to scroll through and get a good sense of what’s going on). We settled on value_counts() to see the most popular items in that column, but then we hit another snag.

We couldn’t tell if the unique values in that column told the whole story. We were trying to see, for example, if a single row might have two things in that column, kind of like “car, truck” when describing accidents. Does car show up and then later truck in their own rows, or if an accident involves both are they together in one row? Looking at the unique elements we saw “car” but we couldn’t be sure that “car, truck” might also be somewhere in that 1000 items. That’s when a student said “just use df.column.str.contains(".*car.*") and we’ll be in business!” Excellent, just what we’d learned in the last two weeks – a combo of pandas and regular expression jujitsu! But, alas, it didn’t work.

You see, I knew as soon as I saw the result of value_counts() that we were in trouble. I know all you pandas ninjas out there are laughing at me right now, but neither I nor my students knew how to filter the index values of a pandas series. Every suggestion they had got slapped down because it would only work on regular columns, not index columns.

I’m super happy to report that I wasn’t faking it. I literally didn’t know how to do it. However, and this I reported to the class, I knew it could be done. That’s one of my learning outcomes: I want my students to have confidence in what’s possible, even if they don’t know how to do it. So I asked them what they wanted to do. Did they want to google how to do it in pandas or follow another student’s suggestion and just do it straight in python using a loop? We had a great conversation about what they might put into a google search to help out and it was clear that they’d always add “pandas” to the search terms. We tried a few but then I had a brainstorm. I said “This will seem stupid and overkill, but I know it will work. Watch this.” And then I typed pd.Series(df.column.value_counts().index.values).str.contains(".*car.*"). Yep, I took the unique results from a pandas data frame and recast them as a new and different pandas series just so that the index column would be a normal column. Super overkill. But it worked. The students groaned and said it was likely a really dumb way to do it.
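For the curious, here’s roughly what that overkill move looks like on a toy data frame (the column and values are made up, since I’m not sharing our real data set), along with the more elegant index-filtering route we didn’t know that day:

```python
import pandas as pd

# toy stand-in for the real (undisclosed) data set
df = pd.DataFrame({"vehicle": ["car", "truck", "car, truck", "bike", "car"]})
vc = df["vehicle"].value_counts()

# the overkill recast from class: wrap the index values in a fresh
# Series so they become a normal column that .str.contains can filter
overkill = pd.Series(vc.index.values).str.contains("car")

# the cleaner route: a pandas Index has its own .str accessor,
# so you can filter the value_counts result by its index directly
hits = vc[vc.index.str.contains("car")]
```

Both find the two entries containing “car” (“car” and “car, truck”); the second keeps the counts attached, which is what we actually wanted.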

So we stopped to talk about it. And I think it was my favorite part of the day.

I said “come on, it works, who cares?” Some responded saying that it can’t possibly be the elegant solution that surely exists. They talked about how they hate it when they do something dumb like this and later learn a much better way. For me this is one of the key things about algorithmic thinking. Helping students see and discuss issues like this is what I love. Yep, it was overkill. Yep, it’s not elegant. But it works. Is that the end of the story? It doesn’t seem like my students think so.

By the way, after all that we learned that nothing like “car, truck” exists, so we were in business! Next we wanted to get a visual on the data for the most popular item in that column. I’ll call it “cars” for now. Basically we wanted to know how another numeric column behaves for the rows about “cars”. We decided on a histogram and we were surprised by the result. Essentially it only had one bar way on the left and then a bunch of empty space all the way to the right. I reminded them that if it showed that much empty space it must be that there were some really small bars that were just hard to see. What the heck? That’s when I showed them that while df.column.hist() and plt.hist(df.column) both show the same graph, the latter also prints the raw data for the histogram bins. That’s super useful when you’re trying to see what’s going on with weird data. Sure enough the first bar had a count of 60,000, the next 8 bars had counts of zero, and the last bar had a count of 9 (I’d say 9! but that’s actually much bigger than 60,000).
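To see why that bin readout is so handy, here’s a sketch with made-up numbers (not our actual data) using numpy’s histogram, which computes the same counts plt.hist reports:

```python
import numpy as np

rng = np.random.default_rng(0)
# fake data: a big clump of plausible values plus 9 far-away "typos"
values = np.concatenate([rng.normal(50, 10, 60000), np.full(9, 1_000_000)])

# np.histogram returns the same bin counts that plt.hist displays
counts, bins = np.histogram(values, bins=10)
print(counts)  # one huge first bar, a run of zeros, then the 9 outliers
```

The picture alone looks like one lonely bar; the counts make the 60,000-zeros-then-9 pattern unmistakable.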

Looking at the value of the bins one student shouted “typo!” meaning that those 9 must be due to data entry problems. They had good reason to say that (sorry, still not going to give away the details, see below). We did some quick calculations to see if there could possibly be 9 counts that far away from the rest of the data and we’re pretty convinced that they’re typos.

But now time was running out and we wanted to see much more detail from the 60,000. I said we could try to get rid of the 9 or just zoom in on the graph. It was interesting to see that no one had immediate ideas for doing either of those, though I’m sure they could see how to filter out the 9. Instead I just ran the histogram with 100 bins instead of 10 and that first bar split up a little. I again told them it was a dumb move but at least we knew there was some cool structure to the data.

Since time had effectively run out, I gave them a choice. I could either do the usual and go back to my office to make a screencast that finished the work and gave them all the proper syntax to use. Or I could do nothing and we could pick it back up on Monday, including the meta conversations. They really liked the second option, so I’m going for it.

Because of that choice, I am forcing myself to not dig into the data set. I know they want to eventually be able to put the data into a really cool visual that I don’t know how to do, but I’m making sure I don’t cheat and look all that up right now. It’s also why I’m not telling you, dear reader, the details of what we’re up to. If I did I’m afraid one of you would tell us all how to do what we want to do. But that would take the fun out of it!

Your thoughts

I’d love to hear your thoughts about this. Here are some starters for you:

  • I’m in this class and I really think the meta conversations helped me a lot. In particular . . .
  • I’m in this class and you keep describing our boring work as “interesting discussions.” Please stop.
  • I stopped reading when you said you weren’t going to give any details. This is just clickbait.
  • What do you have against Excel?
  • I think you should have just scrolled through the million rows. Surely their eyes would catch all the cool patterns right away!
  • I like this live coding. Were you worried that it would go off the rails?
  • I think if I did live coding I’d do a lot of practicing first. Did you do that?
  • What is your deal with that dumb factorial joke?
  • Do the students know when you shift to meta discussion? Is there a signal or are you explicit about it?
  • I think this was possibly a cool class but maybe their vote for more indicates their enjoyment instead of their learning. Are those decoupled in your class?
  • I’m in this class and my enjoyment and learning are the same!
  • I’m in this class and my enjoyment and learning are completely uncorrelated.
  • I can’t believe you think I’d drop everything and just do your work for you.
  • I can’t believe you’re not giving me the details. I want to do all your work for you.
Posted in programming, teaching | Tagged , | 3 Comments

Computational Data Science early semester thoughts

I’m back in the classroom! At least for a semester anyways. In the dean’s office I teach one class per year and last year was a fully online course, so this is a fun adventure (so far at least). This is just a post capturing some of the things that have happened that have piqued my interest. Here’s a quick (linked) list:

A quick description of the class: This is Introduction to Computational Data Science, a course that comes at the beginning of our new CDS major/minor program (though our python programming course is a prerequisite). The entire program aims to help students find and gather data, analyze it, and tell a story or make a decision with it. This course has them take their python skills and focus them on data. After this class students should be able to use tools like pandas, web scraping, and APIs to collect, analyze, and tell stories about data.

Grouping students with “Would you rather . . .” questions

I’ve used this before, but I’d forgotten how fun it can be. Nearly daily I’ll put students into work groups for things like brainstorming ideas for analyzing a particular Kaggle data set. There are only 16 students in the class, but I know from experience that they don’t always get to know each other even in such a small class. Even though I’m using Canvas as the LMS for this course, I’ve decided to load up my old Rundquist-hates-blackboard-so-he-wrote-his-own-LMS LMS with the class roster so I could use my old group randomizer.

It shows all the students with checkboxes next to them. I check who’s present, then indicate the max group size I’m interested in. It randomizes them and ensures that no group has more than the max and no group has fewer than the max minus one. But the part that’s fun is that it also displays a randomized “Would you rather . . .” question from this site. I encourage the students to find their group, introduce themselves, and then answer the wyr question. Then I tell them what the group is supposed to do for the class.
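The randomizer itself lives in my homegrown LMS, but the size logic can be sketched in a few lines of Python (the function name and interface here are just for illustration):

```python
import math
import random

def make_groups(names, max_size):
    # shuffle, then split into ceil(n / max_size) groups whose sizes
    # differ by at most one, so every group has max_size or max_size - 1
    names = names[:]
    random.shuffle(names)
    n_groups = math.ceil(len(names) / max_size)
    base, extra = divmod(len(names), n_groups)
    groups, i = [], 0
    for k in range(n_groups):
        size = base + (1 if k < extra else 0)
        groups.append(names[i:i + size])
        i += size
    return groups
```

With 16 students and a max of 3 this gives four groups of 3 and two groups of 2 — nobody ends up as the leftover singleton.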

I find it to be an easy way to build community, and it seems to be working pretty well.

Major projects (Twitter mining, web scraping, Digital footprint)

We have three major projects for this course.

Twitter Mining
Students will identify a topic they’d like to use twitter data to look at. It could be a hashtag, a topic, a famous user, whatever. They need to craft their research question and learn and use tools that will allow them to analyze hundreds of tweets and thousands of users.

Twitter data is open, and Twitter has a robust API that the python tweepy library is useful for accessing. The free development accounts have some data limitations but should provide plenty of data for my students.

Web Scraping

Students need to find a topic that has both a Kaggle data set and web pages that contain data that could extend the data set. Kaggle is great because it has data on tons of topics. So far in class we’ve explored the olympic medals data set pretty extensively. The reason we’re not stopping there is that for this program students need to know more than just how to deal with well-formatted data. Scraping data from web pages is a really useful skill in those times when such well-formatted data doesn’t exist. Of course it’s interesting that even the clean Kaggle data often needs some more work to refine the format, so it’s really nice to start with that. The example we’ve done in class is to look at the most popular first names for winning olympic medals. First name is not a column in the data set so we had to learn how to extract it from the full name column. Pretty straightforward stuff, but it’s already been fun to brainstorm different things to do with the data. We’ll get to the web scraping part later in class.

Digital Footprint

This course satisfies the “Diversity” requirement of our general education requirements. It does so by having students look at their own digital footprint from a privilege perspective. They’re going to seek out their own digital presence and compare and contrast it with those who are different from themselves. At first I thought it might be interesting to compare with each other, but I was smartly warned off from that. Instead they’ll write a report about their research into themselves and other cultures/countries/etc.

Weekly work

When I was setting up the calendar for the course I was trying to think about the best ways to infuse the major projects. I settled on Fridays. I figured we’d spend Mondays and Wednesdays working on tools and skills and then find ways to apply them in class on Fridays. Of course the bulk of the work they should do on these projects will be outside of class, but I want to make sure I’m modeling some approaches they should be taking.

I think I’m happiest with the Mondays and Wednesdays right now, as the focus on tools and skills is pretty straightforward. Fridays can feel a little at loose ends but I’m still working on it.

Take this past week: On Monday I asked the students to brainstorm (in groups) things to search for on Kaggle. Each group came up with a suggestion and then we voted. Olympics won and we landed on a data set that lists all the medals won from 1896 to 2012. It lists the sport, the athlete, the year, the location, and the medal. It has 31,000 rows (which I immediately asked the students to gut check). They then worked in their groups to brainstorm interesting questions to ask of the data and by the end of the class we had a great set of questions.

On Wednesday we took that set of questions and voted on the top three. I sampled only 15 rows from the data set so that they could be viewed without scrolling and asked the three groups to manually do the task they were assigned. The topics were:

  • Which first names have won the most medals?
  • What is the connection between length of name (character count) and medals won?
  • Which repeating initials have won medals?

You can see that these are all pretty similar, but each group had slightly different things to do.

You’ll see below that I modified my instructions a little in interesting ways, but ultimately the groups were able to think and talk about how to go from manually dealing with a small data set and getting a computer to do it on a larger scale.

Class came to a close and I asked them a question about the type of resources I should provide to help them out. Should I abstract the skills they were talking about and make some screencasts that show how to do those tasks in pandas/python or should I just go do the three projects for them? I warned them that to fully do the projects would involve some things they haven’t learned yet (namely regular expressions) but I had a suspicion that it might be more helpful to them since they’d already invested some time thinking about these problems. They voted for that and we had a brief discussion about how I’ve only ever really learned how to use software tools when I really wanted to get something done. I think I’ll keep that in mind when producing resources for them in the future.

Finally on Friday they worked in pairs to brainstorm their own webscraping/kaggle project and I did some live coding for them that went a little sideways.

Describing data analysis to a third grader

Above I talked about how I asked students in their groups to manually determine how to analyze a small set of data. Specifically I asked them to carefully determine what they, the humans, are doing and write down the steps. I warned them that depending on the verbs they chose it might be easy or hard to later translate those steps into an algorithm for a computer to do the work. Easy verbs include “read”, “scan”, and “count”; a hard verb example is “figure out.”

They got to work and I was meandering among the groups. I noticed that the “first name” group’s first instruction was “split the string at the comma.” There’s nothing wrong with that from an algorithmic perspective, but I was worried it might be too computer-centric for all the students in class (not all have actually taken the programming class because we are trying to find ways to grow the program).

That’s when I had a great idea. I encouraged them instead to write instructions that third graders would have to follow. I picked that age/grade quickly and seemingly randomly but I think it was a decent choice. We talked about making assumptions about what they’d know and realized that we could assume some things, like how a lastname, firstname list would likely be recognizable to third graders even though they likely almost never write their name that way. It also helped remove phrases like “split the string at the comma” from their instructions. The ultimate idea was to get the students to understand that the types of things they’re interested in can often be explained at a simple level, and then it’s their task to find out how to translate that for a computer. I think I’ll keep going with that approach with other similar skill development days in the future.

Video coding assignments

As I’ve done with so many of my physics classes, I’m grading students describing their work instead of their work product. I’m finding that in a coding class that’s super interesting. I see a lot of videos with code that look quite similar (I don’t mind at all if they work with each other or find code online) but I never get two identical submissions. The students walk me through their code and it provides me an opportunity to send vids back to them asking clarifying questions. I’m doing Standards-Based Grading with this class so that feedback process continues through the semester.

I really like hearing the students describe their code. You can tell what they came up with themselves versus what they found elsewhere. You can tell what they really understand and what they’re just copying by rote from other work. You can tell when they haven’t thought about a particular case of inputs and when they’re thinking about how to extend the code. You can also occasionally hear their joy when it works!

Quick SBG note: I’m using my one week rule (that I used to call my 2-week rule) where if they let a standard sit for a week the score solidifies. I think that will work well in a skills-based class like this.

Coding for themselves and not others

There’s one aspect of the text we’re using that I really don’t like. It’s constantly asking students to use input() and print() commands when doing things. At first I thought I just didn’t like it because I like jupyter’s notebook approach better (just do myfunction(4) instead of input(“hello dear user please input an integer”)) but I realized there’s something more subtle: For data analysis coding you’re often coding just for yourself. That’s one of the big things that distinguishes this class from the previous programming course. There you might be learning how to write code for others to interact with. In this class you’re using a tool to solve a problem. Often for yourself. Your audience comes in later when you give them a report.

Also, you do work with others during the coding, but nearly always that means you’re writing code that they’re going to (re)use. Hence a function that returns a list is almost always going to be more useful than a function that has a loop with a print statement in it.
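A minimal illustration of that point, with made-up medal counts:

```python
# printing is fine for a quick look, but the result evaporates
def show_medal_counts(rows):
    for country, count in rows:
        print(country, count)

# returning the data lets you (or a colleague) keep computing with it
def medal_counts(rows):
    return {country: count for country, count in rows}

counts = medal_counts([("USA", 2681), ("URS", 1204)])
total = sum(counts.values())  # easy follow-up analysis; impossible with prints
```

The second version is the one a teammate can actually reuse.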

I’ll be curious to hear your thoughts on this.

Google learning outcomes

On Friday we had a really interesting conversation about things I want students to take away from this class. I had just finished what I knew was going to be an unfulfilling (for all of us) live python coding session to show them how to investigate a kaggle data set about deaths from disease. I screwed up the syntax and ended up using the wrong functions a bunch of times. Once I was trying to add up deaths due to cancer and ended up just counting how many years were in the data set. Yep, super wrong, but it led to this cool conversation:

I pointed out that a really important thing in a class like this is to realize that you can’t possibly memorize all the various python/pandas/etc commands we’ll be learning. They’ll need to figure out what system they’ll use to ensure that they can always figure that sort of thing out. I gave the example of keeping good notes somewhere but then admitted that I just don’t really do that myself. I asked what they thought I did instead and they nailed it: trust google.

What they (and I) mean by that is that you can quite easily find good syntax help by just doing something like googling “pandas sum column filter”.
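That kind of search usually turns up a pattern like the following — hypothetical column names, since this isn’t our real data set. It also shows the difference between my blunder (counting rows) and the sum I actually wanted:

```python
import pandas as pd

# made-up stand-in for the deaths-from-disease data set
deaths = pd.DataFrame({"cause":  ["cancer", "flu", "cancer"],
                       "year":   [2000, 2000, 2001],
                       "deaths": [600, 50, 620]})

cancer = deaths[deaths["cause"] == "cancer"]
n_rows = len(cancer)            # my mistake: just counts the matching rows
total = cancer["deaths"].sum()  # what I wanted: the filtered column summed
```

Same filter, two very different numbers — which is exactly the kind of thing you only catch by gut-checking the result.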

But then I told them about another major thing I need them to take away from the class: confidence that things are possible. When I reflect on times when I’ve done a crappy job of teaching a software tool to someone, it’s when I fail to get the person to buy into that mantra: it’s possible! I think a class like this can succeed if it puts students in situations where they don’t know how to do something but they develop confidence that they can figure it out. This is so similar to physics teaching that I’m feeling dumb for not articulating that much earlier in my career. In physics we have tools (conservation of momentum, conservation of that human-invented, not strictly necessary idea – energy, etc) and we want students to think of how they can put those to use in solving problems. In coding, we have tools (libraries, software, apis etc) and we want students to think of how they can put those to use in solving problems!

Your thoughts?

Ok, that was a lot but I’m happy I got it down. It’ll help me as I continue to reflect on how to improve this class. Any thoughts/comments/questions? Here are some starters for you:

  • I’m in this class and I think it’s going pretty well. Here’s what has helped me . . .
  • I’m in this class and I honestly think it’s crap. Here’s why . . .
  • Why in the world did you let students in who haven’t taken a programming class?
  • My 3rd grader writes their name lastname, firstname all the time now thanks to this dumb post.
  • I think the one week rule is dumb and here’s why . . .
  • Another super long boring post from you. But at least you figured out how to do anchors so I could jump around – thanks!
  • I think all deans should teach at least one class, that’s a great idea!
  • I don’t think deans should be allowed near a classroom. This is dumb.
  • Why are you giving Canvas a chance?
  • I think if you want 3 person groups and you have 16 people you should have 5 3-person groups and a random person who’s screwed.
  • That “would you rather” site has some weird ones. Do you use them?
  • Because you sometimes capitalized Kaggle and sometimes not I assume you mean that there are two sites with similar capabilities. You only linked to one, though, so I’m totally confused
  • I don’t understand how this class can satisfy a diversity requirement. Can you say more?
  • What do you mean by “gut checking” data?
  • Here are some more hard verb examples for you . . .
  • What do you mean when you say energy isn’t strictly necessary?
Posted in sbar, sbg, screencasting, teaching, technology | 9 Comments

One die to rule them all

For a number of years I’ve been working on finding ways to turn what looks like an unfair die to a fair one (see these posts). Recently I’ve made a lot of progress. This post shows how I’ve turned a 36-sided unfair die into a fair 3-, 4-, 5-, 6-, 7-, 8-, 9-, or 11-sided die.

The two big accomplishments since the last post were to 1) roll an unfair 36-sided die 100,000 times, and 2) figure out how to optimize contiguous groupings of sides to make the fair dice I alluded to above.

The original die

I’m not quite sure why I decided to settle on a 36-sided die, but I did, I guess. I knew that each roll would take a few seconds and I knew I needed some good statistics. I have access to a VDI machine at work that can run all night long so I finally got around to leveraging that and getting some decent statistics on the rolls for the die (100,000 rolls that took 5 days of continuous calculations!):

Comparison of actual rolls (red dots), area of sides (orange line), solid angle that side subtends (green), and volume of the die that the side projected to the center adds up to (blue). Note how the y-axis doesn’t go down to zero.

A number of interesting things are represented in that graph. First, you can tell that the die is unfair because the red dots aren’t flat. The minimum probability is ~0.021 and the max is 0.035, or 67% bigger. The three other curves are all pretty similar, and certainly all 4 curves are correlated. But it’s interesting that in my conversations with folks over the last few years I’ve run into people (and web sites) that would claim that side area, side solid angle, or side volume (really the volume of the chunk from the center projected out to a side which is also proportional to the mass of that chunk) should accurately predict the probability. It’s interesting that none do!

Evolving dice

Ok, next came the challenge of finding contiguous groupings of sides that would yield (nearly) identical probabilities. I thought I had figured out how to find random groupings as shown in this image for a random die I used in my last post:

9 different ways to break up the same random die into contiguous regions

The trick to do that was the Mathematica command “FindGraphPartition” where the graph in question has the faces as the nodes and connections exist between faces that touch. That command finds regions that are connected, trying to keep regions with strong connections together. It does that by looking at the edge weight between them (a higher number means they’re “more connected”). So I just fed that function the graph of my polyhedron with random (positive) numbers for the edge weights (for a 36-sided die there are 54 edges).

So I could run that random weighting over and over again to try to find regions that just happened to be fair. This is hit-or-miss, of course, so I thought I’d try to make a genetic algorithm work.

A genetic algorithm, or really any of the evolutionary programming types, works really well when you have a huge parameter space (54 different parameters, in this case, that can each be any real positive number) and lots of potential local minima. What I wanted to do was to use the 54 parameters as continuous variables and to let the genetic algorithm test random “parents”, rank them, throw away the bottom half of the population, and then repopulate that bottom half with “children” made from the remaining parents. I pair two parents up, take the first, say, 20 parameters from one and the last 34 parameters from the other and vice versa to make two kids. Then I “mutate” some of the kids’ “genes” by adding a random number to one or more of the parameters. Then I run them through the fitness test and the next generation repeats.

In this case the fitness test I used was the max probability (of one of the contiguous groupings) minus the min probability. I got the probabilities by adding the side probabilities involved in each contiguous region. Those I got from the 100,000 rolls that I did earlier. If the max-min goes to zero, then all the probabilities are equal and I’ve got a perfectly fair die. That never seems to happen, but after 1000 generations I tended to get decent results.
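Here’s a Python sketch of that loop. I’m standing in for Mathematica’s FindGraphPartition with a trivial grouping (sort the faces by weight and chunk them), which ignores contiguity but shows the rank/cull/crossover/mutate cycle and the max-minus-min fitness; all numbers are made up:

```python
import random

random.seed(1)
N_SIDES, N_GROUPS, POP, GENS = 36, 6, 40, 200

# stand-in for the measured per-side probabilities from the 100,000 rolls
probs = [random.uniform(0.021, 0.035) for _ in range(N_SIDES)]
s = sum(probs)
probs = [p / s for p in probs]

def group_probs(weights):
    # stand-in for FindGraphPartition: order the sides by weight and
    # chunk into equal groups (the real version partitions the
    # face-adjacency graph so groups stay contiguous)
    order = sorted(range(N_SIDES), key=lambda i: weights[i])
    size = N_SIDES // N_GROUPS
    return [sum(probs[i] for i in order[k:k + size])
            for k in range(0, N_SIDES, size)]

def fitness(weights):
    gp = group_probs(weights)
    return max(gp) - min(gp)   # zero would mean a perfectly fair grouping

pop = [[random.random() for _ in range(N_SIDES)] for _ in range(POP)]
start = min(fitness(w) for w in pop)
for gen in range(GENS):
    pop.sort(key=fitness)
    parents = pop[:POP // 2]           # keep the fitter half
    kids = []
    while len(parents) + len(kids) < POP:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, N_SIDES)
        kid = a[:cut] + b[cut:]        # one-point crossover
        if random.random() < 0.3:      # occasional mutation
            kid[random.randrange(N_SIDES)] += random.gauss(0, 0.3)
        kids.append(kid)
    pop = parents + kids
best = min(fitness(w) for w in pop)
```

Because the fitter half always survives, the best fitness can only go down (or stay put) from generation to generation — which is also why it plateaus above zero instead of ever reaching a perfectly fair die.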

Before talking about what I mean by “decent” here’s a pic of my fairest 5 sided die followed by a panorama of all my dice so far (again, they’re all made up of the same 36-sided die, just with different groupings of sides painted)

5 “sided” die made from a 36-sided unfair die
top row: 3-, 4-, 5-, and 6-sided fair dice
bottom row: 7-, 8-, 9-, and 11-sided fair dice
For all of them, the die is shown on the bottom and the sides are broken out.

What is “decent” or close-enough to fair?

Since none of the genetic algorithm runs ever ended with a fitness of zero, none of the dice in the image above are strictly fair. But what’s fair enough? My kids and I decided that no one would really notice if, after a few hundred rolls, the sides seemed roughly fair. That can be quantified a lot better, of course, but that’s the gist of what I did here.

Let’s say you rolled a fair 7-sided die a bunch of times. What would the histogram of the rolls look like? If you rolled it an infinite number of times, every side would come up 1/7th of the time. But you’re not going to roll it that often. If you only roll it 100 times, you might expect each side to be rolled 14 times, with two of them being rolled 15 times. But you don’t usually find that. Instead you get more variability than you expect (or at least than some of us would expect). Counting statistics (or Poisson distributions if you like) would suggest that the typical variation after 100 rolls for each side would be the square root of 100/7, or roughly 3.8. In other words, most of the time the sides would be off from the expected 14 by 3 or 4 (either high or low – obviously the side counts would still sum to 100).

Ok, so what if you are suspicious that it’s not fair? Well, you can roll it a bunch of times and check the result against what the fair statistics would suggest. If you do it 100 times and you get a bunch of results within 3 or 4 of 14, you’d have to admit that it still seems fair. Of course if you’re patient you could roll it 1000 times. Then you’d expect each side to come up 142 or 143 times, with a typical spread of 12. Don’t freak out that 12 is bigger than 3 or 4. What really matters is the relative spread: 12 divided by 142 is smaller than 3 or 4 divided by 14.
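
Those back-of-the-envelope numbers are easy to check. A quick sketch, assuming counting statistics (the spread is the square root of the expected count per side):

```python
import math

def per_side_stats(n_rolls, n_sides=7):
    """Expected count per side, the Poisson spread, and the relative spread."""
    expected = n_rolls / n_sides
    spread = math.sqrt(expected)        # counting statistics: sqrt of the mean
    return expected, spread, spread / expected

for n in (100, 1000):
    e, s, rel = per_side_stats(n)
    print(f"{n} rolls: about {e:.0f} per side, spread {s:.1f}, relative {rel:.0%}")
```

Running it reproduces the numbers above: a spread of about 3.8 at 100 rolls and about 12 at 1000, with the relative spread shrinking as the rolls pile up.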

So what I did was look at how many times you’d have to roll my not-quite-fair dice to see results that start to look suspicious. The very worst of my dice would need over 500 rolls for that to happen. I guess I think that’s “good enough.”

Of course there are much more formal ways to do this. Mathematica provides DistributionFitTest for just this purpose. You use it by providing a set of rolls of a die and asking for the chance that the rolls came from a perfect die. It returns a p-value that can be interpreted as exactly that chance. Of course every 1000 rolls is different, so if you rerun the command with a different set of rolls you get a different p-value. That’s why what I’ve got below are histograms for each die, where I generated 1000 sets of 1000 rolls each. The x-axis is the p-value it found and the y-axis is the probability.
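
DistributionFitTest is Mathematica’s; as a rough stand-in of my own (not the post’s code), a stdlib-only Monte Carlo version of the same idea compares the observed chi-square statistic against a pile of simulated fair rolls:

```python
import random
from collections import Counter

def chi2_stat(counts, n_rolls, n_sides):
    """Pearson chi-square statistic against a perfectly fair die."""
    expected = n_rolls / n_sides
    return sum((counts.get(s, 0) - expected) ** 2 / expected
               for s in range(n_sides))

def mc_p_value(rolls, n_sides, n_sims=500, seed=0):
    """Fraction of simulated fair-die roll sets at least as lopsided as `rolls`."""
    rng = random.Random(seed)
    observed = chi2_stat(Counter(rolls), len(rolls), n_sides)
    hits = sum(
        chi2_stat(Counter(rng.randrange(n_sides) for _ in range(len(rolls))),
                  len(rolls), n_sides) >= observed
        for _ in range(n_sims))
    return hits / n_sims

rng = random.Random(1)
fair_rolls = [rng.randrange(7) for _ in range(1000)]
biased_rolls = rng.choices(range(7), weights=[3, 1, 1, 1, 1, 1, 1], k=1000)
p_fair = mc_p_value(fair_rolls, 7)
p_biased = mc_p_value(biased_rolls, 7)
```

Fair rolls give an unremarkable p-value; a die whose first side comes up three times too often gets flagged with a p-value near zero.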

p-value histograms for each die with 1000 rolls

Note that doing the same thing with perfect rolls yields a graph similar to the 7-sided curve. Really the only one that’s not great on this scale is the 11-sided die. Note also that if I do this for an un-optimized contiguous side set you nearly always get a p-value less than 5%. This shows that I really needed that genetic algorithm.

Next steps

I’d love to 3D print my 36-sided die a few times and paint the sides. I bet I’d find some D&D folks interested in buying them.

I’d also like to figure out why I can’t do 10-sided. Mathematica crashes every time I run the genetic algorithm. I really can’t figure out what’s going on.

I’d love to figure out if you could predict the side probabilities from some physical measure of the die. Obviously side area, side volume, and side solid angle don’t do it, but maybe some other measure does. One thing I’ll try checking is looking at the effective depth of the potential energy dip for each side. What I mean is: to topple from one side to a neighbor, you have to make the die go up on an edge. This has a gravitational potential energy cost. Each side is a triangle and so you could get three measures of that for each side. Wouldn’t it be cool if the actual rolling probabilities tracked with that measure! Then I wouldn’t have to spend a week calculating those probabilities and I could really do some fun stuff.
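
That energy measure is straightforward to compute. Assuming the die rests with its center of mass at height h above the table, and tipping over a given edge pivots the center of mass up to its distance r from that edge, the barrier is m g (r − h). A hypothetical helper (my sketch, not anything from the post):

```python
import math

def topple_barrier(com, edge_p1, edge_p2, rest_height, mass=1.0, g=9.81):
    """Energy cost to tip the die over the edge from edge_p1 to edge_p2.

    com: 3D center of mass; rest_height: its height while resting on the face.
    The center of mass must rise from rest_height to its distance r from the
    pivot edge before the die can fall onto the neighboring face."""
    e = [edge_p2[i] - edge_p1[i] for i in range(3)]
    v = [com[i] - edge_p1[i] for i in range(3)]
    # point-to-line distance: |v x e| / |e|
    cross = [v[1]*e[2] - v[2]*e[1], v[2]*e[0] - v[0]*e[2], v[0]*e[1] - v[1]*e[0]]
    r = math.hypot(*cross) / math.hypot(*e)
    return mass * g * (r - rest_height)

# Sanity check: a unit cube on the table, tipping over one bottom edge
barrier = topple_barrier((0.5, 0.5, 0.5), (1, 0, 0), (1, 1, 0), rest_height=0.5)
```

Since each side is a triangle, each face gets three such barriers (one per edge), and the question is whether the rolling probabilities track with some combination of them.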

Your thoughts?

Here are some starters for you

  • I think this is really cool. What I especially liked was . . .
  • This is really dumb. Here’s why . . .
  • Why do you call it a “fitness function” when you’re clearly minimizing? Try calling it a cost function, jerk.
  • I ran FindGraphPartition and occasionally got partitions that aren’t contiguous! What sort of magic did you do to fix that? (answer: updated the fitness function to make sure those got huge costs – used ConnectedGraphQ on a graph subset)
  • If you can’t make perfect dice, I’d never pay for them. You should say that more clearly at the top of this dumb post
  • Why do you bother with contiguous groupings? Couldn’t you just use random groupings and print them with the appropriate number in each triangle? (answer: I do get better p-value results this way, but my kids think contiguous is cooler)
  • I think you’ve succumbed to p-value abuse. You clearly run it over and over again until you get what you want. Hence the histograms.
  • I think your idea about the potential energy measure of a side has merit. Here’s what I’d do . . .
  • I think I know a different physical measure that will predict the probabilities. Here’s what it is . . .
  • There is no physical measure that will work. You’ve got to really roll them. I think you should pick a random shape, 3D print it, roll it 100,000 times yourself, look at the probabilities, then make a tiny change and repeat.
  • I don’t think a genetic algorithm was the best choice here. Instead I would . . .
  • All the Mathematica advertising gets old. I’m pretty sure I could do all this with my TI-84.
Posted in fun, mathematica, physics, research | 4 Comments

Lifelong computational skills

I’m frantically putting together my syllabus for our brand new Computational Data Science intro course (this comes after a programming course) and I realized that I’m not using one of my favorite syllabus planning tools: this blog!

This course was proposed and is listed in the bulletin thusly:

Title: CDS 1020 Introduction to Computational Data Science

Goals: To continue the study of computational techniques using Python, with an emphasis on applications in data science and analysis.

Content: This is a continuation of CDS 1010, applying algorithmic thinking to applications in data analysis. Topics include data mining, data visualization, web-scraping.

Prerequisite: CDS 1010

I’m really excited to teach this course, especially as it’s been a year and a half since I’ve taught an in-person class (I teach one class per year in the dean’s office and last year I taught a fully-online course). However, I’m feeling the pressure to make sure this is a strong course, and I have some things I’m grappling with right now. This post is trying to put down my thoughts and questions around the skills/approaches/ways of thinking that I want my students to really own after this class.

Cool now versus later

As I’m looking at all kinds of cool ways to show students the power of computational approaches to data collection, analysis, communication, and use in decision making or storytelling, I’m trying to think about what it takes for students to use those tools beyond my class. For example, while there are lots of tutorials (like this cool one about Natural Language Processing in python) that I’m sure I can help my students get running while they’re in the course with me, I’m not sure they’ll feel like they could really use those tools without my scaffolding handy. Instead, maybe I should focus on skills or approaches that would better empower my students, even if they’re not as powerful.

The nltk library for python is very powerful, and doing some work with it during my course would likely help students appreciate its power as it helps them do cool projects. However, I’m nervous that the learning curve associated with it might make them not want to reuse it on their own after my course. Of course the students that really dive into our new Computational Data Science major might, but I’m not sure they’re my target audience for this first time through.

In my physics teaching, this reminds me of our work trying to get students to do automated data collection and experimental control using first LabVIEW (yes, that’s how it’s spelled) and later arduino. I was a huge LabVIEW user all through grad school and we had a mini-site license when I first started here. In our Modern Physics lab we taught the students how to use it and got them to do some interesting things in that course. However, we started to notice that students were not reaching for that particular tool the next year in our Advanced Lab. In that lab they design and execute their own team-based year-long projects, often based on ideas they find in the American Journal of Physics. We would hear things like “oh, we’ll just manually record that data because it’s too hard to get the LabVIEW stuff working” or “I don’t remember how to install all the right things to get LabVIEW working, so we’re not going to bother.” Later we switched the modern physics lab over to arduino, in the process reducing the complexity of the things they were interfacing with. Suddenly nearly all the teams in Advanced Lab were at least brainstorming ways they could get the arduino ecosystem to help them. So my lesson was that a slightly inferior tool set with fewer logistical on-ramps led to students using it more in the places we were hoping for.

Types of things I’m considering

Here’s a short list of the types of things I’m talking about and that I’m trying to make decisions about:

Tough on-ramp → Easy on-ramp:

  • nltk → regex (possibly starting with simple spreadsheet commands)
  • twitter api → copying and pasting from twitter search (and then some sort of analysis)
  • list comprehensions → for loops
  • setting up local databases and using python to manage and analyze → using simple spreadsheets and perhaps google apps script to manage and analyze
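
For the list-comprehension vs. for-loop pairing, the two on-ramps compute exactly the same thing (on some hypothetical quiz scores); the loop is wordier but easier for a newcomer to step through:

```python
scores = [88, 92, 79, 85, 96, 73]   # hypothetical quiz scores

# Easy on-ramp: an explicit for loop
curved = []
for s in scores:
    curved.append(min(100, s + 5))

# Tougher on-ramp: the equivalent list comprehension
curved_too = [min(100, s + 5) for s in scores]

print(curved == curved_too)  # True
```
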
Things I’m thinking about

Certainly I would choose the tough on-ramps if I knew for sure my students would be majors and would have someone like me around to both help them use the tools and cajole them to consider them when they’re doing complex projects.

For students who might not be majors and who I would hope would use computational approaches to decision making and story telling in the future, I might choose the easier on-ramps, even though in nearly every case above it’ll be limiting.

My guess is I’ll oscillate between those columns as the semester goes along.

Your thoughts? Here’s some starters for you:

  • Glad to have you back in the blog-o-sphere, where have you been?
  • This sounds like a fun class, can I sit in?
  • This sounds like a dumb class, can I lobby to have it cancelled?
  • I like the _____ on ramp things and here’s why . . .
  • The LabVIEW/arduino example is great, here’s a similar example from my work . . .
  • The LabVIEW/arduino example is dumb and doesn’t apply to this at all, here’s why . . .
  • As usual you couldn’t even bother to google some clear answers to these problems. Here’s several articles you should have read before even writing this drivel: . . .
  • Here’s some things I’d add to your table . . .
  • Wait, are you going to actually teach a python class? No Mathematica? I don’t believe it.
Posted in arduino, programming, syllabus creation, teaching | 8 Comments

Virtual Physics Conference

I’m part of a grant team right now brainstorming a new project, and a part of it is potentially hosting a conference. We kicked around some ideas about it, and as usual in situations like this, we casually talked about what a virtual conference might look like. That got my brain going so I thought I’d get some thoughts down here.

My goal: A virtual conference for physics teachers to be held potentially in the summer of 2020.

Whenever I’m a part of conversations like these, the typical pros and cons list looks like this:

  • Pros
    • Cheap (I almost stopped this list here)
    • Flexible
    • Comfortable
    • Wider reaching
  • Cons
    • Not as immersive
    • Missing “hallway conversations”
    • Fewer connections
    • Less commitment from participants

I’ve been thinking about all of those and I think I’ve got at least the beginning of a plan that addresses all of them. Certainly the pros will still be there, but hopefully it’ll be an experiment worth doing if we can address the cons at least to some degree.

Technology

I’ve used a ton of different technology for doing meetings like these. Back in the glory days of the Global Physics Department we used both Elluminate Live and later Blackboard Collaborate (really the same software, just bought out by Blackboard). Since then I’ve used WebEx, Google Hangouts, and Zoom a ton and I’ve occasionally used others as well. For this experiment, I would mostly want a reliable technology, and the one that I’ve had the most luck with there is Zoom. But below I’ll lay out what I think the needs would be.

Participants at a minimum would need a computer/phone with a decent internet connection and speakers. A microphone would be great, and a camera too, but I’d be open to discussion about where we’d draw the “necessary” bar.

Speakers would need audio, video, and screen-sharing capability. It’s possible we could ramp up to something like dual monitors, but I’m definitely open to suggestions.

Rough outline

My vision is something like this:

  • Parallel sessions
  • ~5 speakers per session
  • 4 session blocks in a day
  • A single day

Commitment

This is the toughest nut to crack, I think. The longest online conferences I’ve been in were 8 hours long and it was hard to stay focused. So what would it take to get people to stick?

Taking the outline elements from above: parallel sessions allow people some choice. Certainly at in-person conferences people really appreciate that, especially when a session doesn’t have what you thought it was going to have. ~5 speakers per session makes it seem like you could potentially hold all that info in your head at one time and really have a great conversation going. Four session blocks in a day just seems reasonable, and one day is a great start for this experiment, at least I think that’s true.

Addressing issues like “my favorite part of conferences are the impromptu conversations that happen between sessions” is something I’ve been thinking about a lot. I think it would be great if we had technology that allowed the following:

  • Every session has a Zoom room (I’ll just use zoom vocabulary here to simplify) with a main speaker at any given time but a running commentary that people can participate in.
  • Questions will be submitted and voted on during each talk so that speakers can answer them in a crowd-prioritized way.
  • Discussion will use software like my “my-turn-now” software that allows for equitable discussions.
  • [This one I don’t know about existing solutions] This one is what I’ve been thinking would help the most with some of the cons above. I call it “hallway conversations.” I want any two-or-more groups to be able to spontaneously spawn a new zoom room. They would get video conferencing, a chat board, and a white board. They could welcome anyone else in who “knocks” and they could choose to be either public “Andy and Super-Cool-Person’s room” or private.
  • Drop in rooms for common topics
  • You’d get a personal recap record of every room you were in along with whatever contact info people in that room were willing to share. You’d also get a chat transcript and any whiteboards.

Imagine sitting in your pajamas with a beer and seeing that people you are excited to meet are in a public room. You knock and they let you in! You then can meet them and either hang at the periphery to just listen or jump right in. Kind of sounds like an in-person conference, doesn’t it? The originators could leave and the room would still exist until there’s not at least two people in it. The personal recap record would really help you maintain any contacts you’ve developed.

My other big idea is meals, specifically lunch. I envision partnering with something like Door Dash to get everyone a meal at the same time. They’d pick their meal at registration (possibly even same day, I suppose) and then it would be delivered to everyone at the same time (yes, I know, there’d be some time zone problems but I think it might be cool enough to convince west coast people to eat at 10). There’d be Zoom rooms for every type of food. You’d be in a video conference with anyone else eating “at the same restaurant” and you could hopefully be involved in some fun conversations (and of course you could still launch a “hallway conversation” if you wanted to).

Cost

This couldn’t be free, as the Zoom cost won’t be zero. But it would surely be cheaper than gas/plane + hotel that a normal conference would have. If we had 5 parallel sessions and 5 speakers in each session and 4 session blocks that’s 100 people. If we charged $100 per person that would be $10,000 which might be enough for the Zoom ideas above. I plan to research this a lot more.

The flipped alternative

A collaborator of mine shared this white paper from the University of California system that talks about an approach to virtual conferences that sounds a lot like a flipped conference. Speakers record their talks ahead of time and each talk has a discussion board associated with it. I think that’s a cool idea, but I’ve always been unable to get my cognitive energy focused like that ahead of a meeting. The plan above allows you to come in cold (with the exception of your own talk of course) and just let it flow over you dynamically. I’m curious what others think, though.

Your thoughts?

So that’s where I’m at with my brainstorming. Your thoughts? Here are some starters for you:

  • I love this idea, where can I sign up? I just had a couple of thoughts to make it better . . .
  • Um, ever heard of google? This exists and is called . . .
  • If I can’t shake someone’s hand I don’t think it’s a real relationship. How are you going to do that?
  • Love the “hallway conversations” but I think you’d also have to think about . . .
  • $100?! Way too _____. Instead you should . . .
  • I would love to facilitate a session. Can I shoot you some ideas? Who’s on the committee?
  • Could we do a poster session too? I have some ideas about how that could work
  • Door Dash exploits their delivery people. Instead think about partnering with . . .
  • Here’s an interesting way to mix your ideas with the flipped conference ideas . . .
Posted in community, global physics department, teaching, technology | 11 Comments