I’m frantically putting together my syllabus for our brand new Computational Data Science intro course (this comes after a programming course) and I realized that I’m not using one of my favorite syllabus planning tools: this blog!
This course was proposed and is listed in the bulletin thusly:
Title: CDS 1020 Introduction to Computational Data Science
Goals: To continue the study of computational techniques using Python, with an emphasis on applications in data science and analysis.
Content: This is a continuation of CDS 1010, applying algorithmic thinking to applications in data analysis. Topics include data mining, data visualization, web-scraping.
Prerequisite: CDS 1010
I’m really excited to teach this course, especially as it’s been a year and a half since teaching an in-person class (I teach one class per year in the dean’s office and last year I taught a fully-online course). However, I’m feeling the pressure to make sure this is a strong course and I have some things I’m grappling with right now. This post is trying to put my thoughts and questions down around the idea of skills/approaches/ways of thinking that I want my students to really own after this class.
Cool now versus later
As I’m looking at all kinds of cool ways to show students the power of computational approaches to data collection, analysis, communication, and use in decision making or story telling, I’m trying to think about what it takes for students to use those tools beyond my class. For example, while there are lots of tutorials (like this cool one about Natural Language Processing in python) that I’m sure I can help my students get running while they’re in the course with me, I’m not sure they’ll feel like they could really use that tool without my scaffolding handy. Instead, maybe I should focus on skills or approaches that would better empower my students, even if those tools aren’t as powerful.
The nltk library for python is very powerful, and doing some work with it during my course would likely lead students to appreciate its power as it helps them do cool projects. However, I’m nervous that the learning curve associated with it might make them not want to reuse it on their own time after my course. Of course the students who really dive into our new Computational Data Science major might, but I’m not sure they’re my target audience for this first time through.
In my physics teaching, this reminds me of our work trying to get students to do automated data collection and experimental control using first LabVIEW (yes, that’s how it’s spelled) and later arduino. I was a huge LabVIEW user all through grad school and we had a mini-site license when I first started here. In our Modern Physics lab we taught the students how to use it and got them to do some interesting things in that course. However, we started to notice that students were not reaching for that particular tool the next year in our Advanced Lab. In that lab they design and execute their own team-based year-long projects, often based on ideas they find in the American Journal of Physics. We would hear things like “oh we’ll just manually record that data because it’s too hard to get the LabVIEW stuff working” or “I don’t remember how to install all the right things to get LabVIEW working so we’re not going to bother.” Later we switched the Modern Physics lab over to arduino, in the process reducing the complexity of the things they were interfacing with. Suddenly nearly all the projects in Advanced Lab were at least brainstorming ways they could get the arduino ecosystem to help them. So my lesson from that was that a slightly inferior tool set with fewer logistical on-ramps led to students using it more in the places we were hoping for.
Types of things I’m considering
Here’s a short list of the types of things I’m talking about and that I’m trying to make decisions about:
Tough on-ramp | Easy on-ramp |
nltk | regex (possibly starting with simple spreadsheet commands) |
twitter api | copying and pasting from twitter search (and then some sort of analysis) |
list comprehensions | for loops |
setting up local databases and using python to manage and analyze | Using simple spreadsheets and perhaps google apps script to manage and analyze |
Certainly I would choose the tough on-ramps if I knew for sure my students would be majors and would have someone like me around to both help them use the tools and cajole them to consider them when they’re doing complex projects.
For students who might not be majors and who I would hope would use computational approaches to decision making and story telling in the future, I might choose the easier on-ramps, even though in nearly every case above it’ll be limiting.
My guess is I’ll oscillate between those columns as the semester goes along.
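To make a couple of those rows concrete, here’s a minimal sketch of what the easy on-ramp side might look like for a bit of text analysis. The sample sentence and variable names are made up for illustration (this isn’t actual course material), and it only uses python’s built-in re and collections modules rather than nltk. It also shows the same filtering step written both as a for loop and as a list comprehension.

```python
import re
from collections import Counter

# Made-up sample text, purely for illustration.
text = "Data science is fun. Science needs data, and data needs cleaning!"

# Easy on-ramp tokenizing: a regex that grabs runs of letters and apostrophes.
words = re.findall(r"[a-z']+", text.lower())

# for-loop version: keep only the words longer than four letters.
long_words_loop = []
for word in words:
    if len(word) > 4:
        long_words_loop.append(word)

# List-comprehension version of exactly the same filter.
long_words_comp = [word for word in words if len(word) > 4]

# Count how often each word appears.
counts = Counter(words)

print(long_words_loop == long_words_comp)
print(counts.most_common(1))
```

The two filtering versions give the same list; the comprehension is more compact, but the loop spells out each step, which is the trade-off I’m weighing.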
Your thoughts? Here’s some starters for you:
- Glad to have you back in the blog-o-sphere, where have you been?
- This sounds like a fun class, can I sit in?
- This sounds like a dumb class, can I lobby to have it cancelled?
- I like the _____ on ramp things and here’s why . . .
- The LabVIEW/arduino example is great, here’s a similar example from my work . . .
- The LabVIEW/arduino example is dumb and doesn’t apply to this at all, here’s why . . .
- As usual you couldn’t even bother to google some clear answers to these problems. Here’s several articles you should have read before even writing this drivel: . . .
- Here’s some things I’d add to your table . . .
- Wait, are you going to actually teach a python class? No Mathematica? I don’t believe it.
Hey Andy – here’s a topic I’m quite involved in nowadays with my research! A couple of years ago I started teaching myself Python, and now my Python/Matlab ratio is over 50%.
Overall: It really helps me to have some stuff pre-made, or to walk through it together in class, and to have an accessible, well-organized directory. That way, when I’m home later thinking “How the heck did we do that?” I have a working tool to poke and prod. It would be really cool to walk the students through creation of a virtual environment (say, with Conda) to accomplish a basic task (say, load a Twitter dataset). They will have to remember how to fire up the venv or condaenv every time they need to do their homework. I’ve also inadvertently improved my linux skills this way.
Thoughts on coding tools: An instructor recently commented, on the first day of class, “We all just assume you all know how to use jupyter notebooks and Github these days.” … The interest both from enrolling students & employers for these skills is booming. Plus, it’s a fun, interactive way to learn the basics of coding where you can see your results in real-time.
Things I use all the time: Importing packages. Reading data – txt and npy, maybe binary or csv. Numpy. For loops. If statements. Matplotlib plotting. Preallocating matrices and datatype (don’t forget to preallocate a complex covariance matrix!), matrix indexing. (Keras + Tensorflow and Scikit-learn for machine learning). Those truly summarize at least 90% of my codes.
Good luck & I’m excited Hamline is continuing to offer new courses! -Emma
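For anyone following along, the preallocation pattern Emma mentions might look something like the sketch below. The channel count, snapshot count, and random data are all assumptions made up for illustration; only the NumPy calls themselves are real.

```python
import numpy as np

n_channels = 3     # hypothetical number of sensors
n_snapshots = 100  # hypothetical number of samples

# Made-up complex data: one column per snapshot.
rng = np.random.default_rng(0)
data = (rng.standard_normal((n_channels, n_snapshots))
        + 1j * rng.standard_normal((n_channels, n_snapshots)))

# Preallocate with an explicit complex dtype. A default float64 array
# could not hold the complex sums accumulated below.
cov = np.zeros((n_channels, n_channels), dtype=complex)

for k in range(n_snapshots):
    x = data[:, k]                   # matrix indexing: the k-th column
    cov += np.outer(x, np.conj(x))   # accumulate the outer product x x^H
cov /= n_snapshots

print(cov.dtype, cov.shape)
```

The loop is written out step by step on purpose; the same average could be computed in one line as data @ data.conj().T / n_snapshots.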
Thanks for the comment, Emma, I really appreciate it. We’ve actually kicked around whether to do some github stuff in this class. For the moment we’ve decided against it, but you have me thinking about it again. Hope you’re well!
Thanks Andy, yes I’m well and wishing the same for you!
As a credit to your teaching, the only coding I did at Hamline was Mathematica and Arduino, and from those I was able to pick up everything else. Github is unquestionably a hard on ramp… it sounds like a great class and fun :).
First of all, my thought is – cool class!! Sounds like fun, no matter what.
I love that you’ve thought of easy and more challenging on-ramps. Is it possible to have multiple tracks that the class can follow depending on their comfort level? I’m thinking of the cases where half the students have no experience with the most basic ideas and the other half are already somewhat comfortable with the harder stuff.
I think that’s definitely a possibility. There’s a bunch of stuff that google apps script can do that is likely more powerful in python. I wonder if I could use the former in class and encourage those who want to, to use python.
“The LabVIEW/arduino example is great, here’s a similar example from my work . . .”
We have a Mathematica (or Sage) requirement in Calculus II (Tough on-ramp). I am starting to really feel that it is better that they use WolframAlpha regularly (Easy on-ramp) than use Mathematica for a couple of assignments (that they fake their way through) and then never use it again (unless they happen to have a professor later on who uses it, which is unlikely). I think that I am strongly in the camp of Easy on-ramp UNLESS you are going to be using the tool for a long time (near daily and/or multiple courses).
That said, I am using Anaconda in my stats class this year, which is a Tough on-ramp. However, I created templates for the students so that they basically just need to click “shift-enter” and change a couple of variables, so my goal is really just exposure. But perhaps this indicates some hypocrisy on my part, too.
I’m not sure I agree with where you’re putting the bar (“near daily and/or multiple courses”) but I’m somewhere in that vicinity.
As for the “shift-enter” approach, I do that when I want them to learn something about what’s being calculated but not so much when I want them to master that tool. I would guess you’re in the same boat. It makes me think about the twitter API. It turns out it’s (nearly) trivial to explore the twitter api in Mathematica, but that’s assuming running Mathematica isn’t an on-ramp.
The LabVIEW/arduino example is great, here’s a similar example from my work:
I am using Arduino/Python in my advanced lab course for the same reasons you give about the LabVIEW issues. The students all have had either Java or C++ (starting this year, the CS major added a python option that we are referring our physics majors to), so programming with the Arduino IDE is easy for them. They catch on to python quickly. I give them six jupyter notebooks of “Skills” to complete on their own. I avoid some python constructs like list comprehensions, lambda functions, and writing generator functions. I DO fully use numpy arrays with slicing.
BTW, where have you been?