I’m frantically putting together my syllabus for our brand new Computational Data Science intro course (this comes after a programming course) and I realized that I’m not using one of my favorite syllabus planning tools: this blog!
This course was proposed and is listed in the bulletin thusly:
Title: CDS 1020 Introduction to Computational Data Science
Goals: To continue the study of computational techniques using Python, with an emphasis on applications in data science and analysis.
Content: This is a continuation of CDS 1010, applying algorithmic thinking to applications in data analysis. Topics include data mining, data visualization, web-scraping.
Prerequisite: CDS 1010
I’m really excited to teach this course, especially as it’s been a year and a half since teaching an in-person class (I teach one class per year in the dean’s office and last year I taught a fully-online course). However, I’m feeling the pressure to make sure this is a strong course and I have some things I’m grappling with right now. This post is trying to put my thoughts and questions down around the idea of skills/approaches/ways of thinking that I want my students to really own after this class.
Cool now versus later
As I’m looking at all kinds of cool ways to show students the power of computational approaches to data collection, analysis, communication, and use in decision making or story telling, I’m trying to think about what it takes for students to use those tools beyond my class. For example, while there are lots of tutorials (like this cool one about Natural Language Processing in Python) that I’m sure I can help my students get running while they’re in the course with me, I’m not sure they’ll feel like they could really use that tool without my scaffolding handy. Instead, possibly I should focus on skills or approaches that would better empower my students, even if those skills or approaches aren’t as powerful.
The nltk library for Python is very powerful and doing some work with it during my course would likely cause students to appreciate its power as it helps them do cool projects. However, I’m nervous that the learning curve associated with it might make them not want to reuse it on their own time after my course. Of course the students that really dive into our new Computational Data Science major might, but I’m not sure they’re my target audience for this first time through.
In my physics teaching, this reminds me of our work trying to get students to do automated data collection and experimental control using first LabVIEW (yes, that’s how it’s spelled) and later Arduino. I was a huge LabVIEW user all through grad school and we had a mini-site license when I first started here. In our Modern Physics lab we taught the students how to use it and got them to do some interesting things in that course. However, we started to notice that students were not reaching for that particular tool the next year in our Advanced Lab. In that lab they design and execute their own team-based year-long projects, often based on ideas they’d find in the American Journal of Physics. We would hear things like “oh we’ll just manually record that data because it’s too hard to get the LabVIEW stuff working” or “I don’t remember how to install all the right things to get LabVIEW working so we’re not going to bother.” Later we switched the Modern Physics lab over to Arduino, in the process reducing the complexity of the things they were interfacing with. Suddenly nearly all the projects in Advanced Lab were at least brainstorming ways they could get the Arduino ecosystem to help them. So my lesson from that was that a slightly inferior tool set with an easier on-ramp led to students using it more in the places we were hoping for.
Types of things I’m considering
Here’s a short list of the types of things I’m talking about and that I’m trying to make decisions about:
|Tough on-ramp|Easy on-ramp|
|---|---|
|nltk|regex (possibly starting with simple spreadsheet commands)|
|Twitter API|copying and pasting from Twitter search (and then some sort of analysis)|
|list comprehensions|for loops|
|setting up local databases and using Python to manage and analyze|using simple spreadsheets and perhaps Google Apps Script to manage and analyze|
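To make a couple of those rows concrete, here’s a rough sketch of what the “easy on-ramp” side might look like for pulling words out of some text: a plain regex plus a for loop, next to the list comprehension that does the same job. The function names, the word pattern, and the sample sentence are all made up for illustration; this isn’t from any actual course materials.

```python
import re
from collections import Counter

# Hypothetical pattern: runs of lowercase letters and apostrophes count as a "word".
WORD_PATTERN = re.compile(r"[a-z']+")


def long_words_loop(text, min_length=4):
    """Easy on-ramp: a for loop that keeps words of at least min_length letters."""
    words = []
    for word in WORD_PATTERN.findall(text.lower()):
        if len(word) >= min_length:
            words.append(word)
    return words


def long_words_comprehension(text, min_length=4):
    """Tougher on-ramp: the same filter written as a list comprehension."""
    return [w for w in WORD_PATTERN.findall(text.lower()) if len(w) >= min_length]


# Made-up sample text, just to have something to count.
sample = "Students reach for tools they remember. Tools with easy on-ramps get reused."

# Counter tallies how often each kept word appears.
counts = Counter(long_words_loop(sample))
print(counts.most_common(3))
```

Both functions return the same list, so the choice is really about which version a student would be willing to write from scratch a year later.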
Certainly I would choose the tough on-ramps if I knew for sure my students would be majors and would have someone like me around to both help them use the tools and cajole them to consider them when they’re doing complex projects.
For students who might not be majors and who I would hope would use computational approaches to decision making and story telling in the future, I might choose the easier on-ramps, even though in nearly every case above it’ll be limiting.
My guess is I’ll oscillate between those columns as the semester goes along.
Your thoughts? Here’s some starters for you:
- Glad to have you back in the blog-o-sphere, where have you been?
- This sounds like a fun class, can I sit in?
- This sounds like a dumb class, can I lobby to have it cancelled?
- I like the _____ on-ramp things and here’s why . . .
- The LabVIEW/arduino example is great, here’s a similar example from my work . . .
- The LabVIEW/arduino example is dumb and doesn’t apply to this at all, here’s why . . .
- As usual you couldn’t even bother to google some clear answers to these problems. Here’s several articles you should have read before even writing this drivel: . . .
- Here’s some things I’d add to your table . . .
- Wait, are you going to actually teach a python class? No Mathematica? I don’t believe it.