New courses on web scraping and engineering complex marketing data sets (starting Feb. 2021)

I’m excited to develop and teach two new courses in the Marketing Analytics program at Tilburg University, starting in February 2021.

  • For Online Data Collection and Management (oDCM), I have compiled my 10+ years of experience with web scraping and APIs into a hands-on skills course that teaches students not only how to collect online data (which many schools offer, actually), but also how to manage online data collections. Think about design principles of web scraping (“live vs. historical scraper?”), deployment (locally or remotely in the cloud), and disclosing data for public re-use (for this, e.g., see my data sharing initiatives). Throughout the course, students use Python and Jupyter Notebooks. Head over to the course’s public website (https://odcm.hannesdatta.com) to stay up-to-date on enrollment and course content.

  • For Data Preparation and Workflow Management (dPrep), I’ve thought carefully about the decisions researchers have to make when engineering data sets for statistical analysis. Many students and researchers perceive the process of “creating” a data set for analysis as rather simplistic: a bit of cleaning here, a bit of merging there, and you’re done. In this course, I take data preparation to the next level by considering highly complex data preparation workflows (think multiple sources, structured and unstructured data, data from databases and data from files, multiple delivery batches, lots of missing data, different file versions, etc.). To enable students to engineer data sets from complex raw data, I’ll be using the workflow principles of reproducible science that are documented at Tilburg Science Hub. If you have the feeling you’ll be working with lots of raw data that you need to cast into a data set for analysis, or are simply interested in making your work reproducible, this course is for you! We’ll use R and command line scripting throughout, by the way. The course website is available at https://dprep.hannesdatta.com. Check it out for enrollment options!

By the way: I am developing both courses as open source. That means that all of the course material is on GitHub (oDCM, dPrep). You’re invited to join the development team, or simply use the courses at your own institution. Drop me an email if you do; I’d be happy to hear about it!