6 Project Planning Document

Published: October 30, 2025

Complete by: Thursday 30 Oct. at class time

6.1 Description

You’ve already learned a lot about working with data. This assignment is designed to get you thinking about the final project for this class, which will bring all these skills together.

Don’t worry if you’re still getting the hang of statistical modeling. Modeling is part of the final project, but exploratory data analysis is also a big, important part of it, and you already know all about that!

At this stage of the project, you’ll (1) choose a dataset to work on, (2) craft some research questions around that data, (3) conduct some secondary research on your data, (4) consider the ethics of your project, and (5) begin to plan out your analysis. Let’s begin by learning a little more about the final project:

6.1.1 The Final Project

The purpose of the final project is to combine the core skills you have gained in the class to produce a polished report on a question of your choosing, ideally something you’re passionate about or that is relevant to your life or interests.

This final project will incorporate all of the steps involved in the Data Analysis Cycle. You will be in charge of stating interesting questions that don’t duplicate previous projects or workshops, conducting exploratory analyses using skills from the entire semester, building and interpreting models, and then communicating your key findings in a polished, professional narrative.

Remember that the data analysis cycle is an iterative, not a linear, process. You may find that your original question must be revised to match the available data, or that your data needs restructuring to address the question effectively. Likewise, your first predictive or explanatory models might not be your final ones. You’ll revise as you go, and nothing in your plan is set in stone.

Finally, the project will be an opportunity to practice communicating with non-technical audiences: your conclusions should be stated in plain language that the general public could understand. See the full prompt for the Final Project for more details.

6.2 Choosing Your Data and Questions

The first thing you will need is the dataset you will work with and the research questions you will ask about it.

Your project must analyze a real-world dataset using the course concepts and materials. Choose a topic and dataset that interest you, and come up with a main question and hypothesis that are answerable. One goal of the project is to demonstrate the breadth of techniques we learned this semester: data wrangling, visualization, summary statistics, hypothesis testing, regression and other modeling, and interpretation.

6.2.1 Selecting Data

You can choose whatever dataset you like, but I recommend that you choose one of the three datasets that you found at the beginning of the semester, since you’ve been thinking about those for the last several weeks. If you need help finding datasets, refer back to the prompt for the Documentation Assignment.

Consider that your dataset will need certain features in order to support certain analyses. For example, if you want to run a regression model, you will need at least one numeric variable. If you want to run a hypothesis test to compare means, you will need a mix of numeric and categorical variables: a numeric variable to compare, and a categorical variable that defines the groups. Most of our analysis methods work best with a fair amount of data, so avoid datasets that are too small. Keep the techniques and methods we’re learning in this class in mind as you choose: this will make it easier to select models and approaches later on.
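One quick way to check whether a candidate dataset has the variable types described above is to label each column as numeric or categorical before committing to it. The following is a minimal sketch in Python using only the standard library. The sample CSV, its column names, and the helper functions (`is_number`, `column_types`, `method_checks`) are hypothetical illustrations rather than course materials. The checks look only at variable types, not at whether the data is large or clean enough.

```python
import csv
import io

# Hypothetical sample data standing in for your own CSV file.
SAMPLE_CSV = """region,year,median_rent
North,2020,1200
South,2020,950
North,2021,1250
South,2021,990
"""


def is_number(value):
    """Return True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_types(rows):
    """Label a column 'numeric' if it has values and every non-empty value
    parses as a number; otherwise label it 'categorical'."""
    types = {}
    for col in rows[0]:
        values = [row[col] for row in rows if row[col].strip() != ""]
        if values and all(is_number(v) for v in values):
            types[col] = "numeric"
        else:
            types[col] = "categorical"
    return types


def method_checks(types):
    """Rough feasibility checks based on variable types alone."""
    has_numeric = "numeric" in types.values()
    has_categorical = "categorical" in types.values()
    return {
        "regression (needs a numeric variable)": has_numeric,
        "compare means (needs numeric + categorical)": has_numeric and has_categorical,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
types = column_types(rows)
print(f"{len(rows)} rows")
print(types)
print(method_checks(types))
```

A column of numbers is not always a numeric variable in the statistical sense. Here `year` is flagged as numeric, but you might reasonably treat it as categorical (for example, when comparing two specific years). Treat the output as a starting point for your own judgment.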

The best way to know you’re choosing the right dataset is to start with good research questions to guide you:

6.2.2 Asking Questions

We can ask questions that are closed (with a yes-or-no answer: are the two things different or not?) or open (requiring more thought and explanation: how much does something change? How is it related to something else?). Likewise, our data analyses can serve one or more purposes as we move through the data analysis cycle:

  • descriptive: describing different measures of the data
  • exploratory: looking for patterns or unknown relationships in the data
  • inferential: using a sample to tell us something about a larger population
  • predictive: using relationships in current or past data to predict the future
  • causal: determining what happens to one variable when one or more other variables change

You will need at least three research questions that address different parts of your analysis, along with a rationale explaining how these questions relate to each other.

Above all, choose something you are curious about that will also let you demonstrate the concepts listed for the project. Have some fun with it!

6.3 Secondary Research

To help you craft your research questions, you’ll find two secondary sources that are relevant to your data or analysis. These could be existing studies on similar data, articles about particular methods, or summaries of data analysis in your chosen domain. You should include a formal citation (MLA is fine) for each source, briefly describe each one in a sentence or two, and then explain how they relate to your research project.

You can use the many resources in the Data Science LibGuide to help you find sources.

6.4 Ethical Considerations

Consider some of the ethical challenges that your data presents, and write a paragraph discussing them. Remember: all data projects require careful consideration of ethics. Address the ethical issues in your project in terms of the ADSA’s four lenses. You’ll be able to include an updated version of this paragraph in your Final Project.

6.5 Planning Your Analysis

You’ll conclude this assignment with a paragraph discussing how you plan to go about answering your research questions. Among other things, you might discuss:

  • what data wrangling steps you’ll need to take to prepare your data
  • what visualizations you’ll create and why
  • what tests and models you might use and how they will help to answer your questions

Your project will naturally change and grow as you work on it: you aren’t stuck with the exact plan you lay out here. But you should show that you’ve thought carefully about what you want to do next. Once you and I have discussed your Project Plan, you’ll be able to get started on the wrangling and exploratory steps!

6.6 Requirements

A PDF report uploaded to Sakai that includes the following:

  • your name
  • the dataset you will use (with a hyperlink to its source)
  • a list of the research questions you will investigate: you should have at least 3 (see Section 6.2.2)
  • one paragraph (5-7 sentences) explaining your rationale for the project: why are this dataset and these questions interesting? How do your research questions relate to one another?
  • formal citations for 2 secondary sources, with brief descriptions of each (see Section 6.3)
  • one paragraph explaining the ethical considerations for your project (see Section 6.4)
  • one paragraph laying out your analysis plan (see Section 6.5)