Final Project Proposal

Published

April 3, 2025

Complete by: Thursday 3 Apr. at class time

Description

The purpose of the final project is to combine the core skills you have gained in the class to produce a polished report on a question of your choosing, ideally something that you’re passionate about or that is relevant to your life or interests (some suggestions are offered later on).

This final project will incorporate all of the steps involved in the Data Analysis Cycle. You will be in charge of stating an interesting question that doesn’t duplicate previous projects or workshops, conducting exploratory analyses using skills from the entire semester, building a model that you will interpret, then communicating your key findings in a polished, professional narrative.

Remember that the data cycle is an iterative, not linear, process. You may find that your original question must be revised to match the available data or that your data will need restructuring to match the question effectively. Further, your first prediction/explanation model might not be your final model. Consider revising the model until you can interpret it and state a clear finding.

Finally, the project will be an opportunity to practice communicating with non-technical collaborators or audiences, as your conclusions should be stated in plain language that could be understood by the general public.


Choosing Your Data and Question(s)

The question is open for you to determine, and so is the dataset. You can choose to do a project exploring any field that interests you, including (but not limited to):

  • Anthropology
  • Biology
  • Economics/Business
  • History
  • Literature
  • Music
  • Physics
  • Political Science
  • Psychology
  • Philosophy
  • Sociology
  • Sports
  • Or anything else that you’re excited about!

n.b. You may choose the same dataset you wrote about in your Documentation assignment, but keep in mind many of the tips for choosing data listed below.

Question-asking

We can ask questions that are closed (a yes-or-no answer: are the two things different or not?) or open (requiring more thought and explanation: how much does something change? How is it related to something else?). Likewise, our data analyses can serve one or more purposes as we move through the data analysis cycle: descriptive (summarizing different measures of the data); exploratory (looking for patterns or unknown relationships in the data); inferential (using a sample to tell us something about a larger population); predictive (using relationships in current or past data to predict the future); or causal (asking what happens to one variable when one or more other variables change).
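As a rough illustration of how these purposes differ in practice, here is a minimal sketch. It uses Python's standard library, which is an assumption and may not match the tools we use in class, and the `hours` and `scores` values are made-up toy data, not a real dataset. The helper names `pearson` and `fit_line` are hypothetical and written only for this example.

```python
from statistics import mean, median, stdev

# Hypothetical toy data (invented for illustration):
# hours studied and exam score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 73, 79, 82]

# Descriptive: summarize one variable with a few measures.
summary = {"mean": mean(scores), "median": median(scores), "sd": stdev(scores)}


def pearson(x, y):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Exploratory: look for a relationship between two variables.
r = pearson(hours, scores)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Predictive: use the fitted relationship to predict a new case (9 hours).
slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 9

print(summary)
print(f"correlation: {r:.3f}")
print(f"predicted score for 9 hours: {predicted:.1f}")
```

An inferential or causal question would need more than this sketch shows: a hypothesis test or confidence interval in the first case, and a study design that can rule out other explanations in the second.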

Above all, choose something that you are curious about and that will allow you to demonstrate the listed concepts for the project. Have some fun with it!

Where can I find datasets?

The Finding Data page of the CIS LibGuide has an extensive list of data resources, including general sites as well as repositories broken down by topic.

How do I choose data and organize my project?

Your project must analyze a real-world dataset, using the course concepts and materials. You should choose a topic and dataset that interest you, and come up with a main question and hypothesis that are answerable. The goal of the project includes demonstrating the breadth of techniques we learned this semester: data wrangling, visualization, summary stats, hypothesis testing, regression, modeling, and interpretation. Readability is a major component of reports, so use headings, tables, figure captions, etc. to make yours easy to follow.

You’ll receive a full prompt for the final project very soon. You’ll be asked to create a report that includes data wrangling, visualization, exploratory analysis, and statistical modeling techniques. The variables to include in your model are open for you to determine, as is the interpretation of the model. You should choose a dataset while keeping in mind the techniques and methods that we learned in this class. It’s a good idea to look for relatively tidy datasets that have the kinds of variables that you know how to analyze. Choosing datasets in this way will also make it easier for you to choose models and approaches later on.


Progress checkpoints

Consistent work and progress on your final projects, once the topic is chosen, will ensure a strong end to the semester. You will need to work on the project outside of class time to complete it to a high standard. Ask questions of your professor early and often!

You will receive feedback from your instructor at the project proposal and progress report stages, and these assignments should also help you focus your thoughts as you explore the dataset. In the final week of class, each team will share their progress in a presentation, which you can use as a final way to fine-tune your method, message, and presentation prior to completing the final report.

These checkpoints are graded and are part of your grade for the final project.

Checkpoint 1: Project Proposal

A 1-page document stating your name, dataset (with URL), and main questions, uploaded to Sakai.

This 1-page proposal document will contain:

  • your name
  • the dataset you will use (with a hyperlink to cite the source)
  • a list of the big research question(s) you will investigate: these should be specific. I recommend 2-4 questions that might guide you in different parts of the report.
  • 5-7 sentences explaining your rationale: Why are this dataset and question interesting? How will the data dictate your process for problem solving? What possible approaches might you take to answer your research questions?

n.b. Because we need to stay on track as the end of the semester approaches, you cannot receive an extension on any part of the final project.