Final Written Report
Complete by: Thursday 1 May at 2pm
Please note that I cannot accept any work past this deadline.
Submit a polished Jupyter Notebook, exported as an HTML file, reporting the results of your final data analysis project. Aim for roughly 5-7 written pages (this is hard to measure in a Jupyter notebook, so treat it as a guideline). Think of this report as a “final takeaway” of all the skills you’ve learned in class over the semester. Below is a rough structure for your final written report. It should be a ready-to-deliver report with clear section headers and interpretations of any statistical or graphical output (like several of our previous projects).
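If you have not exported a notebook to HTML before, one common route is the `jupyter nbconvert` command-line tool. The sketch below only builds and prints that command; the helper name `nbconvert_html_command` and the file name `final_report.ipynb` are placeholders made up for illustration, and the export menu in your notebook interface works just as well.

```python
import shlex
from pathlib import Path


def nbconvert_html_command(notebook: str) -> list[str]:
    """Build the nbconvert command that renders a notebook to HTML.

    Hypothetical helper for illustration; it does not run anything.
    """
    path = Path(notebook)
    if path.suffix != ".ipynb":
        raise ValueError(f"expected a .ipynb file, got {notebook!r}")
    return ["jupyter", "nbconvert", "--to", "html", str(path)]


# "final_report.ipynb" is a placeholder name, not a required file name.
command = nbconvert_html_command("final_report.ipynb")
print(shlex.join(command))

# To actually run the export (requires Jupyter to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Run the printed command in a terminal from the folder that contains your notebook, then open the resulting HTML file to check that every figure and table rendered before you submit.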
Introduction
- Provide a one-to-three paragraph introduction, professionally written, that gives an overview of the essentials someone needs to know to make sense of the data you show.
- Consider some of the ethical and logistical challenges that your data presents, and discuss them in your introduction. Address the ethical issues in your project in terms of the ADSA’s four lenses.
- You must cite and link to your dataset, and you can use Markdown to create contextual links like so: `[text here](website)`.
Data Explanation and Exploration
- Provide some details describing the data you are working with. What are the observations? What are the key variables you will be looking at? Are there any particular challenges in the data you will need to work through or be aware of during analysis?
- Provide four polished visuals that describe the data in a way relevant to your question (descriptive, not tied to your statistical model specifically; for example, not a regression plot). No more than two of these should be the same visualization type. Write text that describes the data and explains what the visuals tell you about your data or about decisions you will need to make for the analysis.
Statistical Analysis and Interpretation
- Provide at least two distinct statistical models (for example: multivariate regression and hypothesis test; naive Bayes classification and k-means clustering; logistic regression and random forest classification; etc.) that you interpret correctly and fully in the text. These can be whatever you choose, but you should explain why you chose the models you did, and why they fit the data.
- Provide at least three polished visuals that specifically support and validate the model(s) you have developed (e.g., residual and regression line/scatter, histogram showing normality of data or residuals, confusion matrix, etc.), or help to communicate your main result. Visuals should have captions and be referred to clearly in your text, and they should not all be the same (e.g., not three scatterplots).
- Your text should fully explain what you show and what you found, in terms of the data and in plain language, for someone who is unfamiliar with your data, code, and models.
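As a rough illustration of what “two distinct models” could look like, the sketch below fits a simple linear regression and runs a permutation test using only Python’s standard library. The data, the variable names (`hours`, `score`), and the helper functions `fit_line` and `permutation_test` are all made up for illustration; in your own project you would more likely use a library such as statsmodels or scikit-learn on your real dataset.

```python
import random
from statistics import fmean

# Made-up toy data (NOT from any real dataset): hours studied and exam score.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 71, 75, 78]


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm


# Model 1: simple linear regression of score on hours.
slope, intercept = fit_line(hours, score)
# Residuals are what you would plot to check the fit (e.g., a residual plot).
residuals = [y - (intercept + slope * x) for x, y in zip(hours, score)]

# Model 2: hypothesis test comparing two groups.
# The low/high split here is arbitrary and only for illustration.
low, high = score[:4], score[4:]
p_value = permutation_test(low, high)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"permutation p-value={p_value:.4f}")
```

In a report, each number printed here would need a plain-language interpretation (what a one-unit change in the predictor means, what the p-value does and does not tell you), and the residuals would feed one of your validation visuals.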
Conclusions
- Provide one or two concluding paragraphs about the data: what does it tell us, what are the limitations of this data/model, and what is one direction you could envision for future data analysts or data collectors?
- Find at least one secondary reference that is relevant to or supports your insights, and cite it in this section. You may cite a reference by linking directly to it in your Markdown, `[text here](link here)`, and listing the full citation below the conclusions section. Please ask me if you aren’t sure how to cite references.