1) Come up with question(s) that can be answered by analyzing data.
(You might want to start a Google Slides presentation and save the question(s) on the first slide.)
2) Find some data on the web that might answer your questions.
Example data sources:
- https://github.com/fivethirtyeight/data
- https://www.kaggle.com/
- https://ourworldindata.org/
- https://datasetsearch.research.google.com/
If possible, look for data sources that already provide the data as a CSV file, a JSON file, or a nicely formatted web table. Adding “dataset” or “csv” to your search terms sometimes helps you find them.
Sometimes you can copy and paste a data table into Excel and save it as a CSV, or simply do the analysis in Excel. Use the “Paste Special – Unicode” option (works in some cases, not in others).
You need enough data to do interesting analysis and draw interesting conclusions. If you have a top 10 list and your analysis is identifying the biggest numbers in the list, you need better data.
3) Load this data into Excel or Python. You might need help here:
a) Some examples of how to load file data
b) Example for loading numerical data at bottom of this page
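If you are using Python, a minimal sketch of loading CSV and JSON data with the standard library might look like the following. The column names (team, wins, losses), the values, and the file name in the comment are made up for illustration; your own data will have different columns, and you would read from the file you downloaded instead of from the sample text.

```python
import csv
import io
import json

# Made-up sample text standing in for the contents of a downloaded CSV file.
CSV_TEXT = """team,wins,losses
Team A,10,6
Team B,7,9
Team C,12,4
"""


def load_csv_rows(file_obj):
    """Read CSV rows into a list of dicts and convert the number columns."""
    rows = []
    for row in csv.DictReader(file_obj):
        # csv gives every value back as text, so convert numbers explicitly.
        row["wins"] = int(row["wins"])
        row["losses"] = int(row["losses"])
        rows.append(row)
    return rows


# With a real file (hypothetical name) you would write something like:
#   with open("my_data.csv", newline="", encoding="utf-8") as f:
#       rows = load_csv_rows(f)
rows = load_csv_rows(io.StringIO(CSV_TEXT))
print(len(rows), "rows loaded")
print(rows[0])

# JSON data is loaded with the json module; numbers keep their type.
JSON_TEXT = '[{"team": "Team A", "wins": 10, "losses": 6}]'
records = json.loads(JSON_TEXT)
print(records[0]["team"], records[0]["wins"])
```

Once the data is a list of dictionaries, you can loop over it, sort it, or pick out the columns you need.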
4) Create one or more data visualizations (charts / graphs) to make the data easier to understand.
Some resources here
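If you are working in Python, one common option for charts is the matplotlib library (an assumption here; it must be installed separately, and Excel charts work just as well). The sketch below draws a labeled bar chart from made-up values and saves it as an image you could place on a slide. The labels, numbers, and output file name are placeholders.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # draw to a file instead of opening a window
import matplotlib.pyplot as plt

# Made-up example values; replace these with columns from your own data.
labels = ["Team A", "Team B", "Team C"]
wins = [10, 7, 12]

fig, ax = plt.subplots()
bars = ax.bar(labels, wins)

# A title and axis labels make the chart understandable on its own.
ax.set_title("Wins per team (made-up data)")
ax.set_xlabel("Team")
ax.set_ylabel("Wins")

# Save the chart as a PNG image in the system's temporary folder.
out_path = Path(tempfile.gettempdir()) / "wins_chart.png"
fig.savefig(out_path)
plt.close(fig)
print("Chart saved to", out_path)
```

A bar chart suits comparisons between categories; for values that change over time, `ax.plot` draws a line chart instead.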
5) Answer your question as best as you can from the data you found.
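As a rough illustration of this step in Python, the sketch below answers a made-up question ("Which team has the best win rate?") from a small made-up dataset. It first drops a row with missing values, then sorts the rest. The data, the question, and the helper name win_rate are all hypothetical; your own analysis will depend on your question.

```python
from statistics import mean

# Made-up rows, including one bad row with missing values.
rows = [
    {"team": "Team A", "wins": 10, "losses": 6},
    {"team": "Team B", "wins": 7, "losses": 9},
    {"team": "Team C", "wins": 12, "losses": 4},
    {"team": "Team D", "wins": None, "losses": None},
]

# Remove rows that are missing the numbers we need.
clean = [r for r in rows if r["wins"] is not None and r["losses"] is not None]


def win_rate(row):
    """Fraction of games won (hypothetical helper for this example)."""
    return row["wins"] / (row["wins"] + row["losses"])


# Sort from highest to lowest win rate and take the top row.
ranked = sorted(clean, key=win_rate, reverse=True)
best = ranked[0]
average_wins = mean(r["wins"] for r in clean)

print("Best win rate:", best["team"], f"{win_rate(best):.0%}")
print("Average wins:", round(average_wins, 1))
```

Note how many rows you removed and why, since that affects how much weight your answer can carry.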
6) What new questions came out of your analysis?
7) Create some slides that include your overall findings. Include the questions, the data visualizations, the answers, and further questions.
Content to include in your slides:
- Question(s) you are investigating
- Where you found data
- Data visualization(s)
- Your findings from the data
- Further questions that would be interesting to investigate
Proficiency Scale – Organize, analyze, and communicate data using computational tools. (as of Fall 2023)
Level 1
- Ask questions that can be answered using a dataset.
- Analyze data to answer the questions and communicate findings.
Level 2
- Generate questions, related to a topic of interest, that can be answered with data.
- Find data that can be analyzed to address the questions.
- Use tools such as Python or Excel to analyze data.
- Create data visualizations to help analyze and communicate the data.
- Draw the obvious conclusions from the data.
Level 3
- Ask interesting questions that have the potential to yield answers that may be nuanced, unexpected, complicated, or otherwise interesting.
- Create a data visualization that makes the data easier to understand and helps to communicate the findings from the data analysis.
- Demonstrate proficiency in analyzing data; this could include ability to pre-process data to make it useable in Excel or Python, sorting and rearranging data, removing bad data, and picking out useful data from a larger dataset.
- Draw logical conclusions from the data, beyond the most obvious conclusions.
- Identify further questions that could be studied related to the chosen topic.
Level 4
- Ask relevant, thoughtful questions that can lead to relevant, thoughtful, complex, or otherwise interesting answers.
- Demonstrate advanced proficiency in analyzing the data; for example: discuss nuanced and complicated data in thoughtful and insightful ways; analyze a large amount of data; analyze the data using clever Python algorithms or Excel tools; deal with data from multiple files; obtain data from an online API; combine data from multiple sources.
- Create polished data visualizations that are readily understandable.
- Produce insightful results to the initial questions that would have been difficult to obtain without the use of computational tools.
Proficiency Scale – Communicate Descriptions of Computational Processes (as of Fall 2023)
Level 1
Communicate a question, some aspects of a dataset, and a conclusion to an audience, possibly with certain omissions or inaccuracies.
Level 2
Communicate questions about data, the data itself, and conclusions drawn from the data to an audience.
Level 3
Clearly and accurately communicate questions about data, the data itself, and conclusions drawn from the data in order to help an audience easily understand what the data means.
Level 4
Clearly and accurately communicate questions about data, the data itself, and conclusions drawn from the data in a concise, cogent, and interesting way in order to help an audience easily understand what the data means.