Exploratory Data Analysis
Martindale High School has recently renovated and reopened its South Wing. Claudia is a new student assigned a locker in this new wing. When she excitedly opens her “new” locker, she discovers that the renovation crew seems to have neglected the insides of the lockers: a fresh coat of paint on the outside, but the same old rusty, dented interior.
Even the previous occupant’s end-of-year pile of trash is left intact. She begins to clean out the musty mess inside. “Nasty! I think this used to be an apple.” Not wanting to touch it, she coaxes it into a wastebasket with a ruler. Underneath, she finds a folded piece of paper. “What’s this?”
Carefully unfolding it, she finds a note left by the mysterious previous locker tenant:
You could formulate a hypothesis based on your intuition, such as “Of course, it’s Justin Jones in homeroom—or—Justin Bieber,” and call it a day. But wait a minute! Given the text of the note, it’s obvious we have a public figure on our hands—someone who’s been in the news. This means we have data—and lots of it. Now we can test our hypothesis. We can search for Justin in Google and see what comes up.
Wow! We get lots of hits for Justin Bieber:
Of course, we can’t prove that the note is about him, but we can be fairly confident that it is, right?
Maybe we can add keywords such as “album” or “hate” or we can restrict our query to search news only. It seems he is the most likely “Justin” with an “album” in the news…
What do you think? Is it the Biebs, or somebody else? Conduct some exploratory data analysis and test your hypothesis:
Assignment: Investigate the Mysterious Justin
In your groups, investigate the meaning of this note using Google Trends and Google Correlate to conduct big data analysis. Google Trends is an online tool that allows users to analyze search-term frequencies across dates and geographic locations. Google Correlate is a companion tool that finds search terms whose frequency patterns closely track those of a given term or data series. For this assignment, you must submit a document that contains:
- A hypothesis about the probable identity of “Justin.”
- A rationale that uses data analyses from Google Trends and Google Correlate to support your hypothesis.
To best support your work, you should take notes on the features you use and results you find. Screenshots of key findings may be useful.
Using Google Trends and Google Correlate
You should get to know Google Trends and Google Correlate. Google provides a tutorial on Google Correlate, complete with screenshots and examples. Feel free to refer to it as necessary, or to ask your peers or teachers questions.
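If you are curious about what "correlate" means in practice, the short Python sketch below computes a Pearson correlation coefficient between two series of search-interest values. It is only an illustration: the weekly numbers are made up, the function name `pearson` is our own, and we are assuming Pearson correlation as the similarity measure. A coefficient near 1 means the two terms rise and fall together; a value near 0 means they are unrelated.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly search-interest values, invented for illustration only.
justin = [20, 25, 90, 60, 30, 22]
album = [10, 12, 70, 50, 20, 11]
weather = [40, 42, 39, 41, 40, 43]

r_album = pearson(justin, album)      # the two series spike in the same weeks
r_weather = pearson(justin, weather)  # no shared spike

print(f"justin vs. album:   {r_album:.2f}")
print(f"justin vs. weather: {r_weather:.2f}")
```

Remember that a high correlation only shows that two terms move together. It does not prove that one explains the other, which is worth keeping in mind when you write your rationale.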
Google Trends is rather intuitive and easy to use. It can be very interesting, especially if you learn to set all the parameters, which are highlighted in the image below:
Explore the tools and use as much of the functionality as you can. You will be able to tell how search terms change over time, what terms are used together, from where people search, and even what news events correlate with trends over time.
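As you read the charts, keep in mind that Google Trends reports relative interest on a scale from 0 to 100 rather than raw search counts, with 100 marking the peak in the selected time range and region. The Python sketch below shows the basic idea of scaling a series to its own peak. It is a simplified illustration: the counts are invented, the function name `scale_to_peak` is our own, and the real tool works from a term's share of all searches rather than from raw counts.

```python
def scale_to_peak(counts):
    """Rescale a series so its largest value becomes 100."""
    if not counts:
        return []
    peak = max(counts)
    if peak == 0:
        # No searches at all: every point stays at zero.
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]


# Hypothetical monthly search counts, invented for illustration only.
raw_counts = [120, 300, 1500, 900, 450]
interest = scale_to_peak(raw_counts)
print(interest)
```

Because each chart is scaled to its own peak, a value of 100 for one term and 100 for another in separate charts does not mean the two were searched equally often. Compare terms within the same chart instead.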
Be Prepared to Discuss
- What did you do first? Next? Last?
- What would you like to be able to do?
- What was the most effective tool in Google Trends? Why?
- Was Google Correlate effective or efficient? Why or why not?
- How important were time, place, and other key terms in your search queries?
- Who is Justin? Justify your answer.
- Will we ever know conclusively? Why or why not?