Statistical Analysis

Survey Says!

Not all statistics or data analyses are meaningful. Attaching a statistic to a claim does not make the claim meaningful (or correct). For instance, the video below shows a little statistical absurdity from the movie Anchorman.

https://www.youtube.com/embed/pjvQFtlNQ-M

Descriptive, Predictive, and Prescriptive Analytics

Scientists, mathematicians, politicians, and other professionals across the globe commonly use three types of statistical analysis:

  • Descriptive analytics provide information about collected data via statistics that you are probably familiar with—mean, median, mode, range, etc. They tend to “describe” circumstances, but don’t offer conjectures about unknowns.
    • e.g., What percentage of computer science graduates are paid salaries of $100,000 or more within five years of graduating?
  • Predictive analytics may provide information about future (or merely unobserved or unknown) events based on previously collected and analyzed data.
    • e.g., How likely is it that I will be able to find a high-paying job if I choose to major in computer science vs. biology?
  • Prescriptive analytics may recommend a course of action that maximizes the chances of a desired future event, based on comparing the predictive analyses of multiple options.
    • e.g., Which major should I choose in order to maximize my chances of making the highest starting salary after graduation?
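The three example questions above can be sketched on a small data set. This is a minimal illustration, not a real analysis: the salary records are invented, and the function names (share_at_or_above, predicted_chance, recommended_major) are hypothetical. The predictive step makes the naive assumption that future graduates will resemble past ones.

```python
# Hypothetical records: (major, salary five years after graduating).
# All numbers are invented for illustration only.
records = [
    ("computer science", 120_000),
    ("computer science", 95_000),
    ("computer science", 105_000),
    ("computer science", 88_000),
    ("biology", 101_000),
    ("biology", 62_000),
    ("biology", 70_000),
    ("biology", 58_000),
]

THRESHOLD = 100_000


def share_at_or_above(major, threshold=THRESHOLD):
    """Descriptive: fraction of observed graduates in `major` paid at least `threshold`."""
    salaries = [salary for m, salary in records if m == major]
    return sum(salary >= threshold for salary in salaries) / len(salaries)


def predicted_chance(major):
    """Predictive (naive assumption): a future graduate resembles past ones,
    so the observed share doubles as the estimated probability."""
    return share_at_or_above(major)


def recommended_major(majors):
    """Prescriptive: compare the predictions and recommend the best option."""
    return max(majors, key=predicted_chance)


majors = ["computer science", "biology"]
for major in majors:
    print(f"{major}: {share_at_or_above(major):.0%} observed at or above ${THRESHOLD:,}")
print("recommended:", recommended_major(majors))
```

Each step leans on the one before it: the recommendation is only as good as the prediction, and the prediction is only as good as the data described.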

Each type of analytics serves a different purpose and has advantages and disadvantages that experts must weigh in order to use it properly, but all three allow powerful inferences to be drawn from data.

Utility and Confidence of Drawing Inferences

The following report card grades each type of analysis on its utility (how useful is it?) and confidence (how likely is it to be true and/or valid?) in the context of decision making. Notice that there is no perfect analysis.

Analysis        Utility level    Confidence level
Descriptive     C                A+
Predictive      B+               B-
Prescriptive    A                C

Let’s take a look at each type of analysis, its purpose, and how it is applied to common computing tasks (e.g., a Google search).

Descriptive Analytics

Function:
  • Easiest to derive “hard facts” from
  • Example: the percentage of graduates employed within six months of graduating

Application:
  • The spidering, caching, and indexing of the web are all centered on descriptive analytics
  • In essence, this creates an easily accessible description of the web
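To make the idea of indexing as a "description of the web" concrete, here is a minimal sketch of an inverted index, a structure that records which pages contain which words. The page names and texts are invented, and build_index is a hypothetical helper; a real search engine's index is far more elaborate.

```python
from collections import defaultdict

# Hypothetical mini "web": page name -> page text (invented for illustration).
pages = {
    "ties.html": "how to tie a tie",
    "shoes.html": "how to tie your shoes",
    "knots.html": "sailing knots for beginners",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


index = build_index(pages)
print(sorted(index["tie"]))  # pages that contain the word "tie"
```

Nothing here is a guess: the index only records what was observed on each page, which is why descriptive analytics earns the highest confidence grade.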

Predictive Analytics

Function:
  • Uses the “hard facts” of descriptive analysis to extrapolate (make inferences) about where unknown data may lie
  • Example: Given that 90 of the 100 CS graduates were employed within six months in 2011, it is __ % likely that 108 of the 120 CS graduates in 2012 will be employed within six months.
  • Predictions are not “hard facts”; they may be wrong!

Application:
  • The retrieval and ranking of pages in response to a search query
  • Creates a representation of the search terms and compares it against the descriptive (indexed) model of the web
  • In essence, predicts which pages will be relevant to whom
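The comparison of search terms against a model of the web can be sketched with a very simple relevance score. This is an assumption-laden toy, not how any real search engine ranks pages: the pages are invented, rank is a hypothetical function, and the score is just the number of query words a page shares with the query.

```python
from collections import Counter

# Hypothetical mini "web": page name -> page text (invented for illustration).
pages = {
    "ties.html": "how to tie a tie",
    "shoes.html": "how to tie your shoes",
    "knots.html": "sailing knots for beginners",
}


def rank(query, pages):
    """Return page names ordered by how many query words they share with the query."""
    query_terms = Counter(query.lower().split())
    scores = {}
    for name, text in pages.items():
        page_terms = Counter(text.lower().split())
        # Count each query word at most as often as it appears on the page.
        overlap = sum(min(count, page_terms[term]) for term, count in query_terms.items())
        if overlap:
            scores[name] = overlap
    # Highest score first; break ties alphabetically so the order is stable.
    return sorted(scores, key=lambda name: (-scores[name], name))


print(rank("tie a tie", pages))
```

The ranking is a prediction about relevance, so it can be wrong: a page may share many words with the query and still not be what the searcher wanted.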

Prescriptive Analytics

Function:
  • Compiles predictive hypotheses and recommends a plan of action to maximize the likelihood of something happening
  • Example: Which major should I choose to maximize the chance that I’ll have a job within six months of graduation?
  • Confidence is the lowest here, because the prediction errors of all the underlying analyses are compounded.

Application:
  • Autocomplete is a form of prescriptive analysis
  • Recommendations for further searches are based on potential next steps for users
  • These recommendations draw on the ranking of previous and potential queries that are similar to the current search query
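One way to see why compounded errors lower confidence is a small calculation. It rests on a simplifying assumption that the lesson does not make: that a recommendation depends on several predictions that are independent of one another, each with its own chance of being right. The confidence values and the function name chance_all_hold are hypothetical.

```python
import math


def chance_all_hold(confidences):
    """Probability that every prediction is right, assuming they are independent."""
    return math.prod(confidences)


# Three hypothetical predictions, each 90% likely to be right on its own.
print(f"{chance_all_hold([0.9, 0.9, 0.9]):.1%}")
```

Under that assumption, a plan built on three fairly reliable predictions is noticeably less reliable than any one of them.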

More Is Better

Generally, more data leads to greater confidence. All three types of analysis build models from data, and the better a model fits the data, the greater its power (and thus its utility). This is why big data can be so powerful.
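A rough way to quantify "more data, more confidence" is the margin of error of an observed percentage, which shrinks as the sample grows. The sketch below uses the standard normal approximation for a proportion at roughly the 95% level (z = 1.96); the 90% rate echoes the graduate example earlier in this lesson, the sample sizes are arbitrary, and margin_of_error is a hypothetical helper.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for an observed proportion p from n samples."""
    return z * math.sqrt(p * (1 - p) / n)


# Same observed rate, increasingly large samples.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: 90% plus or minus {margin_of_error(0.9, n):.1%}")
```

Going from 100 to 10,000 observations cuts the margin by a factor of ten, since the margin scales with one over the square root of the sample size.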

Google’s searches are often effective because its data set is huge. Google has an enormous amount of data on which to conduct descriptive, predictive, and prescriptive analyses, and it uses those analyses to improve the user experience.

Common misconception: Autocomplete is programmed by hand to give its results.
  • Autocomplete is based on simple statistics of what people are actually searching for. In other words, if typing “how do I tie” suggests “How do I tie a tie,” this is because, statistically speaking, “How do I tie a tie” is the most common query that begins with “how do I tie.”
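The statistical idea behind autocomplete can be sketched in a few lines: count past queries and suggest the most frequent ones that start with what has been typed so far. The query log below is invented, and autocomplete is a hypothetical function; real systems draw on far more data and many more signals.

```python
from collections import Counter

# Hypothetical log of past searches (invented for illustration).
query_log = [
    "how do i tie a tie",
    "how do i tie a tie",
    "how do i tie a tie",
    "how do i tie my shoes",
    "how do i tie my shoes",
    "how do i tie a bow tie",
    "how do i bake bread",
]


def autocomplete(prefix, log, k=3):
    """Return up to k past queries starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    counts = Counter(query for query in log if query.startswith(prefix))
    return [query for query, _ in counts.most_common(k)]


print(autocomplete("how do I tie", query_log))
```

No suggestion is written by hand: change the log and the suggestions change with it.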