WEEK 4 DQ
TURNING DATA INTO INFORMATION
Having framed the problem/opportunity, formulated a testable hypothesis and gathered and organized key data, you are ready to continue your analysis by developing a data story that can be shared with others. To get started, download and review the “Types of Data Analysis” guide from our Week 4 Weekly Materials.
Apply one or more of the following analytical tools to your dataset:
Correlation
Regression
Grouping and Visualization
Variance
Standard Deviation
Explain whether your analysis of the data confirmed or refuted your testable hypothesis.
****************************************************************************************************************************************
JWI 599: Business Analytics and Capstone
Types of Data Analysis
© Strayer University. All Rights Reserved. This document contains Strayer University confidential and proprietary information and may not be copied, further distributed, or otherwise disclosed, in
whole or in part, without the expressed written permission of Strayer University. This document is subject to change based on the needs of the class.
JWMI 599 Types of Data Analysis (1188) Page 1 of 4
Reference Sheet: Types of Data Analysis
Data analysts use many different types of analysis when looking for patterns, correlations, and causations in data sets. Each type of analysis
serves a different purpose; therefore, it's important to select the most useful option(s), depending upon: your organization; the issue, problem or
opportunity; and the particular data sets that you have collected. This reference sheet is intended as a resource to support you in deciding which
type(s) of data analysis are most useful to apply to your work as you prepare for your Capstone project.
Type of Analysis Definition Primary purpose Recommended use
Grouping and Visualizing
Definition:
Grouping quantitative data and
metrics into a limited set of clearly
defined variables, then matching them
to the type of graphic illustration and
visualization medium that most effectively
communicates the analytical story to
the targeted stakeholders.
Purpose:
The purpose of grouping values in a
selected data set is to create
categories for analyses based on the
defined analytical problem or
opportunity.
Recommended use:
Unique data visualizations are a more
user-friendly way of communicating
quantitative data and metrics to
stakeholders.
How To Steps:
1. Group the raw data into categories
2. Identify and define 2 or 3 variables you want to measure
3. Create a visual illustration to show your selected categories (e.g., bar chart, histogram, line graph, or pie chart)
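The three steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the `raw_sales` records, the region categories, and the helper names `group_totals` and `text_bar_chart` are hypothetical, and a text bar chart stands in for a plotted one so that the example needs nothing beyond the standard library.

```python
from collections import defaultdict

# Hypothetical raw data: (region, sale amount). Illustrative values only.
raw_sales = [
    ("North", 120), ("South", 80), ("North", 60),
    ("East", 100), ("South", 40), ("East", 50),
]

def group_totals(records):
    """Steps 1-2: group raw records by category and total the measured variable."""
    totals = defaultdict(int)
    for category, amount in records:
        totals[category] += amount
    return dict(totals)

def text_bar_chart(totals, unit=20):
    """Step 3: render one bar per category, one '#' per `unit` of the total."""
    lines = []
    for category in sorted(totals):
        bar = "#" * (totals[category] // unit)
        lines.append(f"{category:<6} {bar} {totals[category]}")
    return "\n".join(lines)

totals = group_totals(raw_sales)
print(text_bar_chart(totals))
```

In practice the same grouped totals could be handed to a charting tool to produce the bar chart, histogram, line graph, or pie chart that best suits the audience.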
Cluster Analysis
Definition:
Cluster analysis is the process of
grouping a set of data in such a way
that the data in each cluster or group
are more similar to each other than
to the data in other clusters or groups.
Purpose:
Cluster analysis is a simple
exploratory statistical procedure that
sorts data into homogeneous groups,
producing a smaller or more
meaningful data set for analysis.
Recommended use:
Cluster analysis is used to identify
groups within a database that were not
previously known.
How To Steps:
1. Identify and select a database
2. Identify the number of clusters in advance
3. Select a cluster analysis methodology to group each observation in the selected database
(e.g., K-Means Cluster Analysis, Hierarchical Cluster Analysis, or Two-Step Cluster Analysis)
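As a rough illustration of the K-Means option, the sketch below clusters one-dimensional values in plain Python. It is a simplified teaching version, not a production method: the function name `k_means_1d`, the evenly spaced starting centers, and the `order_values` data are all assumptions made for this example.

```python
def k_means_1d(values, k, iterations=20):
    """Very small k-means sketch for one-dimensional data."""
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced values as initial centers.
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centers = [float(ordered[round(i * step)]) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # centers stopped moving, so the grouping is stable
        centers = new_centers
    return centers, clusters

# Hypothetical order values with two visible groups (illustrative only).
order_values = [10, 12, 11, 13, 50, 52, 49, 51]
centers, clusters = k_means_1d(order_values, k=2)
print(centers)
print(clusters)
```

The number of clusters `k` is chosen in advance, matching step 2 above; hierarchical methods do not require that choice up front.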
Chi-Square
Definition:
A chi-square is a statistical test for
independence that determines
whether there is a significant
association between two categorical
variables.
Purpose:
The purpose of a chi-square or
goodness-of-fit test is to determine if
there is a significant difference between
the observed values and the expected
values.
Recommended use:
A chi-square statistical test is applied
when you have two categorical
variables from a single population.
How To Steps:
The formula for the chi-square statistic is:
$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
1. c is the number of degrees of freedom
2. O represents the observed value
3. E represents the expected value
4. DF is the degrees of freedom
5. N is the number of observed values (categories) being compared
NOTE: Degrees of freedom represent how many values involved in an analytical calculation have the freedom to vary.
For a goodness-of-fit test, DF = N-1; for a test of independence on a contingency table, DF = (rows-1) x (columns-1).
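A short Python sketch may make the calculation concrete. It computes the usual chi-square statistic, the sum of (O - E)^2 / E across categories, for a goodness-of-fit comparison. The counts are invented for illustration, the function name `chi_square_statistic` is hypothetical, and the degrees of freedom are taken as the number of categories minus one.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three customer segments (illustrative only).
observed = [50, 30, 20]
expected = [40, 40, 20]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # goodness of fit: categories minus one

# 5.991 is the standard chi-square critical value for df = 2 at alpha = 0.05.
CRITICAL_DF2_ALPHA_05 = 5.991
significant = chi2 > CRITICAL_DF2_ALPHA_05
print(chi2, df, significant)
```

When the statistic is below the critical value, as it is for these made-up counts, the difference between observed and expected values is not statistically significant at the 5% level.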
Measurements of Central Tendency
Definition:
A simple mathematical technique
used to identify the location of the
center of a quantitative distribution.
Purpose:
MEAN: Mathematical average in a
distribution.
MEDIAN: Mathematical mid-point in a
distribution.
MODE: Most frequent value in a
distribution.
Recommended use:
For NOMINAL data use the MODE.
For ORDINAL data use the MEDIAN.
For INTERVAL/RATIO (not skewed)
data use the MEAN.
For INTERVAL/RATIO (skewed) data
use the MEDIAN.
How To Steps:
1. Arrange your data set from smallest to largest values
2. Determine which measure of central tendency to use in the analysis
3. Calculate the selected measure of central tendency (mean, median or mode)
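The three measures can be computed with Python's built-in `statistics` module. The data set below is hypothetical and chosen only so that the three results differ.

```python
import statistics

# Hypothetical data set, already arranged from smallest to largest.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # midpoint of the ordered values
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```

Here the mean (5) sits above the median (4) because the larger values pull the average upward, which is why the median is recommended for skewed data.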
Ranges
Definition:
The range is a descriptive statistic that
measures the difference between
the lowest and highest values in a
data set.
Purpose:
The purpose of the range is to show
how well the measure of central
tendency represents the values in a
selected data set.
Recommended use:
Ranges are used to show how spread out
the values are in a selected data set.
How To Steps:
1. List the elements of the data set
2. Identify the highest and lowest numbers in the dataset
3. Subtract the smallest number in the data set from the largest number in the data set
4. Label the range
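The steps above amount to a single subtraction, sketched here in Python. The `delivery_days` values and the helper name `value_range` are assumptions made for this example.

```python
def value_range(values):
    """Return (lowest, highest, range) for a non-empty data set."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest

# Hypothetical delivery times in days (illustrative only).
delivery_days = [4, 9, 2, 7, 5]
lowest, highest, spread = value_range(delivery_days)

# Step 4: label the range with its unit so the number is interpretable.
print(f"Range of delivery times: {spread} days ({lowest} to {highest})")
```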
Variance
Definition:
Statistical variance measures how far
the data are spread out from the mean
or the expected value.
Purpose:
The variance is used to measure the
spread of a probability distribution.
For example, the variance can help
determine the risk an investor might
take on when purchasing a specific
security in the market.
Recommended use:
Unlike the range, which only looks at the
extremes, the variance looks at all the
data points or observations and then
determines their distribution.
How To Steps:
1. Select a data set and calculate the MEAN
2. For each number or observation in the data set, subtract the MEAN and square the result (squared differences)
3. Calculate the average of the squared differences (divide by N for a full population, or by n-1 for a sample)
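The steps above can be sketched in Python and checked against the standard library. The `scores` data set and the function name `population_variance` are hypothetical. The manual version divides by the number of observations (population variance), and the last line shows the sample version, which divides by n-1 instead.

```python
import statistics

def population_variance(values):
    """Average of squared differences from the mean (divide by N)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / len(values)

# Hypothetical data set (illustrative only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

manual = population_variance(scores)
print(manual)                        # population variance, computed by hand
print(statistics.pvariance(scores))  # same quantity from the library
print(statistics.variance(scores))   # sample variance, divides by n - 1
```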
Standard Deviation
Definition:
A standard deviation (SD) is a
statistical measure that is used to
quantify the amount of variation or
dispersion of a set of data values.
Purpose:
A standard deviation assesses how
far the values are spread above or
below the mean of a selected
population or sample data set.
Recommended use:
A high standard deviation shows that
the data are widely spread (the mean is
a less reliable summary) and a low
standard deviation shows that the data
are clustered closely around the mean
(the mean is a more reliable summary).
How To Steps:
1. Select a sample data set.
2. Calculate the mean of the sample.
3. Subtract the mean value from each data value.
4. Square each result.
5. Find the sum of the squared values.
6. Divide by n-1, where n is the number of data points.
7. Take the square root of the result.
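The steps above can be sketched in Python as follows. Dividing by n-1 gives the sample variance, and taking its square root gives the sample standard deviation. The data and the function name `sample_standard_deviation` are hypothetical, and the result is compared with `statistics.stdev` from the standard library as a sanity check.

```python
import math
import statistics

def sample_standard_deviation(values):
    """Follow the listed steps, then take the square root."""
    n = len(values)
    mean = sum(values) / n                   # mean of the sample
    deviations = [x - mean for x in values]  # subtract the mean from each value
    squared = [d ** 2 for d in deviations]   # square each result
    total = sum(squared)                     # sum of the squared values
    sample_variance = total / (n - 1)        # divide by n - 1
    return math.sqrt(sample_variance)        # square root gives the SD

# Hypothetical sample (illustrative only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sd = sample_standard_deviation(sample)
print(round(sd, 3))
print(math.isclose(sd, statistics.stdev(sample)))
```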
Confidence Intervals
Definition:
A range of values, calculated from a
sample, that is likely to contain the
true population measurement at a
stated level of confidence.
Purpose:
Confidence intervals are easy ways to
understand the amount of uncertainty
in a sample estimate of a population.
Recommended use:
Confidence intervals are used to draw
inferences on population values from
one or more samples.
How To Steps:
To calculate the confidence interval from a sample mean, choose a confidence level of 95% or greater. The confidence
level describes the reliability of the sampling method: if the same sampling method were repeated many times,
about 95% (or more) of the resulting intervals would contain the true population value. That also means that 5% or less
of the intervals would not contain the true population value.
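One common way to compute such an interval for a sample mean is sketched below. It is a normal-approximation sketch under stated assumptions: the sample values and the function name `mean_confidence_interval` are hypothetical, and the critical value comes from the standard normal distribution, which is reasonable for larger samples. For small samples a t-distribution critical value is generally more appropriate.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return mean - margin, mean + margin

# Hypothetical sample of 40 order values (illustrative only).
sample = [48, 52, 50, 47, 53, 49, 51, 50] * 5
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

Raising the confidence level widens the interval, and a larger sample narrows it, because the standard error shrinks as n grows.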
Correlation
Definition:
A statistical measure that indicates
the strength and direction (positive or
negative) of the relationship between
two or more variables.
Purpose:
The purpose of correlation in analytics
is to determine which variables are
connected.
Recommended use: