In this assignment you will use Predictive Analytic techniques to analyze four business cases.
The data for this assignment can be found in Canvas in Excel or CSV format. The intention is that you will
upload the data to Tableau and complete your analyses there; however, you may also use other tools
(including Excel).
Please prepare a PowerPoint presentation which addresses your findings regarding the questions and
issues presented for each of the cases. A PowerPoint template is available on Canvas. The format of your
presentations should be appropriate for a business presentation on the issues raised in the cases. Your
work should include accurate analyses, appropriate visualizations that support your analysis and
succinct verbal explanations of your findings. An appropriate response would generally be 3 – 5 pages in
length for each case.
You will submit your assignment by uploading your PowerPoint document to Canvas.
You have a data set of Diamond prices for over 50,000 diamond sales which have been divided
into a Training Set and a Validation set. The data contains the following characteristics of a
diamond:
(1) Carat
(2) Cut
(3) Color
(4) Clarity
(5) Depth
(6) Table
(7) X
(8) Y
(9) Z
Carat, Cut, Color, Clarity are the traditional “4 C’s” of diamond ratings. X, Y and Z are the
physical measurements, and Depth and Table are further measurements of the shape of the
diamond as shown here:
Create a Model using the Training Set to predict a Diamond’s Price as a function of the variables
shown above.
Investigate how successful the model is in predicting prices by demonstrating the MAPE of the
Validation set. Discuss which sorts of diamonds are not being valued properly and suggest ways
to handle that in the model.
Use the model to predict prices of five new diamonds that you have available to sale.
Prepare a brief presentation discussing your model and its accuracy and showing your
prediction for the five new diamonds.
DATA: Diamonds.xlsx
Your task is to forecast future sales for a nationwide department store chain. You have Sales
Data by department for the period January 2019 through August 2021 and you would like to
forecast Sales through August 2023 by day.
Look at Daily sales patterns of the Department store and comment on any trends and
seasonality you observe. Next create TWO separate forecasts of Sales:
(1) A forecast using Triple-exponential smoothing (which is produced by the “Forecast”
feature in the Analytics Tab in Tableau). Set the choices to those that you feel produces
the best forecast
(2) A forecast using Gaussian Process (using the MODEL_QUANTILE function). Capture both
Day-of-week and seasonal fluctuations.
Show a visual representation of each forecast, and comment on which forecast is superior
and the accuracy and/or shortfalls of your chosen forecast.
DATA: Department Store.xlsx
You oversee promotions for a website of an online book seller. You have created a campaign for
members of your loyalty program around the theme of the “Great American Read” (a list of 100
books that American’s ranked as their favorite novels in a 2018 PBS series). When members
purchase a book on the Great American Read list you offer them a discount on another book
from that list. You have data on the members’ previous buying history and data on their
purchases during your Great American Read promotion.
1) Perform a clustering analysis which would predict which Genre a customer will purchase
(for their first book).
2) Perform another clustering analysis to predict whether the customer buys a second
book
3) Choose your favorite book from the list and demonstrate which book you should be
shown to you as a second book choice.
4) Explain your findings in a brief presentation
DATA: Great American Read.xlsx
You are responsible for pricing car insurance policies. You will model the likelihood of a vehicle
incurring damage during the coverage period based on an EXPONENTIAL distribution, and you
will model the amount of damage (when it occurs) as a LAPLACE Distribution.
Your policies have a deductible and a coverage limit. The insurance company will pay the
customer for any damage above the deductible but not more than the coverage limit.
You will conduct a Monte Carlo analysis where you simulate whether damage occurs and the
amount of damage and then calculate the amount paid to the Customer by the insurance
company. You will price the insurance policy at the average amount paid to the customer in
your simulations, grossed up by your required profit margin.
The model must include the following parameters:
• The mean likelihood of damage, ranging from 10% – 20% in steps of 1%
• The mean damage amount (when damage occurs), ranging from $5,000 –
$10,000 in steps of $1,000
• The standard deviation of the damage amount (when damage occurs) ranging
from $500 – $5,000 in steps of $500
• The policy deductible, ranging from $0 – $3,000 in steps of $500
• The policy coverage limit, ranging from $5,000 – $30,000 in steps of $5,000
• The policy profit margin, ranging from 10% – 30% in steps of 5%
Demonstrate the price of an insurance policy that your model would recommend with the
following assumptions:
• Mean likelihood of damage: 15%
• Mean damage amount (when damage occurs): $5,000
• Standard deviation of the damage amount (when damage occurs): $2,500
• Deductible: $1,000
• Coverage limit: $10,000
• Profit margin: 20%
DATA: random.xlsx