RStudio question

Learning Goal: I’m working on a databases question and need the explanation and answer to help me learn.

At the end of this lab you will upload your RStudio script for points.

For CIS 389 download this dataset:
Download iris.csv

First, we need data to run the algorithm on. We will use the classic iris dataset (UCI Machine Learning Repository: Iris Data Set).

The example is wrong; the book taught us the right way. Use:

For PC

iris <- read.csv("C:/Users/Student/Desktop/iris.csv")

For MAC

iris <- read.csv("/Users/UserName/Downloads/iris.csv")

First read in the data, then display the first 6 rows of the iris data frame. We will use petallength and petalwidth as the variables for k-means clustering. Why? Because exploring the data shows that these two variables differ markedly among species.
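As a rough sketch of this step, the block below previews the first 6 rows and compares the two petal variables across species. It does not read iris.csv; it uses R's built-in iris data as a stand-in and renames the columns to sepallength, sepalwidth, petallength, petalwidth and class. Those names are an assumption based on the column names used later in this lab, and iris_lab is a hypothetical variable name.

```r
# Stand-in for read.csv(): R's built-in iris data, renamed to the column
# names this lab appears to use (an assumption, not taken from iris.csv).
iris_lab <- datasets::iris
names(iris_lab) <- c("sepallength", "sepalwidth",
                     "petallength", "petalwidth", "class")

# head() returns the first 6 rows by default
first_rows <- head(iris_lab)
print(first_rows)

# Mean petal length and width per species, to see how far apart they are
petal_means <- aggregate(cbind(petallength, petalwidth) ~ class,
                         data = iris_lab, FUN = mean)
print(petal_means)
```

With your own file, replace the first two statements with the read.csv() call for your operating system.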

Load the cluster package so that the rest of the lab works correctly:

library(cluster)

We will create a new data frame corresponding to these two variables.

kmeans_variables = data.frame(iris$petallength, iris$petalwidth)

Convert the character vector of class labels to a factor to avoid an "invalid color name" error.

pClass <- as.factor(iris$class)

Let’s display the data.

plot(kmeans_variables, col=pClass)
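To see why the factor conversion matters, here is a small sketch with a hypothetical vector of labels (not read from iris.csv). A character vector passed to col is interpreted as color names, and "Iris-setosa" is not one; a factor is used through its integer codes, which index the current palette().

```r
# Hypothetical labels standing in for iris$class
labels <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa")

f <- as.factor(labels)
print(levels(f))       # the distinct labels, sorted alphabetically
codes <- as.integer(f) # 1 2 3 1: one integer code per label
print(codes)

# The codes select entries of the active palette
print(palette()[codes])
```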

Applying K-means

KMC = kmeans(kmeans_variables, centers = 3, iter.max=50, nstart=20)

centers

The number of clusters, k. Here we use 3 because there are 3 different species.

iter.max

The maximum number of iterations to perform.

nstart

The number of random starts. R will try 20 different random starting assignments and then select the one with the lowest within-cluster variation.
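The effect of nstart can be sketched by comparing a single random start with 20 starts. This block uses the built-in iris data with its own column names (Petal.Length, Petal.Width) as a stand-in for the lab's CSV; the seed value and the names vars, single and multi are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so the runs are repeatable

# Stand-in for kmeans_variables, built from R's bundled iris data
vars <- datasets::iris[, c("Petal.Length", "Petal.Width")]

single <- kmeans(vars, centers = 3, iter.max = 50, nstart = 1)
multi  <- kmeans(vars, centers = 3, iter.max = 50, nstart = 20)

# tot.withinss is the total within-cluster sum of squares; with nstart > 1
# kmeans() keeps the run that has the smallest value
cat("nstart = 1 :", single$tot.withinss, "\n")
cat("nstart = 20:", multi$tot.withinss, "\n")
```

On data this well separated the two values will often match; the extra starts mainly protect against an unlucky initial assignment.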

Output:

K-means clustering with 3 clusters of sizes 49, 48, 52

Cluster means:
  iris.petallength iris.petalwidth
1         1.465306        0.244898
2         5.595833        2.037500
3         4.269231        1.342308

Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [75] 3 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 3 2 2 2 2 2
[112] 2 2 2 2 2 2 2 3 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2
[149] 2

Within cluster sum of squares by cluster:
[1] 2.032245 16.291667 13.057692
 (between_SS / total_SS = 94.2 %)

Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"
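The between_SS / total_SS line in the output can be recomputed from the listed components. The sketch below again uses the built-in iris data as a stand-in, so its percentage need not equal the 94.2 % shown above; fit and ratio are hypothetical names.

```r
set.seed(1)  # arbitrary seed
vars <- datasets::iris[, c("Petal.Length", "Petal.Width")]
fit  <- kmeans(vars, centers = 3, iter.max = 50, nstart = 20)

# Share of the total variation that lies between clusters
ratio <- fit$betweenss / fit$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))

# The components are linked by two identities:
#   tot.withinss = sum(withinss)
#   totss        = tot.withinss + betweenss
print(all.equal(fit$tot.withinss, sum(fit$withinss)))
print(all.equal(fit$totss, fit$tot.withinss + fit$betweenss))
```

A ratio close to 100 % means most of the spread in the data is explained by which cluster a point belongs to.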

KMC$cluster shows which cluster each data point was assigned to.

KMC$centers gives the centroid of each cluster.

> KMC$centers
  iris.petallength iris.petalwidth
1         1.465306        0.244898
2         5.595833        2.037500
3         4.269231        1.342308

KMC$size gives the number of data points in each cluster.

> KMC$size
[1] 49 48 52

For other parameters, see K-Means Clustering. For a beginner, though, I think this much is sufficient.

Let’s see which species got which cluster

> table(KMC$cluster, iris$class)

    Iris-setosa Iris-versicolor Iris-virginica
  1          49               0              0
  2           0               2             46
  3           0              48              4

We can see setosa got cluster 1, versicolor got 3 and virginica got 2.
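Reading the table by eye can also be done in code. The sketch below picks the majority species for each cluster and measures how many flowers agree with it. It uses the built-in iris data (column Species rather than class) as a stand-in, and because k-means cluster numbers are arbitrary, the numbering may differ from the table above.

```r
set.seed(1)  # arbitrary seed
dat <- datasets::iris
fit <- kmeans(dat[, c("Petal.Length", "Petal.Width")],
              centers = 3, iter.max = 50, nstart = 20)

tab <- table(cluster = fit$cluster, species = dat$Species)
print(tab)

# For each cluster (row), the species with the largest count
majority <- colnames(tab)[apply(tab, 1, which.max)]
names(majority) <- rownames(tab)
print(majority)

# Share of flowers that belong to their cluster's majority species
agreement <- sum(apply(tab, 1, max)) / sum(tab)
cat("agreement:", round(agreement, 3), "\n")
```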

Plotting k-means

clusplot(iris, KMC$cluster, color=TRUE, shade=TRUE, lines=0)

What you are actually seeing in the clusplot() output is a plot of your observations in the principal plane. The function calculates the principal component scores for each observation, plots those scores, and colors them by cluster.

Principal component analysis (PCA) is a dimension reduction technique; it “summarizes” the information of all variables into a couple of “new” variables called components.
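The idea can be sketched by hand with prcomp() from base R. This is only an approximation of what clusplot() draws, not a reproduction of it: the choice to scale the variables, the use of the built-in iris measurements, and the names num, pca and scores are all assumptions made for illustration.

```r
# Four numeric measurements from R's bundled iris data
num <- datasets::iris[, 1:4]

# Principal components of the centered and scaled variables
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Scores on the first two components: one (x, y) point per flower
scores <- pca$x[, 1:2]

# How much of the total variance those two components carry
print(summary(pca)$importance[, 1:2])

# Color the scores by k-means cluster, as a cluster plot does
set.seed(1)  # arbitrary seed
fit <- kmeans(datasets::iris[, c("Petal.Length", "Petal.Width")],
              centers = 3, iter.max = 50, nstart = 20)
plot(scores, col = fit$cluster, pch = 19,
     main = "Observations in the principal plane")
```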

For a simple plot, use:

plot(kmeans_variables, col=KMC$cluster)

Sources –

NOTE: To install the cluster package in RStudio: install.packages("cluster")

K Means Clustering in R

Using the stats package in R for kmeans clustering

How to produce a pretty plot of the results of k-means cluster analysis?
