You do not need to import any libraries or modules for K-means clustering because you will implement it from scratch. A code template is provided, and you only need to write your code at the locations marked "your code is here".

Download the dataset 'k_means_clustering_data.csv' and save it in your working directory, alongside the source code for this homework. The dataset has two columns ('x' and 'y') and 42 records, which are 42 points in a 2D plane. Your goal is to group them into K clusters using the K-means clustering algorithm.

The basic procedure of K-means clustering is simple. First, we choose the number of clusters K and randomly select K points from the dataset as the initial centroids (centers) of these clusters. The K-means algorithm then iterates over the following steps until convergence:

a. Update each centroid's coordinates based on the data points in its cluster.
b. Measure the distance from each point in the dataset to the K centroids.
c. Assign each point to the cluster whose centroid is nearest.

This is the code provided; please fill it in.
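The three steps can be sketched with plain Python before touching the template. This is only an illustration: the helper names `euclidean_distance`, `nearest_centroid`, and `centroid_of` are hypothetical and do not appear in the provided code.

```python
import math


def euclidean_distance(p, q):
    """Step b: straight-line distance between two 2D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def nearest_centroid(point, centroids):
    """Step c: index (starting from 0) of the closest centroid."""
    distances = [euclidean_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def centroid_of(points):
    """Step a: the centroid is the mean of the x values and the mean of the y values."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Small check using points from the dataset.
print(centroid_of([(1, 0), (1, 1), (1, 2)]))       # (1.0, 1.0)
print(nearest_centroid((2, 7), [(1, 1), (3, 8)]))  # 1
```

The point (2, 7) is about 6.08 away from (1, 1) and about 1.41 away from (3, 8), so it is assigned to cluster 1.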

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def k_means_clustering(data, centroids, k):
    centroid_current = centroids
    centroid_last = pd.DataFrame()
    clusters = pd.DataFrame()
    data = pd.read_csv('k_means_clustering_data.csv')
    # convert to float but keep 'data' as a DataFrame, because data['Cluster'] is assigned below
    data = data[['x', 'y']].astype(float)
    # iterate until convergence
    while not centroid_current.equals(centroid_last):

        cluster_count = 0  # it counts the number of clusters. Cluster IDs start from 0.
        # calculate the distance of each point to the K centroids
        for idx, position in centroid_current.iterrows():
            # your code is here. Save the Euclidean distances into 'clusters'

            # your code ends
            cluster_count += 1

        # update cluster, assign the points to clusters
        clusterIDs = []
        for row_idx in range(len(clusters)):
            # your code is here. Check the distances at every row in 'clusters'. Save the assigned cluster IDs to points. The IDs start from 0

            # your code ends
        # assign points to clusters. The information is saved in the list and assigned to the dataset.
        data['Cluster'] = clusterIDs

        # store previous cluster
        centroid_last = centroid_current

        # Update the centroid of each cluster. All information is in 'data'. You have to calculate the new centroids based on the points in the same cluster.
        # The centroid is the center of a list of points. For example, (x1, y1), (x2, y2), ..., (xn, yn). The centroid is (x, y), where x = the mean of (x1, x2, ..., xn) and y = the mean of (y1, y2, ..., yn).
        centroids = []
        points = []  # save k lists of points in the list. The points in the same list are in the same cluster.
        # your code is here. The K centroids will be saved in 'centroids', e.g. [[1, 2], [3, 4], [5, 6]]

        # your code ends
        centroid_current = pd.DataFrame(data=centroids, columns=['x', 'y'])

        print("No updates on clusters: ", centroid_current.equals(centroid_last))

    print("Convergence! Final centroids:", centroid_current)
    # plotting
    print('Plotting...')
    colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k']

    # scatter plot all points. All points are colored circles
    for i in range(k):
        p = np.array(points[i])
        x, y = p[:, 0], p[:, 1]

        plt.scatter(x, y, color=colors[i])
        plt.scatter(centroid_current['x'], centroid_current['y'], marker='^', color=colors[i])

    # scatter plot all centroids. All centroids are colored triangles
    for j in range(k):
        plt.scatter(centroid_current.iloc[j, 0], centroid_current.iloc[j, 1], marker='^', color=colors[j])

    plt.show()
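One possible way to fill in the three "your code is here" sections is sketched below. It is not an official solution: the function name `k_means_sketch` is hypothetical, the data is passed in rather than read from the CSV file, plotting is left out so the logic can be run on its own, and keeping the previous centroid when a cluster ends up empty is an assumption the assignment does not address.

```python
import pandas as pd


def k_means_sketch(data, centroids, k):
    """Run K-means on a DataFrame with 'x' and 'y' columns.

    'centroids' is a DataFrame holding the k starting centroids.
    Returns the final centroids and the data with a 'Cluster' column.
    """
    data = data[['x', 'y']].astype(float).reset_index(drop=True)
    centroid_current = centroids[['x', 'y']].astype(float).reset_index(drop=True)
    centroid_last = pd.DataFrame()

    while not centroid_current.equals(centroid_last):
        # First placeholder: Euclidean distance from every point to each
        # centroid, stored as one column of 'clusters' per centroid.
        clusters = pd.DataFrame()
        cluster_count = 0
        for idx, position in centroid_current.iterrows():
            dx = data['x'] - position['x']
            dy = data['y'] - position['y']
            clusters[cluster_count] = (dx ** 2 + dy ** 2) ** 0.5
            cluster_count += 1

        # Second placeholder: each point joins the cluster whose centroid
        # is closest (ties go to the lower cluster ID).
        cluster_ids = []
        for row_idx in range(len(clusters)):
            distances = list(clusters.iloc[row_idx])
            cluster_ids.append(distances.index(min(distances)))
        data['Cluster'] = cluster_ids

        centroid_last = centroid_current

        # Third placeholder: the new centroid is the mean of the points
        # in each cluster.
        new_centroids = []
        for cluster_id in range(k):
            members = data[data['Cluster'] == cluster_id]
            if len(members) == 0:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(list(centroid_last.iloc[cluster_id]))
            else:
                new_centroids.append([members['x'].mean(), members['y'].mean()])
        centroid_current = pd.DataFrame(data=new_centroids, columns=['x', 'y'])

    return centroid_current, data


# Tiny made-up example with two well-separated groups.
toy = pd.DataFrame({'x': [0, 0, 1, 10, 10, 11], 'y': [0, 1, 0, 10, 11, 10]})
final_centroids, labeled = k_means_sketch(toy, toy.iloc[[0, 3]], 2)
print(list(labeled['Cluster']))  # [0, 0, 0, 1, 1, 1]
```

In the template itself, the `points` list would also need to be filled with the members of each cluster, since the plotting section reads from it.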

And this is the data provided:

x y
1 0
1 1
1 2
2 0
2 1
2 2
2 7
2 9
3 0
3 2
3 4
3 6
3 8
4 4
4 7
4 9
5 5
5 6
5 7
5 8
5 9
5 10
6 2
6 3
6 8
7 0
7 1
7 2
7 4
7 7
7 9
7 10
8 0
8 1
8 2
8 3
8 8
9 0
9 2
9 3
4 2
5 3
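The question does not say how the starting centroids are produced, only that K of them are selected randomly from the dataset. A minimal sketch of that step is shown here, assuming K = 3 and a fixed `random_state` for reproducibility (both are illustrative choices, not part of the assignment). The points are typed in directly so the snippet runs without the CSV file.

```python
import pandas as pd

# The 42 points listed above, as (x, y) pairs.
points = [
    (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2), (2, 7), (2, 9),
    (3, 0), (3, 2), (3, 4), (3, 6), (3, 8), (4, 4), (4, 7), (4, 9),
    (5, 5), (5, 6), (5, 7), (5, 8), (5, 9), (5, 10), (6, 2), (6, 3),
    (6, 8), (7, 0), (7, 1), (7, 2), (7, 4), (7, 7), (7, 9), (7, 10),
    (8, 0), (8, 1), (8, 2), (8, 3), (8, 8), (9, 0), (9, 2), (9, 3),
    (4, 2), (5, 3),
]
data = pd.DataFrame(points, columns=['x', 'y'])

k = 3  # assumed number of clusters; the template's color list allows up to 7

# Pick k distinct rows at random as the initial centroids.
initial_centroids = data.sample(n=k, random_state=0).reset_index(drop=True)
print(initial_centroids)
```

A DataFrame like `initial_centroids` has the shape the template expects for its `centroids` argument, since the loop calls `iterrows()` and `equals()` on it.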
