In this lecture we looked at the ACLED dataset, which has five years of data on political and other types of violence in the United States and in India.
We learned about the concept of clustering, how clusters are identified from geographic plots of a dataset, and what k-means clustering is.
- Latitude/Longitude data for location plotting (geographic plots)
- How do the locations seem to be distributed?
- Clustering (K-means clustering)
- Can you interpret the clusters?
- Geographic distance between two points
- Spatial Data
- Is it random?
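The geographic distance between two latitude/longitude points can be sketched with the haversine formula. This is only an illustration and not necessarily the method used in the lecture: it assumes a spherical Earth with a mean radius of 6371 km, and the function name `haversine_km` is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine term: depends on the latitude gap and the longitude gap
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Under this approximation, one degree of latitude comes out to roughly 111 km, which is a handy sanity check when reading distances off a geographic plot.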
Understanding K-means clustering:
K-means clustering is not limited to geographic plots; it can be applied whenever you can measure a distance between two points.
- Choose K random points in the dataset as the initial cluster centers
- The algorithm gives each center a cluster number and assigns every data point to the cluster whose center is nearest
- This assignment is done for every cluster
- Once the clusters are formed, the center of mass (centroid) of each cluster is computed, and the assignment step is repeated using the centroids as the new centers
- Each cluster is rebuilt around its new centroid
- The above process is repeated until the clusters stabilize, i.e. the assignments no longer change
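The steps above can be sketched in plain Python. This is a minimal illustration under a few assumptions that are not from the lecture: points are tuples of numbers, distance is straight-line (Euclidean) via `math.dist`, and the names `kmeans`, `max_iter`, and `seed` are chosen only for this example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, centroids) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: choose K random points from the dataset as initial centers
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the nearest center
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change (clusters stabilized)
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the centroid (mean) of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Two well-separated groups of made-up points, purely for illustration
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_labels, sample_centroids = kmeans(sample_points, k=2)
```

For latitude/longitude data, the Euclidean distance here is only a rough stand-in; over large areas a geographic distance would be more appropriate, and the result can depend on which random starting points are chosen.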