python - sklearn: Measuring similarities between different sets of independent variables
Suppose I have

    points1 = np.array([[0,0], [1,1], [0,1], [1,0]])
    points2 = np.array([[2,1], [0,0], [1,0], [0,1]])

Right now they are ordered by construction. This is an MVE; in my application, the elements of each array are shuffled. (Here you can use np.random.shuffle().)
If you look at the dataset, you can see that 3 of the elements are the same points, while 1 of them changes. In other words:
- [0,0] in points2 is the closest point to [0,0] in points1, and [0,0] in points1 is the closest point to [0,0] in points2.
- [0,1] in points2 is the closest point to [0,1] in points1, and [0,1] in points1 is the closest point to [0,1] in points2.
- [1,0] in points2 is the closest point to [1,0] in points1, and [1,0] in points1 is the closest point to [1,0] in points2.
- [0,1], [1,0] and [2,1] in points2 are the closest points to [1,1] in points1, and [1,1] in points1 is the closest point to [2,1] in points2.
Notice how I specified it both ways! Indeed, if one point in the first list is the closest point to one in the second list, the inverse might not be true (i.e. there might be another point in the second list that is closest to the same point in the first list).
Also, notice that in the last bullet point I have 3 points closest to [1,1]. However, in case multiple points are closest to a given point, I want to eliminate those that have already found a different closest point. For instance, in the last bullet point I would keep [2,1] in points2 as the closest point to [1,1], since [0,1] in points2 considers [0,1] in points1 its closest, and [1,0] in points2 considers [1,0] in points1 its closest, so they are "shipped" or "busy".
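The "both ways" check described above can be sketched with plain NumPy. This is only an illustration on the example arrays, assuming Euclidean distance; the variable names (`dist`, `closest_in_2`, `closest_in_1`, `mutual`) are made up for this sketch.

```python
import numpy as np

points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])

# dist[i, j] is the Euclidean distance between points1[i] and points2[j].
dist = np.linalg.norm(points1[:, None, :] - points2[None, :, :], axis=2)

# For each point, collect *all* indices tied for the minimum distance.
closest_in_2 = [np.flatnonzero(np.isclose(row, row.min())) for row in dist]
closest_in_1 = [np.flatnonzero(np.isclose(col, col.min())) for col in dist.T]

# A pair (i, j) is mutual when each point is among the other's closest points.
mutual = [
    (i, int(j))
    for i, js in enumerate(closest_in_2)
    for j in js
    if i in closest_in_1[j]
]

print(closest_in_2[1])  # the three points of points2 tied for [1,1]
print(mutual)           # index pairs (i in points1, j in points2)
```

On this data the tie for [1,1] is resolved by the mutual check: only [2,1] in points2 also considers [1,1] its closest point. On other data the mutual pairs may not cover every point, so this is a diagnostic rather than a full matching.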
My problem
Now, given 2 arrays of the same dimensions (as above), i.e. that contain the same number of points, I want to be able to find a match (or matches, if there isn't a unique one) such that:
- Each point in points1 is matched to one and only one point in points2, and each point in points2 is matched to one and only one point in points1.
- These matches are given by the fact that the points are the closest. Imagine I had one of the lists of points and I applied a random shock to each point. I want to "guess" where each point went (and it should be the closest one; never mind this part).
How can I do that?
My try
- First of all, I tried writing a function that loops through the points in points1 and, for each of them, computes the distance to each of the points in points2. After this, I go through each of these distance lists, order them and find the minimum. Then I match each point to the one with the minimum distance. This runs into the problem that if a is closest to b, it doesn't imply that b is closest to a, so I can end up with matches where many points are matched to one point, and so on.
- I tried sklearn.metrics.pairwise_distances_argmin_min(y,x), which is described in the scikit-learn documentation, but it seems to be doing something different from what I want, since again it doesn't match each point to one and only one. (Or better, in the output it does, but I think it chooses based on which point is declared closest first.)
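The many-to-one problem from the first attempt is easy to reproduce. The sketch below uses a row-wise argmin over a distance matrix in plain NumPy; the arrays `a` and `b` are hypothetical data (not from the question), chosen so that two points pick the same nearest neighbour.

```python
import numpy as np

# Hypothetical data, chosen so that a[0] and a[1] are both nearest to b[0].
a = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]])
b = np.array([[0.1, 0.0], [3.0, 3.0], [5.0, 5.0]])

# dist[i, j] is the Euclidean distance between a[i] and b[j].
dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

# Nearest point in b for every point in a (one argmin per row).
nearest = dist.argmin(axis=1)

# Points of b chosen more than once, and points of b never chosen.
used, counts = np.unique(nearest, return_counts=True)
collisions = used[counts > 1]
unmatched = np.setdiff1d(np.arange(len(b)), nearest)

print(nearest)     # nearest index in b for each row of a
print(collisions)  # indices in b claimed by several points of a
print(unmatched)   # indices in b that nobody claimed
```

Any approach that takes an independent minimum per point can produce this outcome, which is why the matching needs an explicit one-to-one constraint.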
Do you have any suggestions?
Basically, given 2 lists of points (of the same size), I want to match the points by distance (any distance metric is fine, I guess; maybe Euclidean and Manhattan are the best). If possible, after the matching has been found, I would like to be able to output a "similarity score". Any ideas?
Calculate all of the distances between all of the elements in list1 and list2. Order these distances. (Actually these are objects containing the distance and references to the 2 endpoints.) Take the smallest distance and pair the endpoints of this distance. Take the next smallest distance where none of the endpoints is paired yet. Continue until all of the points are paired.
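The steps above can be sketched as follows. This is a minimal version assuming Euclidean distance and NumPy arrays of shape (n, d); the function name `greedy_match` is made up for this sketch, and the flat index into the distance matrix plays the role of the "object with references to the 2 endpoints".

```python
import numpy as np

def greedy_match(points1, points2):
    """Pair points greedily by increasing distance.

    Returns a list of (i, j, distance) triples, where i indexes points1
    and j indexes points2. Each index appears at most once.
    """
    dist = np.linalg.norm(points1[:, None, :] - points2[None, :, :], axis=2)
    n1, n2 = dist.shape
    # Flat indices of all pairwise distances, smallest first.
    order = np.argsort(dist, axis=None, kind="stable")
    used1 = np.zeros(n1, dtype=bool)
    used2 = np.zeros(n2, dtype=bool)
    pairs = []
    for flat in order:
        i, j = divmod(int(flat), n2)  # recover the two endpoints
        if used1[i] or used2[j]:
            continue                  # one endpoint is already paired
        used1[i] = used2[j] = True
        pairs.append((i, j, float(dist[i, j])))
        if len(pairs) == min(n1, n2):
            break                     # everything is paired
    return pairs

points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])
pairs = greedy_match(points1, points2)
print(pairs)
```

On the example arrays the three identical points are paired first at distance 0, which leaves [1,1] and [2,1] to be paired with each other at distance 1.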
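Unfortunately this algorithm has at least O(n²) complexity, since every pairwise distance has to be computed (and sorting them adds a log factor). As long as you have to compare lists of a few hundred (or maybe a few thousand) elements it will work. Above that it will be extremely slow...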
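As an alternative to the greedy pairing (this is a different technique, not the one described in this answer), the one-to-one requirement can be posed as an assignment problem and solved with `scipy.optimize.linear_sum_assignment`, which minimises the total distance over all pairs. The sketch below assumes SciPy is available; the "similarity score" at the end is just one hypothetical choice, not a standard definition.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])

# cost[i, j] is the distance between points1[i] and points2[j].
# metric="cityblock" would give Manhattan distance instead.
cost = cdist(points1, points2, metric="euclidean")

# One-to-one assignment minimising the sum of matched distances.
rows, cols = linear_sum_assignment(cost)
total = cost[rows, cols].sum()

# Hypothetical similarity score in (0, 1]: equals 1 when every point
# has an exact match and decreases as the mean matched distance grows.
similarity = 1.0 / (1.0 + cost[rows, cols].mean())

for i, j in zip(rows, cols):
    print(points1[i], "->", points2[j], "distance", cost[i, j])
print("total distance:", total, "similarity:", similarity)
```

This does not remove the scaling concern: the full n × n distance matrix is still needed, so it is no cheaper than the greedy approach for very large lists. The difference is that the result minimises the total distance, whereas the greedy pairing only guarantees that each pair was the best one available when it was picked.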