python - sklearn: Measuring similarities between different sets of independent variables


Suppose I have:

    import numpy as np

    points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
    points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])

Right now they are ordered by construction. However, in my application the elements of each array are shuffled (here you can use np.random.shuffle()).
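
To reproduce the shuffled setting, here is a minimal sketch. np.random.shuffle permutes a 2-D array along its first axis, so each [x, y] row stays intact and only the order of the points changes. The seed value is an arbitrary assumption, used only to make the run repeatable.

```python
import numpy as np

points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])

np.random.seed(0)          # arbitrary seed, only for reproducibility
shuffled = points2.copy()  # shuffle a copy so points2 keeps its order
np.random.shuffle(shuffled)  # in place, rows are moved as whole points

print(shuffled)
```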

If you look at the dataset, you can see that 3 of the elements are the same points, while 1 of them changes. In other words:

  • [0,0] in points2 is the closest point to [0,0] in points1, and [0,0] in points1 is the closest point to [0,0] in points2.
  • [0,1] in points2 is the closest point to [0,1] in points1, and [0,1] in points1 is the closest point to [0,1] in points2.
  • [1,0] in points2 is the closest point to [1,0] in points1, and [1,0] in points1 is the closest point to [1,0] in points2.
  • [0,1], [1,0] and [2,1] in points2 are all closest (tied, at distance 1) to [1,1] in points1, and [1,1] in points1 is the closest point to [2,1] in points2.

Notice how I specified it both ways! Indeed, if one point in the first list is closest to a point in the second list, the inverse might not be true (i.e. there might be a different point in the second list that is closer to that same point in the first list).
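
The "closest both ways" relation from the bullets can be checked directly on a distance matrix. This is a minimal NumPy sketch assuming Euclidean distance, computed by broadcasting rather than with any sklearn function: the row-wise argmin gives each point's nearest neighbour in points2, the column-wise argmin gives the reverse direction, and a pair counts as mutual only when both agree.

```python
import numpy as np

points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])

# dist[i, j] = Euclidean distance between points1[i] and points2[j]
dist = np.linalg.norm(points1[:, None, :] - points2[None, :, :], axis=-1)

# For each point in points1, index of its nearest point in points2 (row-wise)
nearest_in_2 = dist.argmin(axis=1)
# For each point in points2, index of its nearest point in points1 (column-wise)
nearest_in_1 = dist.argmin(axis=0)

# (i, j) is "closest both ways" when each point is the other's nearest neighbour
mutual = [(i, int(j)) for i, j in enumerate(nearest_in_2) if nearest_in_1[j] == i]
print(mutual)
```

The three-way tie at [1,1] is broken by argmin, which returns the first minimum; it picks [2,1] here only because that point happens to come first in points2. After shuffling, a different tied point could be picked, which is why the tie rule below matters.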

Also, notice that in the last bullet point there are 3 points closest to [1,1]. However, when multiple points are closest to a given point, I want to eliminate those that have already found a different closest point. For instance, in the last bullet point I would keep [2,1] in points2 as the closest point to [1,1], since [0,1] in points2 considers [0,1] in points1 its closest, and [1,0] in points2 considers [1,0] in points1 its closest, so they are already "shipped" or "busy".

My problem

Now, given 2 arrays of the same dimensions (as above), i.e. containing the same number of points, I want to be able to find the match (or matches, if there isn't a unique one) such that:

  • Each point in points1 is matched to one and only one point in points2, and each point in points2 is matched to one and only one point in points1.
  • These matches are determined by which points are closest. Imagine I had one of the lists of points and applied a random shock to each point: I want to "guess" where each point went (it should be the closest point, but never mind that part).
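
The two requirements above describe what is usually called the linear sum assignment problem: find a one-to-one matching that minimises the total distance. This is a swapped-in technique, not something from sklearn: SciPy provides a solver as scipy.optimize.linear_sum_assignment. The sketch below assumes Euclidean distance and uses hypothetical test data (200 random points, a Gaussian shock with scale 0.05, then a shuffle) so that the recovered matching can be compared against the known truth. Note that minimising the total distance is not exactly the same criterion as "closest both ways", but it always yields a one-to-one match.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(42)

# Hypothetical data: random points, a small random shock, then a shuffle.
original = rng.uniform(0, 10, size=(200, 2))
perm = rng.permutation(len(original))  # truth: shocked[k] came from original[perm[k]]
shocked = original[perm] + rng.normal(scale=0.05, size=original.shape)

# cost[i, j] = Euclidean distance between original[i] and shocked[j]
cost = np.linalg.norm(original[:, None, :] - shocked[None, :, :], axis=-1)

# One-to-one matching minimising the sum of matched distances
rows, cols = linear_sum_assignment(cost)

# original[rows[k]] is matched to shocked[cols[k]]; invert to compare with perm
recovered = np.empty(len(original), dtype=int)
recovered[cols] = rows
accuracy = np.mean(recovered == perm)
print(f"fraction of points matched to their true origin: {accuracy:.3f}")
```

With a shock this small relative to the spacing between points, the accuracy should be close to 1; it drops as the shock grows and neighbouring points become indistinguishable.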

How can I do that?

My try

  1. First, I tried writing a function that loops through the points in points1 and, for each of them, computes the distance to each of the points in points2. It then goes through each of these distance lists, orders them, finds the minimum, and matches each point to the one at minimum distance. This runs into the problem that if a is closest to b, it doesn't imply that b is closest to a, so I can end up with matches where many points are matched to one point, and so on.

  2. I tried sklearn.metrics.pairwise_distances_argmin_min(y,x), but from what I can read in its documentation it seems to do something different from what I want, since again it doesn't match each point one to one (or rather, in the output it does, but I think it chooses based on the point declared closest first).
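
Both attempts share the same failure mode, shown here on hypothetical two-point data with a minimal NumPy sketch. Taking the row-wise argmin of the distance matrix is, as far as I understand the scikit-learn docs, what pairwise_distances_argmin_min(X, Y) returns as its first output (for each row of X, the index of the closest row of Y), so the same collision applies to it.

```python
import numpy as np

# Hypothetical toy data where the naive "nearest point" rule breaks
points1 = np.array([[0.0, 0.0], [1.0, 0.0]])
points2 = np.array([[0.1, 0.0], [5.0, 0.0]])

# dist[i, j] = Euclidean distance between points1[i] and points2[j]
dist = np.linalg.norm(points1[:, None, :] - points2[None, :, :], axis=-1)

# Naive matching: every point in points1 independently takes its nearest point
naive = dist.argmin(axis=1)
print(naive)  # [0 0]: both points claim points2[0], so it is not one-to-one
```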

Do you have any suggestions?

Basically, given 2 lists of points (of the same size), I want to match the points by distance (any distance metric is fine, I guess; maybe Euclidean or Manhattan is best). If possible, after the matching has been found, I would like to be able to output a "similarity score". Any ideas?
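
One way to get both the matching and a score, sketched under assumptions: `match_and_score` is a hypothetical helper, the matching uses SciPy's linear_sum_assignment (a swapped-in assignment solver, not part of sklearn), and the score 1 / (1 + mean matched distance) is an ad-hoc choice that equals 1 for identical sets and falls towards 0 as matched points drift apart. Euclidean and Manhattan distances are both supported via broadcasting.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_and_score(points1, points2, metric="euclidean"):
    """Hypothetical helper: one-to-one matching plus an ad-hoc similarity score."""
    diff = points1[:, None, :] - points2[None, :, :]
    if metric == "euclidean":
        cost = np.linalg.norm(diff, axis=-1)
    elif metric == "manhattan":
        cost = np.abs(diff).sum(axis=-1)
    else:
        raise ValueError(f"unknown metric: {metric}")

    # Matching that minimises the total distance
    rows, cols = linear_sum_assignment(cost)
    mean_dist = float(cost[rows, cols].mean())

    # Assumed score: 1.0 for identical sets, tending to 0 as distances grow
    score = 1.0 / (1.0 + mean_dist)
    return list(zip(rows.tolist(), cols.tolist())), score


points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])
pairs, score = match_and_score(points1, points2)
print(pairs, score)
```

For the arrays in the question, three points match at distance 0 and [1,1] matches [2,1] at distance 1, so the mean matched distance is 0.25 under either metric.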

Answer

Calculate all of the distances between all of the elements in list1 and list2, and order these distances (actually, these are objects containing the distance and references to the 2 endpoints). Take the smallest distance and pair the endpoints of that distance. Then take the next smallest distance for which neither endpoint has been paired yet. Continue until all of the points are paired.
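
The greedy procedure above can be sketched with NumPy as follows. This is a minimal sketch assuming Euclidean distance; `greedy_match` is a hypothetical name. Instead of building distance objects, it sorts the flattened distance matrix and maps each flat index back to its (row, column) endpoints.

```python
import numpy as np


def greedy_match(points1, points2):
    """Repeatedly take the shortest remaining distance whose two endpoints
    are both still unpaired (hypothetical helper)."""
    dist = np.linalg.norm(points1[:, None, :] - points2[None, :, :], axis=-1)

    # All n*m distances, shortest first; a stable sort keeps ties in index order
    order = np.argsort(dist, axis=None, kind="stable")
    rows, cols = np.unravel_index(order, dist.shape)

    used1, used2 = set(), set()
    pairs = []
    for i, j in zip(rows.tolist(), cols.tolist()):
        if i in used1 or j in used2:
            continue  # one endpoint is already "busy"
        pairs.append((i, j, float(dist[i, j])))
        used1.add(i)
        used2.add(j)
        if len(pairs) == min(len(points1), len(points2)):
            break  # every point has been paired
    return pairs


points1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
points2 = np.array([[2, 1], [0, 0], [1, 0], [0, 1]])
print(greedy_match(points1, points2))
```

On the question's arrays this pairs the three identical points first, then [1,1] with [2,1], since its other tied candidates are already busy. Greedy pairing is always one-to-one, but it is not guaranteed to minimise the total matched distance.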

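Unfortunately, this algorithm has at least O(n²) complexity: there are n² distances to compute, and sorting them costs O(n² log n). As long as you only have to compare lists of a few hundred (or maybe a few thousand) elements it will work, but above that it will be extremely slow...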
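
For a rough sense of scale, assuming both lists have n points and each distance is stored as a 64-bit float (8 bytes), the full distance table grows as follows before any sorting is done:

```latex
\begin{aligned}
N &= n^2 \quad \text{distances to compute and sort},\\
n = 10^3 &\;\Rightarrow\; N = 10^6 \;\;(\approx 8\ \text{MB}),\\
n = 10^4 &\;\Rightarrow\; N = 10^8 \;\;(\approx 800\ \text{MB}).
\end{aligned}
```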

