Recently I have been learning Python, machine learning and Natural Language Processing (NLP). That’s why I didn’t post these days. In future posts, I hope I can share some interesting things in these subjects.
Yesterday I came across an algorithm that finds out all the same elements in two arrays in the Coursera NLP course provided by Stanford University. This algorithm was adopted when merging the same words in the two lists.
Imagine we have two arrays of [1, 2, 3, 4, 5, 6] and [2, 4, 6, 8, 10]. We look to find out the same elements in the two arrays which are [2, 4, 6]. Since there is a ‘Search 1D array‘ function in LabVIEW that can tell if an element is in the array. The implementation of this function is hidden and thus I’m not sure how it was done. Presumebally it iterates through the array and quits either if it finds the equavalent element or reaches the end of the array. So the iteration can take up to N times where N is the length of the array. When we want to compare two arrays with the lengths N and M respectively, the iteration could take up to N*M times.
I prepared two arrays with the lengths of 100k and 90k and used LabVIEW ‘Search 1D Array’ function to find out the same elements. See the code below:
The code iteratively checks if the element of 90k array can be found in the 100k array, and remove the element in the 100k array if found. The running time for the above code is about 7.3 seconds on my desktop. Because the arrays size is large, it could take up to 100,000 * 90,000 = 9,000,000 iterations to calculate this.
An alternative way of doing this is explained in the course. This algorithm moves both arrays simultaneously and thus the running time is much shorter. An illustration screen capture is shown below:
The two arrays compare the first elements from the beginning, and both move forwards if the two elements are equivalent. If the values do not match, move the array with the smaller value and carry on comparing with the next value. This calculation takes up to N + M (the sum of the two lengths) iterations. When the array are large, it saves a lot of time.
Following is the seudo code of the ‘merge’ algorithm.
Here is the LabVIEW implementation.
And the test VI:
This code takes 0.03 second to compute, which is more than 200 times faster!:D Although we put the ‘Sort 1D array‘ function before hand, it still beats the first algorithm. This code also works well with text arrays (you need to change the input data type of course). Try it out yourself.:)