Recently I have been learning Python, machine learning and Natural Language Processing (NLP). That’s why I haven’t posted recently. In future posts, I hope to share some interesting things on these subjects.
Yesterday, in the Coursera NLP course offered by Stanford University, I came across an algorithm that finds all the common elements of two arrays. The course uses it to find the words that two lists have in common.
Imagine we have two arrays, [1, 2, 3, 4, 5, 6] and [2, 4, 6, 8, 10], and we want to find the elements they share, which are [2, 4, 6]. LabVIEW has a ‘Search 1D Array’ function that can tell whether an element is in an array. Its implementation is hidden, so I’m not sure how it works. Presumably it iterates through the array and stops either when it finds a matching element or when it reaches the end of the array, so one search can take up to N iterations, where N is the length of the array. To compare two arrays of lengths N and M, the whole search could therefore take up to N*M iterations.
I prepared two arrays with lengths of 100k and 90k and used the LabVIEW ‘Search 1D Array’ function to find their common elements. See the code below:
The code goes through the 90k array, checks whether each element can be found in the 100k array, and removes that element from the 100k array if it is found. The running time of the above code is about 7.3 seconds on my desktop. Because the arrays are large, it could take up to 100,000 * 90,000 = 9,000,000,000 iterations.
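The brute-force approach can be sketched in Python as follows. This is a minimal sketch that assumes ‘Search 1D Array’ is a plain linear scan that stops at the first match; the post only presumes this. The name `naive_common` is my own.

```python
def naive_common(big, small):
    """Return the elements of `small` that are also in `big`, using linear search.

    Each lookup scans the remaining elements from the start, so the worst case
    is about len(big) * len(small) comparisons.
    """
    remaining = list(big)          # copy so the caller's list is not modified
    common = []
    for value in small:
        # Linear scan, like 'Search 1D Array' (assumed to stop at the first match).
        for index, candidate in enumerate(remaining):
            if candidate == value:
                common.append(value)
                del remaining[index]   # remove the match, as the LabVIEW code does
                break
    return common
```

For example, `naive_common([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10])` gives `[2, 4, 6]`.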
An alternative approach is explained in the course. This algorithm steps through both arrays at the same time, so the running time is much shorter. An illustrative screen capture is shown below:
Both arrays must be sorted first. Starting from the first element of each array, the current elements of the two arrays are compared. If they are equal, the value is recorded and both arrays move forward. If they do not match, only the array with the smaller value moves forward, and the comparison continues with its next value. This takes at most N + M iterations (the sum of the two lengths), which saves a lot of time when the arrays are large.
The following is the pseudocode of the ‘merge’ algorithm.
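The merge walk described above can be written in a few lines of Python. This is a minimal sketch; the name `merge_common` is my own, and the function assumes both inputs are already sorted in ascending order.

```python
def merge_common(a, b):
    """Return the elements common to two sorted sequences in O(len(a) + len(b)) steps."""
    i, j = 0, 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])   # match: record it and advance both arrays
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                # a holds the smaller value: advance a
        else:
            j += 1                # b holds the smaller value: advance b
    return common
```

Unsorted input has to be sorted first, for example `merge_common(sorted(x), sorted(y))`. Because the function only uses `==` and `<`, it also works on lists of strings.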
Here is the LabVIEW implementation.
And the test VI:
This code takes 0.03 seconds to compute, which is more than 200 times faster! :D Even though we add the ‘Sort 1D Array’ function beforehand, it still beats the first algorithm. This code also works with text arrays (you need to change the input data type, of course). Try it out yourself. :)
Consider this scenario: a guy (let’s call him John) has three dice, Dice 1, 2 and 3. The dice have different shapes. Dice 1 has the numbers [1, 2, 3, 4, 5, 6] on it, Dice 2 has [1, 2, 3, 4] and Dice 3 has [1, 2, 3, 4, 5, 6, 7, 8], as shown in the following figure.
John throws one die at a time, and which die he picks next depends on his previous choice. For example, he is LIKELY to pick Dice 1 if he picked Dice 2 last time, and UNLIKELY to pick Dice 1 if he picked Dice 3 last time. We do not know which die he selected, but we can see the number shown on it. After observing a sequence of throws, what do you think the next number will be?
This may sound like a very difficult question, but in linguistics researchers deal with this kind of problem all the time. It’s like HEARING the sound of each word and, based on the HIDDEN rules connecting the words (i.e. syntax and meaning), predicting what the next word could be. Mathematical models have been built to represent this type of problem. In this example, each state depends on the previous state(s), and we call this a Markov model, or Markov chain. The simplest case, where the state depends only on the one state before it, is a Markov chain of order 1. Moreover, which die (state) was selected is not known; only the consequence of the state (the number) can be observed. This is called a Hidden Markov Model.
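John’s throwing process can be simulated in a short Python sketch. The post does not give actual probabilities, so the transition table below is made up. It only respects the two stated tendencies: Dice 2 → Dice 1 is likely and Dice 3 → Dice 1 is unlikely. Each face of a die is assumed to be equally likely. `throw_sequence` is a hypothetical helper.

```python
import random

# Faces of each die, as described in the post.
DICE = {1: [1, 2, 3, 4, 5, 6], 2: [1, 2, 3, 4], 3: [1, 2, 3, 4, 5, 6, 7, 8]}

# Hypothetical P(next die | current die); only the Dice 2 -> 1 (likely) and
# Dice 3 -> 1 (unlikely) tendencies come from the post.
TRANSITIONS = {
    1: {1: 0.4, 2: 0.3, 3: 0.3},
    2: {1: 0.7, 2: 0.2, 3: 0.1},
    3: {1: 0.1, 2: 0.45, 3: 0.45},
}


def throw_sequence(length, start_die=1, seed=None):
    """Simulate a first-order hidden Markov process; return (hidden dice, observed numbers)."""
    rng = random.Random(seed)
    die = start_die
    hidden, observed = [], []
    for _ in range(length):
        hidden.append(die)                          # hidden state: which die was used
        observed.append(rng.choice(DICE[die]))      # what we actually see
        nxt = TRANSITIONS[die]
        die = rng.choices(list(nxt), weights=list(nxt.values()))[0]  # Markov step
    return hidden, observed
```

Only the `observed` list would be visible to us. The `hidden` list is exactly what the model treats as unknown.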
There are three problems that need to be addressed for an HMM. They are 1) Evaluation: given the state transition probabilities and the probability of each observation in each hidden state (i.e. for a given HMM), calculate the probability of an observed sequence. 2) Decoding: given the HMM and an observed sequence, find the most likely sequence of hidden states behind it. 3) Learning: given an observed sequence, estimate the HMM. As we can see, the third problem is the most difficult one.
The Hidden Markov Model (HMM) is a powerful tool for analyzing time-series signals. There is a good tutorial explaining the concept and the implementation of HMMs. There are implementations of HMMs in many languages, such as C, C++, C#, Python, MATLAB and Java. Unfortunately, I could not find one implemented in LabVIEW. This may be reinventing the wheel, but instead of calling DLLs from LabVIEW, I built one purely in LabVIEW with no additional add-ons needed.
Multiple references were used to implement this LabVIEW HMM toolkit. The test demos of the forward algorithm, the backward algorithm and the Viterbi algorithm in the code are based on one of these references.
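The evaluation problem is usually solved with the forward algorithm. The Python sketch below applies it to the dice example. The uniform start distribution and the transition probabilities are made-up assumptions. Each face of a die is assumed equally likely, and `forward` is a hypothetical helper, not part of any library.

```python
# Hypothetical dice HMM: uniform start distribution, made-up transitions.
DICE = {1: range(1, 7), 2: range(1, 5), 3: range(1, 9)}
START = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
TRANS = {1: {1: 0.4, 2: 0.3, 3: 0.3},
         2: {1: 0.7, 2: 0.2, 3: 0.1},
         3: {1: 0.1, 2: 0.45, 3: 0.45}}
# Emission probabilities: every face of a die is equally likely.
EMIT = {d: {face: 1 / len(faces) for face in faces} for d, faces in DICE.items()}


def forward(observations, start=START, trans=TRANS, emit=EMIT):
    """Evaluation: P(observations | HMM), summed over all possible hidden paths."""
    states = list(start)
    # alpha[s] = P(observations seen so far, current hidden state = s)
    alpha = {s: start[s] * emit[s].get(observations[0], 0.0) for s in states}
    for obs in observations[1:]:
        alpha = {s: emit[s].get(obs, 0.0) * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())
```

For example, `forward([7])` is 1/24, because only Dice 3 can show a 7: 1/3 chance of starting with it, times 1/8 for the face. The cost grows linearly with the sequence length, whereas enumerating every hidden path grows exponentially.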
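The decoding problem is what the Viterbi algorithm solves. Below is a hedged Python sketch on the dice example, not a port of the LabVIEW toolkit. It uses a uniform start distribution and made-up transition probabilities, and `viterbi` is a hypothetical helper.

```python
# Hypothetical dice HMM: uniform start distribution, made-up transitions.
DICE = {1: range(1, 7), 2: range(1, 5), 3: range(1, 9)}
START = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
TRANS = {1: {1: 0.4, 2: 0.3, 3: 0.3},
         2: {1: 0.7, 2: 0.2, 3: 0.1},
         3: {1: 0.1, 2: 0.45, 3: 0.45}}
EMIT = {d: {face: 1 / len(faces) for face in faces} for d, faces in DICE.items()}


def viterbi(observations, start=START, trans=TRANS, emit=EMIT):
    """Decoding: the most likely hidden-state path and its probability."""
    states = list(start)
    # best[s] = probability of the most likely path that ends in state s
    best = {s: start[s] * emit[s].get(observations[0], 0.0) for s in states}
    back = []                                   # back[t][s] = best predecessor of s
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] * trans[r][s])
            best[s] = prev[p] * trans[p][s] * emit[s].get(obs, 0.0)
            pointers[s] = p
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):             # follow the back-pointers
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]
```

The sketch multiplies raw probabilities, which is fine for short sequences. For long sequences (such as a 50,000-character text), log probabilities would be needed to avoid underflow.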
The following demo analyzes the hidden states of a chapter of text. You can find the detailed description in the references. The following figure shows the observed sequence of the HMM. There are about 50,000 characters (including spaces) in this text. All punctuation was removed, and only the spaces and letters were kept as observation symbols. Thus there are 27 symbols, 0 to 26, where 0 = space, 1 = a/A, 2 = b/B and so on.
With no prior knowledge of this text, or even of English, we initialize an HMM that has two hidden states. The transition probabilities between the states are not known yet. The letters (and the space) are the observations emitted by the hidden states. The probability of each symbol in State 1 is plotted as dots, and the probability of each symbol in State 2 is plotted as a line.
After training the HMM with the forward-backward (Baum-Welch) algorithm, we obtained two states: the letters A, E, I, O and U are more likely to appear in State 1, while the remaining letters are more likely to appear in State 2. So, with no rules or prior knowledge specified, we managed to divide the letters into vowels and consonants. 🙂
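The preprocessing step (keeping only spaces and letters and mapping them to 0 to 26) can be sketched in Python as below. This is a sketch of the mapping the paragraph describes, not the author’s VI. `text_to_symbols` is a hypothetical name, and it is an assumption that other whitespace such as newlines is dropped rather than turned into spaces.

```python
import string


def text_to_symbols(text):
    """Map text to observation symbols: 0 = space, 1 = a/A, ..., 26 = z/Z.

    Punctuation, digits and any other characters (including newlines) are dropped.
    """
    symbols = []
    for ch in text.lower():
        if ch == " ":
            symbols.append(0)
        elif ch in string.ascii_lowercase:
            symbols.append(ord(ch) - ord("a") + 1)
    return symbols
```

For example, `text_to_symbols("Hi, Bob!")` returns `[8, 9, 0, 2, 15, 2]`.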
 http://www.52nlp.cn/hmm-learn-best-practices-one-introduction (Chinese)
I came across the Genetic Algorithm (GA) the other day while working on a project. It is typically used to solve shortest-route problems, or to design and optimize the structure of proteins. It is a very smart algorithm inspired by biological systems.
I will try to describe the idea behind the concept briefly. In some problems there are many possible solutions, and we look for the best one. Finding this very best solution is like arranging the genes of a chromosome in the optimal order. One way to find the best (-ish) combination is to compare ALL possible combinations, which is impractical in some cases. So instead of listing and comparing all solutions, a sub-group of solutions (a population) is created, and then we pick two of them as parents. The better a solution is, the higher its chance of being selected. The two chromosomes then cross over (exchange genes) to “breed” a new population. There is also a chance of mutation in the new population. The new population is usually more “advanced” than the parent population (though not necessarily better than the parents). New parents are then picked to breed the next population, and so on.
Typically the genes in a chromosome are binary, but integers and other values can also be used. Please see this tutorial for more encoding methods.
To demonstrate the implementation of GA in LabVIEW, I downloaded the coordinates of 31 cities in China and tried to find the shortest route through them. Here is the .gif demo (the labels for the X and Y axes should be “Latitude” and “Longitude”). Please note that this may not be the BEST solution, but given the number of iterations we ran, it is good enough.
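A compact Python sketch of a GA for this kind of route problem is shown below. It is not the LabVIEW demo. Each chromosome is a permutation of city indices (integer genes). Roulette-wheel selection, this simple order-crossover variant, swap mutation and elitism are my own choices, and the city coordinates you pass in are placeholders rather than the 31 Chinese cities.

```python
import math
import random


def route_length(route, cities):
    """Length of the closed tour visiting `cities` in the order given by `route`."""
    return sum(math.dist(cities[route[k]], cities[route[(k + 1) % len(route)]])
               for k in range(len(route)))


def order_crossover(p1, p2, rng):
    """Copy a random slice of p1, then fill the gaps with p2's other genes in order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    taken = set(p1[a:b + 1])
    fill = iter(city for city in p2 if city not in taken)
    return [gene if gene is not None else next(fill) for gene in child]


def swap_mutation(route, rate, rng):
    """With probability `rate`, swap two randomly chosen cities."""
    route = list(route)
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route


def evolve(cities, pop_size=50, generations=200, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(population, key=lambda r: route_length(r, cities))
    for _ in range(generations):
        # Roulette-wheel selection: shorter tours get proportionally larger weights.
        weights = [1.0 / route_length(r, cities) for r in population]
        next_gen = [best]                    # elitism: never lose the best tour so far
        while len(next_gen) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            child = order_crossover(p1, p2, rng)
            next_gen.append(swap_mutation(child, mutation_rate, rng))
        population = next_gen
        best = min(population, key=lambda r: route_length(r, cities))
    return best, route_length(best, cities)
```

Usage: `best, length = evolve(cities)`, where `cities` is a list of `(x, y)` pairs. As in the post, the result is a good route, not a guaranteed optimum.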
Let me know your score 😉
So this is the story: Flappy Bird was so popular that a friend suggested we develop a LabVIEW kit with a motor to play it. Two days later, we found that Sarvagya Vaish had managed to score 1000 by applying the Q-learning algorithm. A couple of days after that, a studio used an Arduino to play the game. Hmm… I will finish my work anyway.
That’s where I learned about Q-learning, one of the reinforcement learning algorithms. Here is a brief tutorial that helped me understand it better. If a goal is achieved in multiple steps, this algorithm grades each step by assigning a reward to it. Each step, or action, is not graded right away but one step later. In this way the “right” actions can be identified by the rewards they receive.
The equation can be described as
$$Q'(s, a) = (1 - \alpha)\,Q(s, a) + \alpha\left[R(s, a) + \gamma \max_{a'} Q(s', a')\right]$$
where Q (the accumulated experience) is a table indexed by s (state) and a (action), s’ is the next state and a’ ranges over the actions available in s’. alpha (α) is the learning rate (step size) and Gamma (γ) is the discount factor. I tried to google a Q-learning example in LabVIEW but found none, so I created this VI myself and hope it can be useful to someone.
This is a single-loop VI, and the shift register stores the Q values. The reset button initializes Q and can be replaced by a “First Call?” node. Users should build their own “Reward” VI for their application. In this VI the next action is the one with the highest Q value, but it could also be a random action (or be chosen by other methods).
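A minimal tabular Q-learning sketch in Python is given below. The update line implements the equation above. Everything else is an assumption made for illustration. The environment is a hypothetical 5-state corridor in which the reward is 1 for reaching the last state. `reward_and_next` plays the role of the user-built “Reward” VI. Actions are chosen epsilon-greedily: mostly the highest Q value, sometimes a random action.

```python
import random

N_STATES = 5                  # hypothetical corridor with states 0..4
ACTIONS = (0, 1)              # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def reward_and_next(state, action):
    """Hypothetical 'Reward' function: +1 for reaching the goal, 0 otherwise."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return (1.0 if nxt == GOAL else 0.0), nxt


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]   # Q table, reset to zero
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                        # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])   # exploit
            reward, nxt = reward_and_next(state, action)
            target = reward + gamma * max(q[nxt])
            # Q'(s, a) = (1 - alpha) * Q(s, a) + alpha * (R + gamma * max Q(s', a'))
            q[state][action] = (1 - alpha) * q[state][action] + alpha * target
            state = nxt
    return q
```

After training, the greedy policy `max(ACTIONS, key=lambda a: q[s][a])` chooses “move right” in every non-goal state.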
As I mentioned in my last post, I am now studying machine learning in my new position. Today I came across a problem that required multiclass classification with an SVM. The toolkit (link) downloaded from NI only supports SVM classification with two classes, not multiclass (it’s still quite a useful tool). So I used its SVM VIs to make a multiclass version with the one-vs-all method.
There is a good tutorial on one-vs-all (or one-vs-rest) classification by Andrew Ng (link). Basically, in each iteration we take one class as Class A and all the remaining classes as Class B. Only test data that fall into Class A are assigned to that known class. Here is the code:
The original labelled training data belong to Classes 0, 1, 2, … N. In the i-th iteration, the data from Class i are relabelled as Class 1 and all the other data are relabelled as Class 0. Test data that fall in the Class 1 region are classified as Class i. Any data not assigned to a class are left in Class -1. When I tested the performance of this one-vs-all classifier, the results seemed fine 🙂
The code is not optimized, so it may take a while to run.
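The one-vs-all relabelling above can be sketched in Python like this. The NI SVM VIs are not available here, so a simple perceptron stands in as the binary linear classifier. This swap is deliberate and is not an SVM. The function names are hypothetical. When several classifiers claim a point, the later class overwrites the earlier one. The post does not say how conflicts are resolved, so this is an assumption; Andrew Ng’s version instead picks the classifier with the highest score.

```python
def train_perceptron(points, labels, epochs=1000):
    """Binary linear classifier (a stand-in for the SVM VIs); labels are 0/1."""
    w = [0.0] * (len(points[0]) + 1)          # the last weight is the bias
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if predict_binary(w, x) != y:
                step = 1 if y == 1 else -1
                w = [wi + step * xi for wi, xi in zip(w, list(x) + [1.0])]
                mistakes += 1
        if mistakes == 0:                     # training data perfectly separated
            break
    return w


def predict_binary(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else 0


def one_vs_all_train(points, labels):
    models = {}
    for c in sorted(set(labels)):
        # Relabel: class c -> 1, every other class -> 0.
        binary = [1 if y == c else 0 for y in labels]
        models[c] = train_perceptron(points, binary)
    return models


def one_vs_all_predict(models, x):
    label = -1                                # unassigned data stay in class -1
    for c, w in models.items():
        if predict_binary(w, x) == 1:
            label = c                         # later classes overwrite earlier ones
    return label
```

With three well-separated clusters labelled 0, 1 and 2, `one_vs_all_predict` returns each training point’s own label.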
When I ran parallel tasks (e.g. multiple producer/consumer loops) in LabVIEW, it was always painful to quit all the loops “elegantly”. What I wanted was a notifier that tells all loops when an error occurs in any loop. I know there are already a few error manager VIs on the internet, but I reinvented the wheel anyway.
This error manager VI has 3 states: reset (clear the error stored in the shift register), read (check whether there is an error) and hold (stop reading once an error has occurred).
In effect, I created a functional global variable here, using a single-iteration loop to store the error. You can use this VI in every loop, and all loops will quit if an error occurs in any of them. Here is an example:
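A rough Python analogue of this pattern is sketched below; it is not a translation of the VI. Threads stand in for the LabVIEW loops. A lock-protected object plays the role of the functional global variable, and a `threading.Event` is the shared “stop” signal. `ErrorManager`, `worker` and `run_demo` are hypothetical names. Here, keeping only the first error loosely corresponds to the “hold” state.

```python
import threading


class ErrorManager:
    """Shared error store that every loop can report to and poll (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._error = None
        self.stop = threading.Event()     # set as soon as any loop reports an error

    def reset(self):
        """Clear the stored error (the 'reset' state)."""
        with self._lock:
            self._error = None
            self.stop.clear()

    def report(self, error):
        """Store an error; only the first one is kept (loosely, the 'hold' state)."""
        with self._lock:
            if self._error is None:
                self._error = error
            self.stop.set()

    def read(self):
        """Return the stored error, or None (the 'read' state)."""
        with self._lock:
            return self._error


def worker(manager, name, fail_after=None):
    """A loop that runs until any loop, including itself, reports an error."""
    iteration = 0
    while not manager.stop.is_set():
        iteration += 1
        if fail_after is not None and iteration == fail_after:
            manager.report(RuntimeError(f"{name} failed"))
        manager.stop.wait(0.01)           # loop period; returns early on error


def run_demo():
    manager = ErrorManager()
    jobs = [("producer", 5), ("consumer 1", None), ("consumer 2", None)]
    threads = [threading.Thread(target=worker, args=(manager, name, fail))
               for name, fail in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return manager, threads
```

In `run_demo`, the producer fails on its fifth iteration, and both consumers then leave their loops as well.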
Hope this is useful to you:)
In my projects, I sometimes need to write a 2D array onto a single line of a text file (see the figure below).
We know that LabVIEW can write 2D arrays directly to a text file using “Array to Spreadsheet String.vi” or “Write to Spreadsheet File.vi”. But writing the array on a single line is not straightforward. We could convert the 2D array into a 1D array with “Reshape Array” before writing, but I don’t like this method, which is inefficient in terms of space and time. Alternatively, we could write the 2D array row by row in a for loop, but LabVIEW automatically starts a new line for each iteration. The method I used is simple, but it took me a while to come up with: I set the file position (“Set File Position”) each time a row is written, as shown below
The file position is set relative to the end of the file with an offset of -2, so that the two end-of-line characters (carriage return and line feed) are overwritten. A tab is then written to keep the tab-delimited format. This program does the job without converting the 2D array. You can add “Transpose 2D Array” for the 2D array if needed.
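The same seek-back trick can be sketched in Python for comparison. The sketch assumes a two-byte CR LF end-of-line, which matches the -2 offset, and binary file mode, because Python only allows seeking relative to the end of the file in binary mode. `write_rows_single_line` is a hypothetical name.

```python
import os


def write_rows_single_line(path, rows, eol=b"\r\n"):
    """Write a 2D array row by row, but keep every value on one line.

    After each row except the last, move back over the end-of-line bytes,
    write a tab in their place and cut off whatever is left.
    """
    with open(path, "w+b") as f:
        for k, row in enumerate(rows):
            f.write("\t".join(str(v) for v in row).encode() + eol)
            if k < len(rows) - 1:
                f.seek(-len(eol), os.SEEK_END)   # offset -2 for CR LF
                f.write(b"\t")                    # keep the tab-delimited format
                f.truncate()                      # drop the leftover EOL byte(s)
```

For `[[1, 2], [3, 4], [5, 6]]`, the file contains `1\t2\t3\t4\t5\t6\r\n`. In plain Python you would normally just join all the values at once. The seek version only mirrors the LabVIEW approach.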
Recently I have been dealing with bugs and cells. I was trying to locate moving Euglena cells with a camera. Euglena is a single-celled organism with characteristics of both plants and animals. A picture of a Euglena is shown below. A single Euglena is about 50 µm long.
The illumination was not good due to my poor optical setup. I tried to identify the cells by their intensity and size, but neither approach worked well. In the end, I extracted the stable background and compared it with the live video, so that the moving targets could be identified. The background is generated by averaging all the grabbed images (i.e. the video frames); the changing parts are smoothed out as the number of frames grows.
When averaging the images, assume that the mean of the first N frames is A_N and that the (N+1)-th frame is I_{N+1}; both are 2D arrays. The mean of the first N+1 frames is then
$$A_{N+1} = \frac{N\,A_N + I_{N+1}}{N+1}$$
The code is shown below (with a recalculate/clear function):
This SubVI can be called without external shift registers. This simple function lets us extract the still background from the video, so the moving (or otherwise changing) targets can be extracted no matter how messy the background is. The result is shown in the video below:
As I said in the description of the video, “This demo shows using an algorithm tracing an Euglena in the dish with poor (non-uniform) illumination. The mid-left and mid-right videos are raw videos from the camera. The bottom-left video is the background generated from the video in real-time. The bottom-right video is the target (Euglena) extracted from the video. The top-left video is the coords of the Euglena.” We can see that the Euglena were identified in the video even though the illumination is non-uniform and the background is a bit messy.
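The running-mean formula and the background comparison can be sketched in Python with plain nested lists. This is a sketch, not the SubVI. `RunningBackground` and `moving_mask` are hypothetical names, and the fixed absolute-difference threshold is an assumption, since the post does not say how the background and live frames are compared.

```python
class RunningBackground:
    """Running mean of frames: A_{N+1} = (N * A_N + I_{N+1}) / (N + 1)."""

    def __init__(self):
        self.reset()

    def reset(self):
        """The 'clear' function: forget all frames seen so far."""
        self.count = 0
        self.mean = None

    def update(self, frame):
        """Fold one 2D frame into the mean and return the updated background."""
        n = self.count
        if self.mean is None:
            self.mean = [[float(p) for p in row] for row in frame]
        else:
            self.mean = [[(n * a + p) / (n + 1) for a, p in zip(arow, prow)]
                         for arow, prow in zip(self.mean, frame)]
        self.count = n + 1
        return self.mean


def moving_mask(frame, background, threshold):
    """True where a pixel differs from the background by more than `threshold`."""
    return [[abs(p - b) > threshold for p, b in zip(prow, brow)]
            for prow, brow in zip(frame, background)]
```

Each call to `update` costs one pass over the image, and only the current mean and frame count are stored. This is the property that lets the SubVI run without external shift registers.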
These days I have been working on using a USB webcam as a research tool to observe the heartbeat of Daphnia (water fleas). Since LabVIEW 2009, IMAQdx has been available for third-party USB cameras, so no extra USB driver is required. So I used the old but classic Philips SPC 900NC webcam to build a Daphnia observation system. I may talk more about this later.
To cut a long story short, I tried to process only the Region of Interest (ROI) of the grabbed image rather than the whole image, which makes the analysis easier. Since this USB webcam does not support ROI imaging, we cannot set the X/Y pixel window on the camera. I googled for a solution and found that creating a mask, or pseudo ROI, can block out the unwanted part of the image. I programmed an example that blocks the unwanted part of the image when an ROI is selected on it:
For some unknown reason I could not create a snippet from this VI, so I saved a screenshot instead.
In this example, three images are created in memory to store the original image, the mask and the modified image. If an ROI is selected (FALSE case), the ROI information is first converted to a mask, and then the source image is masked with the mask image. If no ROI is selected (TRUE case), the destination image is linked directly to the source image.
Note that:
1. The “ROI” property node is created by right-clicking the “image” indicator on the block diagram and selecting “Create>Property Node>ROI”.
2. The image type of the “Mask” must be Grayscale U8. The original and the changed images can be any image type.
3. This masks the image rather than performing ROI imaging, so it will NOT increase the frame rate or decrease the image size. All black regions are filled with 255 by default.
Hope this example is helpful. Let me know if you have any problems or are interested in controlling conventional USB webcams with LabVIEW IMAQdx.
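The two steps of the pseudo-ROI (ROI to mask, then mask the image) can be sketched in Python on plain nested lists. This is a sketch of the idea only. `rect_roi_to_mask` and `apply_mask` are hypothetical names, and a rectangular ROI is assumed. The value written outside the ROI is the `fill` parameter, because the exact fill behaviour of the VI is not reproduced here. As note 3 says, the output image keeps its full size.

```python
def rect_roi_to_mask(width, height, left, top, right, bottom, mask_value=255):
    """U8-style mask: mask_value inside [left, right) x [top, bottom), 0 elsewhere."""
    return [[mask_value if left <= x < right and top <= y < bottom else 0
             for x in range(width)]
            for y in range(height)]


def apply_mask(image, mask, fill=0):
    """Keep pixels where the mask is non-zero; replace all others with `fill`."""
    return [[p if m else fill for p, m in zip(prow, mrow)]
            for prow, mrow in zip(image, mask)]
```

For a 3×3 image with an ROI covering its bottom-right 2×2 corner, only those four pixels survive, and the image is still 3×3.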
I should have written this post a month ago, before I took the CLD exam. The good news is that I passed. 🙂 80 out of 100, which is more than I expected (more on this later).
I will not say anything about the exam itself, as I promised during the exam. I’ll just finish writing up the things I prepared for it.
1. I found the blog “Pass your CLD/CLA exams the JKI way” very useful, and it gave me the confidence to carry on preparing for the exam. Though I didn’t use it in the end, it is very helpful for preparing your own template.
2. Timing. Timing is the tricky part of the exam, e.g. how to pause/resume/restart your state machine. I made a subVI beforehand to practise it. You can find it here.
3. Pay attention to the documentation, which should be the easiest part in which to earn points.
4. If you can finish the sample exams within 4 hours, I’m pretty sure you can pass the exam. If you can finish them within 3 hours, you can get full marks.
Since I didn’t finish all the functions during the exam, I felt bad about it. But it turned out that I got almost full marks in the programming style and documentation parts, which saved me. I hadn’t planned to take the CLA before, but now I’m considering it.
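The pause/resume/restart timing from point 2 can be illustrated with a small Python timer. This is not the linked subVI, which is not shown here. `PausableTimer` is a hypothetical class, and the clock is injectable so the logic can be tested without waiting.

```python
import time


class PausableTimer:
    """Elapsed-time timer that can be paused, resumed and restarted."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.restart()

    def restart(self):
        """Start counting from zero again."""
        self._accumulated = 0.0
        self._started_at = self._clock()
        self._paused = False

    def pause(self):
        """Freeze the elapsed time; calling pause twice has no extra effect."""
        if not self._paused:
            self._accumulated += self._clock() - self._started_at
            self._paused = True

    def resume(self):
        """Continue counting from where the timer was paused."""
        if self._paused:
            self._started_at = self._clock()
            self._paused = False

    def elapsed(self):
        """Total running time, excluding the paused periods."""
        if self._paused:
            return self._accumulated
        return self._accumulated + self._clock() - self._started_at
```

A state machine can call `elapsed()` on every pass and compare the result with its target time. Pausing simply stops that value from growing.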
Good luck to all of you who are going to take the CLD exams.
Related post: Preparing for LabVIEW CLD exam (1)