Unsupervised Learning & Gaussian Mixture Models

preamble

Below is an implementation of the EM algorithm in Octave:

Tips:

  • Replace the inverse function with pinv if you get machine precision errors.
  • r2 is the mean matrix for both clusters.
  • data_cluster_1_cov is the covariance matrix for the first cluster, whereas data_cluster_2_cov is the covariance matrix for the second cluster.
  • h is the label matrix for both clusters.

EMCode
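For readers who want something to run, here is a minimal sketch of EM for a two-cluster Gaussian mixture. It is not the Octave listing referred to above: it is written in Python, and it assumes one-dimensional data so that each cluster has a scalar variance instead of a covariance matrix (which also means no matrix inverse is needed, so the pinv tip does not arise). The names em_gmm_1d, gaussian_pdf, means, variances and resp are made up for this illustration; means, variances and resp only loosely play the roles of r2, the two covariance matrices and h.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_iter=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(data)
    # Crude initialisation (an assumption of this sketch): means at the
    # extremes of the data, both variances set to the overall variance.
    means = [min(data), max(data)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * 2
    weights = [0.5, 0.5]
    resp = []

    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each data point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weight, mean and variance of each cluster.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue  # empty cluster: keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)  # floor keeps the density finite

    return weights, means, variances, resp


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    sample += [random.gauss(5.0, 1.0) for _ in range(200)]
    w, m, v, r = em_gmm_1d(sample)
    labels = [0 if row[0] >= row[1] else 1 for row in r]  # hard labels
    print(w, m, v, labels[:5])
```

Hard cluster labels come from picking the larger responsibility for each point, as in the last lines. In more than one dimension the variance floor would be replaced by some regularisation of the covariance matrix, which is where a pseudo-inverse such as pinv becomes useful.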

Supervised Learning & Parameter Estimation

In supervised learning, each data point, which may be n-dimensional, normally comes with a label. In other words, each observation in the training data has n features, and there can be N such observations, with labels provided by an expert or an oracle.

A number of simplifying assumptions are made concerning the training data, namely that all the samples we are going to see are statistically independent and identically distributed, i.e. drawn from the same probability distribution.

We are all familiar with the notion of a function, which maps a certain input to a certain output. In a probabilistic setting, when we don’t know the function, we approximate it, which amounts to finding the probability distribution P(Y|X). If X is a single discrete random variable with k possible values, there are k different distributions P(Y|X) to look up, one for each value X takes. X can also be a vector of n discrete random variables X_i, where each X_i may take up to k different values. X in this scenario is also called a feature vector. Formally, P(Y|X) = P(Y|X_1,X_2,X_3,…,X_n).

How to evaluate P(Y|X)?

Bayes’ rule comes in very handy here: we can express P(Y|X) as P(X|Y)P(Y)/P(X). Now we arrive at an interesting point: we have to make another assumption, this time concerning P(Y), because we haven’t seen all possible data points, nor can we label them all. So we assume a prior; it can be a Beta prior, a Dirichlet prior, or any other suited to the problem domain.

P(X|Y) is estimated empirically. How? By counting the number of times X takes a given value together with a particular value of Y, and dividing by the total number of times Y takes that value.

P(X) can be calculated using the law of total probability.
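The three ingredients above can be put together in a few lines. The following is only a sketch in Python with made-up toy data; the function name posterior and the weather/activity values are hypothetical. One simplification to note: it uses the empirical class frequencies as a stand-in for P(Y), rather than a Beta or Dirichlet prior.

```python
from collections import Counter


def posterior(samples, x):
    """Return {y: P(Y=y | X=x)} estimated from (x, y) pairs by counting."""
    n = len(samples)
    y_counts = Counter(y for _, y in samples)  # how often each label occurs
    xy_counts = Counter(samples)               # how often each (x, y) pair occurs

    joint = {}
    for y, ny in y_counts.items():
        prior = ny / n                         # P(Y=y), empirical stand-in for a prior
        likelihood = xy_counts[(x, y)] / ny    # P(X=x | Y=y), by counting
        joint[y] = likelihood * prior          # P(X=x | Y=y) P(Y=y)

    evidence = sum(joint.values())             # P(X=x), law of total probability
    if evidence == 0:
        raise ValueError("x never occurs in the samples")
    return {y: p / evidence for y, p in joint.items()}


# Toy data (hypothetical): X is the weather, Y is what we did that day.
toy = [
    ("sunny", "play"), ("sunny", "play"), ("rainy", "play"),
    ("sunny", "stay"), ("rainy", "stay"), ("rainy", "stay"),
]
print(posterior(toy, "sunny"))
```

With a single feature this reduces to counting (x, y) pairs directly; the decomposition into likelihood, prior and evidence is what matters once X becomes a feature vector.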

Parameter Estimation

With the basics out of the way, we come to parameter estimation. In parameter estimation, we are concerned with estimating the parameters of a distribution given a certain set of data points.

In the discussion of estimating P(Y|X) above, Y can play the role of theta (the distribution’s parameter) and X the role of D (the data). The problem of parameter estimation thus becomes:

1) Finding the theta that maximises the data likelihood, in other words the theta that makes the data most probable:

theta = argmax_theta P(D|theta)

2) Or, finding the most probable theta given the data D and prior P(theta):

theta = argmax_theta P(theta|D)
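To make the two criteria concrete, here is a small sketch for one specific case that is assumed purely for illustration: D is a sequence of coin flips, theta is the probability of heads, and the prior P(theta) is a Beta(alpha, beta) distribution. All function names are made up for this example. The closed-form answers are compared against a brute-force argmax over a grid of theta values.

```python
import math


def log_likelihood(theta, heads, tails):
    """log P(D | theta) for a coin with P(heads) = theta."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def log_posterior(theta, heads, tails, alpha, beta):
    """log P(theta | D) up to an additive constant, with a Beta(alpha, beta) prior."""
    log_prior = (alpha - 1.0) * math.log(theta) + (beta - 1.0) * math.log(1.0 - theta)
    return log_likelihood(theta, heads, tails) + log_prior


def argmax_on_grid(score, steps=9999):
    """Brute-force argmax of score(theta) over an evenly spaced grid in (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=score)


def mle_bernoulli(heads, tails):
    """Closed-form maximiser of P(D | theta)."""
    return heads / (heads + tails)


def map_bernoulli(heads, tails, alpha=2.0, beta=2.0):
    """Closed-form maximiser of P(theta | D); valid when alpha, beta > 1."""
    return (heads + alpha - 1.0) / (heads + tails + alpha + beta - 2.0)


heads, tails = 7, 3
print(mle_bernoulli(heads, tails),
      argmax_on_grid(lambda t: log_likelihood(t, heads, tails)))
print(map_bernoulli(heads, tails),
      argmax_on_grid(lambda t: log_posterior(t, heads, tails, 2.0, 2.0)))
```

With 7 heads and 3 tails, the first criterion gives 7/10, while a Beta(2, 2) prior pulls the second towards 1/2, giving 8/12. As more data arrives, the two estimates move closer together.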

Note: Since the current WordPress theme doesn’t support LaTeX, please visit http://azfar.quora.com for crisp mathematical expressions.

Thanks!

Explaining Adaptation

It is commonly observed that one doesn’t need to be a mechanical engineer to drive a car, a biochemist to prepare food, or a mathematician to do monthly budgeting; yet we execute the aforementioned tasks, which may otherwise be considerably complex, on a routine basis with efficiency and skill. Even before advances were made in the sciences, we humans had a way of going about mundane activities. Our means may have changed, but we have always been quite capable of accomplishing our goals.

The question is how have we been able to carry out tasks that sustain our existence?  A short answer might be that we acquire expertise and skill.  Interestingly, no one is a born driver or a born cook.  An infant doesn’t know what is edible and what is inedible; he/she does not even know how to speak.  After a finite amount of time, effort and interactions with the environment, we are able to acquire skill and expertise in performing our day to day tasks.  We don’t execute these tasks perfectly even on a routine basis.  We err and try to minimise our mistakes.  Sometimes our environment changes, and the ways we used to perform our routine tasks no longer work.  We change our ways according to the new environment and start functioning.

This is called Adaptation and it’s essential to our progress & survival as an individual and as a species.

 

Of Beliefs & Expectations

How often do we come across individuals in our lives who are otherwise bright and capable but are underperforming? And because they are underperforming, we don’t view them as bright and capable. How often do we care to pause and think about what the reason might be? As it turns out, most often, in addition to being dissatisfied, they also lack motivation and as a result appear disengaged.

One might ask: what can be done to light a fire under them? It helps to see what has led them to this situation in the first place. What are our beliefs about them? How are we treating them? What are their beliefs about themselves? And how do they act as a result?

Counter-intuitive as it might seem, the secret to achieving satisfaction is to do something meaningful and worthwhile. Nothing gets those motivational juices flowing like setting challenging (precise) goals and achieving them.

Belief leads to action and vice versa. Do we or they “believe” that they can achieve challenging goals? There is no denying that our beliefs about them influence their own beliefs through our treatment and actions. And having wrong beliefs leads to poor performance.

When the emphasis is on gaining something, rather than preventing something, positive expectations (yours or theirs) lead to better performance. Conversely, when the emphasis is on preventing something, doubt (yours or theirs) leads to better performance. So it’s very important to gauge the focus of the goal and form beliefs or expectations accordingly.

Social Media & Responsibility

Most of us in Computer Sciences & Engineering have encountered a definition of Distributed Systems as a network of inter-connected entities giving an illusion of a single entity.  Can we view social media in a similar way?  It’s no longer strange to find our feeds being filled with stuff that is relevant to what we share and post, most likely opinions and views of different individuals and organizations on issues we care about.

How should one evaluate such content? The tendency to share memes is very much there; whether people do it thoughtfully or not is another question. (Fake news and clickbait are a growing problem.) Over time, people start identifying with most of the “viral” stuff without questioning it, as a relationship is built with the source. That’s one side of the picture. But the very same content being presented to you has already been judged as “good” for you on the basis of experience, yours or others’. Is social media acting merely as a facilitator? Consider this question: if you searched Google for a relevant issue and happened to like one of the results, why would you distrust the very same thing when it is presented in your feed?

In the thick of it …

Most of us, particularly those with STEM backgrounds, must have encountered Venn diagrams during the course of our education. Recently I saw one presenting a growing niche of intelligent software within a larger space of all software problems. This niche is growing, no doubt. But will it grow to the extent that it subsumes other areas as well? One corollary of this trend would be that a lot of programmers, if not all, will lose their jobs.

 

screenshot_20161218-142143

There was a time when it was not advised to solve a problem in an “intelligent” way as long as an algorithm for it was known. But is that going to change? Given that the distribution of the input is known, can’t the algorithm be optimized for the most frequently occurring cases? Just install the black box and keep feeding it input; as long as you keep it on, the perpetual learning machine would keep on improving its performance.

Consider another trend. Not too long ago it was a rule of thumb in industry to hire a system administrator for every 100 machines. That trend is no longer there, as the job of a system administrator has long been automated. And it’s cited as one of the benefits of shifting to cloud computing.

There is another trend of moving towards less flexible solutions as long as the needs or requirements don’t call for more custom ones. This trend is growing in games, mobile, cloud and the web.