A picture paints a thousand words …

A picture paints a thousand words. And words are painted across our brains. In a paper titled “Natural speech reveals the semantic maps that tile human cerebral cortex” by Huth et al., published in Nature in April 2016, the authors mapped words to different regions of the brain using functional MRI data recorded while subjects listened to hours of narrative stories. Interestingly, despite each of us having an individual map, our minds are organized in a similar and consistent manner, and words cluster by semantic domain. Here is the amazing video:

Nature video on how the brain maps words to different regions

Here is a screenshot of part of the brain and the words mapping to that region:

[Screenshot: words mapped to a brain region]

(Figure Credit: Nature video on brain dictionary, April 2016)

One cannot help observing a connection with current research in Natural Language Processing (NLP) in the field of Artificial Intelligence. Machine learning models such as deep recurrent neural networks can work with words. But since computer models work with numbers, words must first be converted into a numeric representation in the form of vectors, and the dimensionality of these vectors can be large. In a way, words are being converted into points in a high-dimensional space, and semantics becomes a spatial concept. Words that are semantically similar map to nearby regions, and their relative displacements capture semantic relationships. See the following reference for technical details of the word2vec (word to vector) approach:

Vector representation of words

[Screenshot: 2-D visualization of word vectors]

(Figure Credit: Tensorflow tutorial on word2vec)
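The spatial view of meaning can be sketched in a few lines of code. The tiny 3-d vectors below are made up for illustration (real word2vec embeddings have hundreds of dimensions and are learned from text), but they show the two key ideas: semantically similar words have higher cosine similarity, and relative displacements encode analogies like king − man + woman ≈ queen.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 3-d "embeddings" invented for this sketch; real vectors are learned
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.2, 0.1],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.3, 0.0],
    "apple": [0.0, 0.1, 0.9],
}

# Semantically related words lie closer together than unrelated ones
assert cosine(vec["king"], vec["queen"]) > cosine(vec["king"], vec["apple"])

# Relative displacement captures the analogy: king - man + woman ≈ queen
analogy = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
best = max((w for w in vec if w != "king"), key=lambda w: cosine(analogy, vec[w]))
print(best)  # "queen" wins with these toy vectors
```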

It seems we are making progress in unraveling how the mind works.

At the same time, a lot has yet to be discovered and understood.

How does a newborn baby develop this semantic map within a matter of a few years, and why does it turn out to be consistent across individuals?

And, where are our thoughts in all this?

In a semantic world, where words become colorful entities in space, perhaps our thoughts are nothing but mysterious dances in this surreal landscape.

And, where are our dreams in all this?

One can only wonder, as a wise philosopher did a long time ago:

“Once upon a time, I dreamt I was a butterfly, fluttering hither and thither, to all intents and purposes a butterfly. I was conscious only of my happiness as a butterfly, unaware that I was myself. Soon I awaked, and there I was, veritably myself again. Now I do not know whether I was then a man dreaming I was a butterfly, or whether I am now a butterfly, dreaming I am a man.” – Zhuangzi, 4th century BC



Deep Learning brainstorming at a Lucknow (India) school


I recently got the opportunity to give talks at two very different places in India. One was the Indian Institute of Technology (IIT), Kanpur, one of the elite technical institutes in India, with stellar alumni. I spoke on Computer Vision, Machine Learning and Deep Learning, highlighting how my own career has been intertwined with progress in these fields. As expected, the audience was highly learned and technically strong. I have now signed up with IIT Kanpur’s Machine Learning Special Interest Group and am happy to see that the group is very active and well versed in the latest developments.

This post is, however, about my experience at a very different place. I was invited to talk at St. Anjani’s Public School in a suburb of Lucknow, Uttar Pradesh, India. Students at St. Anjani’s come from modest backgrounds in this North Indian belt, where people are still struggling hard to make ends meet.

When I arrived on campus, I was greeted by the school manager and the school principal, who took me to one of their classrooms, where I spoke on Machine Learning, Deep Learning and Data Science in an interactive format assisted by a PowerPoint presentation. It was a great experience for me, as the audience consisted of 10th and 11th grade students.

I was highly impressed by the questions the students asked. How do Internet search engines work? In what conditions can Artificial Intelligence not be used? What steps can students take to learn more about Data Science, Deep Learning and AI? How can one succeed in entrepreneurship in the IT sector?

The students came up with ideas for applications of data science and machine learning that were on par with those being considered and funded in Silicon Valley. Here are some of the children’s suggestions: smart homes, ensuring the safety of children using robotic babysitters, applications of deep learning in health care, smart governance, remote medicine, and educational apps.

I came back with the following observations:

  • There are smart children everywhere, including the poorest areas of the world. And all these children harbor in their hearts a desire to learn about the cutting edge of science and technology.
  • Success depends on opportunities. Not everyone gets opportunities and resources to succeed.

The next day, I read articles in the Indian newspapers about the latest trends in technology. As the technology juggernaut of data science, deep learning and AI marches forward in the Silicon Valleys of the world, I paused to ask whether deserving children around the world will have the opportunities and resources to participate in this effort, or whether they will be held back by unfortunate circumstances beyond their control. I am very much a part of the Silicon Valleys of the world, and it is my hope that some of the aspiring students I met will make informed decisions in the future when they choose their college majors and careers. That also made me reflect on corporate social responsibility programs and how I can be part of such an initiative to ensure scholarships for these deserving children, who will design our future world.

Ultra Advanced Autonomous Artificial Intelligence


Computer Vision, Speech Recognition and Natural Language Processing are experiencing convergent and synergistic progress these days. Breakthroughs are being reported in the mass media. Deep learning with large, many-layered neural networks is an active area of research at present.

I believe this is just the beginning, and the coming decades and centuries will witness great progress. And that brings me to the topic of this post.

What if we can eventually build ultra advanced autonomous machines? Should we do it?

Recently, Stephen Hawking warned us against the dangers of Artificial Intelligence. Not having a human in the loop could lead to unpredictable scenarios. As any computer programmer knows, even a simple program has to face all kinds of situations when deployed in the real world; for a highly complex and completely autonomous system, it will be very hard to foresee all the possibilities in advance.

To be on the safe side, we may choose not to build fully autonomous systems and always have a human in the loop, at least if they are to be used on Earth. This may make them less efficient, but we may choose to live with that decision. But what if we need to move out into space? Let’s say we want to terraform Mars. Then it would be perfectly reasonable to first send robots there, and they will have to be autonomous, with no humans around.

On a lighter note, in Hollywood it is fashionable to suggest that such machines can even become conscious and have feelings. Unless we first understand what states of matter can cause consciousness to emerge, this really is rather pleasant and entertaining fiction at this time, which I do enjoy when I go to see such movies with my children. In this post, I am talking about the highly complex state transitions of such machines from a rather geeky software engineering point of view. Of course, one day it will be very exciting to learn about scientific breakthroughs in understanding human consciousness, which will undoubtedly have far-reaching implications.

Since I am in favor of technological progress and I see great practical benefits of smart software – in medicine, space exploration, scientific research, energy, education, commerce and many other fields – I think it is really for us, as a society, to become responsible in our use of technology and to have appropriate rules in place.

Technology just gives us tools. How we use them is for all of us to decide together. 🙂


Machine Learning, Deep Learning and Scientific Understanding

Machine learning is super hot in Silicon Valley these days! It has emerged as a very useful discipline in computer science and statistics. If you have skills in machine learning, you are likely to get a nice job. But what exactly is machine learning? With the growing popularity of the field, engineers and scientists know the technical answer, but can it be explained to everyone in simple language? What are the current trends in machine learning, and what could come next?

In this article, we will focus on statistical learning and discuss the state of the art, trends and future directions.

Decisions, Decisions, Decisions!

Why is machine learning hot? Well, there are so many decisions to be made everywhere. Suppose you want to predict, recommend, classify or rank something in an automated, data-driven manner. Then your best bet is to use statistical machine learning.

Netflix wants to recommend movies which you may like. Google wants to rank web pages by their relevance to your search query. Facebook and LinkedIn want to display the advertisements which are most likely to be clicked. A biotechnology company wants to offer a diagnostic platform that predicts a medical condition from gene and protein expression. When you wave your hand in front of Kinect, it tries to classify its 3-D depth sensor data into different body parts and understand your gesture. A retailer wants to predict demand for inventory items. Self-driving cars want to understand the surrounding traffic. And the list goes on.

In the future, a household robot will recognize faces, understand gestures, facial expressions and speech, and be able to move around your house helping out with chores. It will have to make a lot of decisions almost non-stop! Machine learning will be the basis for such Artificial Intelligence.

It is not easy to come up with rules of thumb to make decisions for all these problems. There could be a very large number of situations to handle, and you may not be able to write down a simple recipe that makes decisions in all possible cases with the desired accuracy. There could be a complex underlying process going on which is quite difficult to capture in a simple handcrafted model.

Machine learning builds software that allows you to make decisions in a statistical sense. You train this software on training data: data for which humans have provided judgements, or labels, which are presumed to be correct decisions. It therefore captures human intelligence and decision-making experience. At a very concrete level, training data can be viewed as an Excel table. You have rows and columns. The rows are the different instances, or samples, of your problem. The columns represent features, or signals, which you think could be useful. There is one extra column which holds the correct decision, filled out by human experts.
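The table analogy above can be made concrete in a few lines. The feature names and values below are invented purely for illustration; the point is the shape of the data: one row per sample, one column per feature, plus an extra label column holding the human-provided decision.

```python
# A tiny training table: each row is a sample, each column a feature,
# plus one extra "label" column with the human-provided correct decision.
# Feature names here are made up for illustration.
rows = [
    {"hours_slept": 8, "exercised": 1, "label": 1},  # label: 1 = happy
    {"hours_slept": 4, "exercised": 0, "label": 0},  #        0 = unhappy
    {"hours_slept": 7, "exercised": 1, "label": 1},
]

# Split the table into the feature columns and the extra label column
features = [[r["hours_slept"], r["exercised"]] for r in rows]
labels   = [r["label"] for r in rows]
print(features, labels)
```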

Example: Predicting Happiness

To make the discussion livelier, let us apply machine learning to a problem which is the subject of so many self-improvement books. We will try to achieve through machine learning, using just statistics, something that has preoccupied humanity for ages: we will predict happiness!

Happiness is our target, or response, variable. Using our intuition and in consultation with happiness experts, we make a list of all the factors we presume to be important in predicting happiness. They could be age, gender, income, relationship status, number of children, political beliefs, religious beliefs, job satisfaction, number of friends, personality type, and so on; these constitute our features, signals or predictor variables. Let us say we make a list of 20 such features. We then go around the world taking a survey of, say, 5000 people. For each person, we get the values of the 20 features. The training data, as an Excel table, will have 5000 rows and 21 columns. Why 21 columns? The first 20 columns are the features. The 21st column is the target variable, which indicates how happy the person is, for example on a scale from 0 to 10. How do we fill out this 21st column in our happiness project? Well, here human judgement, experience and intelligence come into play. It could be self-reported happiness, or it could be a value assigned by experts whose task is to rate each person’s happiness. The important point is that this value is assumed to be correct. That is why training data is sometimes called the golden dataset or ground truth, and this process of learning from labeled data is called supervised learning.

Once we have our 5000-row, 21-column Excel table filled out, we can use it as input to a machine learning training algorithm, which will try to build a mathematical function, or model, that maps the predictor variables to the output variable. The output of this supervised learning is the trained model, which we will then use in practice to predict the happiness of any person.
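To make the training step tangible, here is a minimal sketch with one feature instead of 20 and five synthetic rows instead of 5000: fit a straight-line model (happiness ≈ w · feature + b) by ordinary least squares. The feature and its values are invented for illustration; real training would use the full table and a richer model.

```python
# Minimal supervised training sketch: ordinary least squares on one feature.
xs = [1, 2, 3, 4, 5]   # feature column (say, number of close friends; made up)
ys = [3, 4, 6, 7, 9]   # target column: happiness on a 0-10 scale (made up)

# Closed-form least-squares fit for a single feature
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

def predict(x):
    """The trained model: maps a feature value to predicted happiness."""
    return w * x + b

print(round(w, 2), round(b, 2), round(predict(6), 2))
```

With these toy numbers the fit comes out to w = 1.5 and b = 1.3, so a person with feature value 6 is predicted a happiness of 10.3 (the model happily extrapolates past the 0-10 scale, which is one reason simple linear models need care).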

What is the form of this trained model? The machine learning literature gives you many choices, from very simple ones to quite complex ones. Let us say we use them as black boxes and simply try them all out in a brute-force manner to see which one works best.

The training algorithm will also tell us how well we succeeded in training. Suppose it tells us that we achieved high accuracy. Can we then open the Champagne bottles and start celebrating the solution of this age-old problem? No. What we need to do next is apply our model to data we have not seen. This is called test data, and we are shown it only once, after training has been completed. This is our real examination. If we do really well on this test set, and assuming the test set is fairly big and representative, then yes, we can definitely celebrate! 🙂 And it will be another jewel in the crown of machine learning.
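The held-out examination can be sketched as follows. The "trained model" here is a made-up threshold rule and the test rows are synthetic; the point is only the discipline: the model never sees the test rows during training, and its accuracy on them estimates real-world performance.

```python
def accuracy(model, test_rows):
    """Fraction of held-out (feature, label) pairs the model gets right."""
    correct = sum(1 for x, label in test_rows if model(x) == label)
    return correct / len(test_rows)

# A toy "trained model": predict happy (1) if the feature exceeds a threshold
def model(x):
    return 1 if x >= 5 else 0

# Held-out test data the model never saw during training (made up)
held_out = [(7, 1), (2, 0), (6, 1), (3, 0), (4, 1)]
print(accuracy(model, held_out))  # 4 of 5 correct: 0.8
```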

Machine Learning Concepts

While trying to predict happiness, we may read a few books on machine learning. In the machine learning literature, we would encounter concepts such as training error, generalization error, the bias-variance tradeoff, VC dimension, PAC learning, overfitting, underfitting and ROC curves; we will become familiar with models such as linear models, logistic regression, neural networks, support vector machines, the Bayes classifier, decision trees, nearest neighbors, probabilistic graphical models, and generative and discriminative models; we will learn about gradient descent, convex optimization, clustering, expectation maximization, boosting, bagging, bootstrapping, Monte Carlo techniques, cross validation, dimensionality reduction and regularization; and many other things. Knowledge of linear algebra and statistics will be quite handy for mastering these concepts.

All this will give us a theoretical understanding of what is going on in machine learning, as well as a set of practical tools. We will discover that machine learning is an empirical science with a rich theoretical foundation, requiring a lot of experimentation and iterative, continuing improvement cycles. We will realize that we need rich visualization tools that help us discover patterns in our data and in our results so that we can make these continuing improvements.

Deep Learning

In our happiness prediction example, we used our intuition about and understanding of happiness to list 20 features we thought were important in predicting it. Depending on the application, we come up with an appropriate list of features which we think are important. For example, for Google, PageRank is an important and well-known feature for ranking web documents. If we are trying to classify a digital image, we will use certain computer vision features, for example those based on image intensity gradients. This is called feature engineering. A lot of innovation is required in designing such useful features, and therefore publications which propose them get well cited.

Let us now stop briefly and make two observations. One crucial observation is that we live in a world which is best modeled hierarchically: from elementary particles to giant galaxies, from raw pixels in your digital camera image to a familiar face, from vibrations of air molecules to a Beethoven symphony, from simple realities within us and around us to the elusive concept of happiness.

Another crucial observation is that the human mind, which is our best model for intelligence, does not exactly go through this process of supervised learning. A human baby looks around the world and figures out a lot of patterns in an unsupervised manner. Parents and teachers only provide a gentle supervised touch to this innate learning process. We follow a process of building models of the world based on evidence and gradual refinement.

How exactly unsupervised and supervised learning will work together to give the best solution is still being researched. It is worth noting that supervised approaches continue to perform well: when we have a large amount of labeled data and use layers of hierarchical features that are learned automatically, supervised learning offers competitive solutions and can outperform unsupervised learning. At the time of writing, convolutional neural networks trained in a supervised manner seem to be performing best for computer vision.

Can we somehow capture these ideas and improve on classical supervised machine learning? It would also be very useful, as this is the era of Big Data. Enormous amounts of web data, mobile data, image data, video data, social networking data, customer data, and biological and medical data are being collected in giant server farms, and it is practically impossible to label this humongous data as in classical supervised machine learning.

An unsupervised or semi-supervised approach which works with unlabeled data seems to be the better bet from a practical point of view. We can try to build hierarchical representations of features automatically using unlabeled data, just as a human baby does, through an iterative process of model building, model matching and model refinement, some components of which could be based on supervised learning. The human brain is believed to be a hierarchical deep network of neurons with many layers. Starting with raw data, from our eyes through the optic nerve to the visual cortex, or from our ears through the auditory nerve to the auditory cortex, it builds a hierarchical representation of features, which ultimately leads to the recognition of a familiar face or the appreciation of a beautiful song.

Deep learning derives its motivation from this biology. Deep learning is a technique, currently an active research area in the machine learning community, in which we train hierarchical features in an unsupervised manner using huge amounts of unlabeled data, and then fine-tune them in the classical supervised manner using a much smaller amount of labeled data. We are thereby automating the process of feature engineering, which used to require human ingenuity.

Consider the example of classifying an image as that of a cat. One can feed lots of images, both of cats and non-cats (perhaps from millions of YouTube videos, as was recently demonstrated by a machine learning team led by Prof. Andrew Ng), to a deep learning algorithm and let it combine raw pixels into features such as edges, then a next level of features such as composite edges, corners and basic textures, then a next level such as eyes, ears and fur, until we get a high-level feature which puts them all together into a cat’s face.
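The pixels-to-edges-to-parts hierarchy can be sketched as a tiny forward pass through stacked layers. The weights below are made up (a real network learns them from data, and real layers have thousands of units), but the structure shows how each layer turns lower-level features into higher-level ones.

```python
# Toy forward pass through a deep network: each layer turns lower-level
# features into higher-level ones (pixels -> edges -> parts -> "cat score").
# All weights are made up; a real network learns them from data.
def relu(v):
    """Keep positive activations, zero out negatives."""
    return [max(0.0, x) for x in v]

def layer(inputs, weights):
    """One fully connected layer: weighted sums followed by ReLU."""
    return relu([sum(w * x for w, x in zip(row, inputs)) for row in weights])

pixels = [0.2, 0.8, 0.5]                          # raw input "pixels"
edges  = layer(pixels, [[1, -1, 0], [0, 1, -1]])  # low-level features
parts  = layer(edges,  [[1, 1], [-1, 1]])         # mid-level features
score  = sum(parts)                               # top-level "cat score"
print(score)
```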

Coming back to the human baby example, it seems three billion years of evolution have equipped us with the first layers of these robust vision features, and then, aided by the amazing flexibility of the human brain, a baby has no trouble training the higher layers to recognize cars, table lamps, people, butterflies, flowers, and so on. So the evolution of the mind and the flexibility of the mind go hand in hand in creating human intelligence! Parents, teachers and other people are still important, as they train the highest layers, which give us social and emotional intelligence.

Artificial Intelligence and Natural Intelligence

Since deep learning has strong biological motivations, are we moving towards a future in which the line between natural and artificial intelligence blurs? It is exciting that computer scientists and neuroscientists can now work together to unravel the mysteries of the human mind!

At the same time, we should realize that machine learning, including deep learning, is best explained in terms of mathematics. We are really training a mathematical model which may or may not correspond to how the human mind works, but which will still have all the appearance of intelligence in functional form. Machine learning tries to replicate the mapping from features to correct predictions using whatever works best in practice. Therefore, such software appears intelligent when viewed as a black box, but it could be employing a totally different mechanism to perform this mapping than the one we employ in our brains.

Which is the higher intelligence, artificial or human? Which has the better long-term potential?

Since the human mind is only one way of performing this mapping, despite our anthropocentric mindset there may exist pure mathematical models that significantly outperform the human mind, which we will hopefully discover at some point. Coupled with the fact that the cloud computing of the future will be able to train and execute these models at mind-boggling scale and speed, it is reasonable to predict that artificial intelligence will eventually surpass human intelligence! That should make us humble as well as proud.

Scientific Understanding

That all sounds very exciting, useful and practical. In the era of Big Data, machine learning, including deep learning, is a great tool. But is it only useful for businesses? Aren’t we truth seekers, not just utilitarians? Does it lead to a better understanding of life and the universe?

Let us say we did a great job of predicting happiness using statistical machine learning. Does this model tell us any new truths about happiness? How do the features really affect happiness? What are the underlying processes and cause-effect relationships? Did we merely capture statistical correlations and nothing more? Where are the psychological truths? Happiness is a difficult topic, and it involves social and political realities. Does our model teach us how to create better societies that enhance happiness?

Taking a more concrete and down-to-earth example, suppose we used genomic and proteomic data to predict a disease using a machine learning model such as a neural network. Even if it achieves high accuracy, does it tell us anything useful about the underlying biological pathways? Understanding how genes and proteins interact with each other, under the influence of epigenetic and environmental factors, is as difficult as unravelling the paradoxes of happiness.

Machine learning used as a black box therefore seems to be just a statistical tool devoid of truths, at least at first glance.

The good news is that we can make it a tool for scientific understanding. This is one exciting area where the field can grow and mature, in both an applied and a theoretical sense! How can we interpret machine learning models? What insights can they give us about the problem at hand? How can they assist truth seekers while at the same time giving something useful to the utilitarian?

We want to better interpret the machine learning models we build. This desire need not be rooted only in the goal of helping science; it can also be rooted in pragmatic business goals. A business may not want to see unexpected blunders and errors from its trained machine learning model. It may opt for a model which is simpler, and therefore amenable to human interpretation, in order to avoid such errors. Once the number of features becomes large and we start employing very complex models, we lose understanding, and therefore control, which can cause uneasiness among business leaders.
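One concrete payoff of preferring a simpler model, as the paragraph above suggests, is that a linear model's weights can be read directly as feature influence. The feature names and coefficients below are made up for illustration; in practice they would come from a trained model.

```python
# Interpretability sketch: a linear model's coefficients (made up here)
# can be ranked by magnitude to see which features drive the prediction.
coefficients = {
    "job_satisfaction":  0.9,
    "number_of_friends": 0.4,
    "income":            0.1,
    "commute_time":     -0.6,   # negative weight: longer commute, lower score
}

# Rank features by the magnitude of their effect on the prediction
ranked = sorted(coefficients, key=lambda f: abs(coefficients[f]), reverse=True)
print(ranked)
```

A complex model such as a deep network offers no such direct reading of its millions of weights, which is exactly the interpretability trade-off the text describes.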

Interesting future work can be done in exploring such high-dimensional feature spaces, the complexity of models and their simplification, feature interactions, the underlying dynamics of processes, and unexpected errors.

We should therefore resist the immediate temptation to dismiss machine learning as a mere statistical tool for practical business goals, in contrast with science, which tries to understand reality.

We should also remember that our scientific laws are themselves mathematical models. Newton’s law of gravitation was a mathematical equation until it was superseded by Einstein’s space-time curvature. Though it is only an amusing story, for Newton the training data consisted of an apple which fell on his head, yet it was enough for building his great model, which worked well for both the apple and the Moon. 🙂 Quantum mechanics is a mathematical model, and we are still struggling to interpret it. Is it just a useful tool, or is it a truthful depiction of reality? We believe in these mathematical theories in a statistical sense, and that may not be too far from machine learning. We do experiments, which are like our test set, and we confirm the predictions of these models. This is the scientific method. But for Truth, our bar is the highest possible, and we firmly demand 100% accuracy. Even a single violation sends us back to search for a better theory. Newtonian mechanics is approximate, and quite useful in practice, but not exact. Einstein has provided us with a better theory. And the search goes on.

Collaboration in Machine Learning

One effective way to make progress in machine learning will be through collaboration. One of the bottlenecks in machine learning is training time. As we bridge the gap between artificial and human intelligence, we will have to replicate what three billion years of evolution did for natural intelligence. More complex models are built on top of simpler ones.

To aid this incremental and hierarchical improvement, successful models built in academia and open research labs can be released into the public domain. There could be an open-source initiative in which we keep a repository of machine learning models. Like the evolution of natural intelligence in nature, artificial intelligence will evolve with time and through community effort. We should be able to reuse, modify and enhance artificial intelligence models built by others. Of course, this will require some effort on the side of data standardization, pre-processing and post-processing, but that should be a solvable technical problem.

That concludes this article. It is hoped that scientific machine learning, which aims to understand, and social machine learning, which aims to replicate evolution, will lead us to exciting new frontiers in the coming decades.