The data mining journey ahead is long and wide-ranging. As we move further into the 21st century, the volume of data on our planet is exploding: information is authored by billions of people and flows from a trillion intelligent devices, sensors and instrumented objects that are becoming part of everyday life. About 80% of this new data is unstructured content. Data mining and machine learning are the technologies that help capture all this data and turn it into actual intelligence.
Even today, data mining and machine learning are used in fields as varied as science, engineering and entertainment. In customer relationship management, data mining can contribute significantly to the bottom line. Rather than contacting prospects or customers at random through a call centre or by mail, a company can concentrate its efforts on the prospects predicted to be most likely to respond to an offer. Machine learning is enabling more realistic artificial intelligence in computer games. In bioinformatics, it is used to detect patterns in DNA sequences, work that is very important to efforts such as the Human Genome Project.

Photography: infocusmagazine.org
In the future, data mining will help solve problems with far bigger impacts on humanity, at scales we have not seen until now. The complexity of these challenges will make machines necessary not just for data management but also as a source of intelligence. Data mining and machine learning can provide innovative, out-of-the-box and practical solutions in fields such as climate change, energy, education and health. Here we look at some of the challenges of the 21st century and at how data mining and machine learning can help.
Climate Change
Climate change is one of the biggest challenges we face, and one that needs immediate attention. A major cause of the current situation is the overuse of fossil fuels, so we are turning to alternative sources of energy. Data mining can play a crucial role here, not only by monitoring our energy usage but also by finding patterns in global climate data that help determine the best alternative energy source for a region. We have developed renewable sources such as solar energy, wind turbines, and turbines that harness the massive energy of ocean currents. But not every source is appropriate for every region. We will need detailed region-wise studies to judge which sources are best in each season of the year, so that each region has a continuous supply of energy. The complexity and sheer volume of data involved will make computers indispensable: data mining will help uncover climate patterns and suggest, on a statistical basis, which energy sources best suit a region.

Photo Courtesy: Brian Hayes, USA
Machine learning will also help us counter some of the negative impacts of climate change, such as shifts in land-use patterns. Data mining, the partially automated search for hidden patterns in large databases, offers great potential benefits for decision-making based on Geographic Information Systems (GIS). Integrating these two technologies has recently become critical, especially as public and private sector organisations holding huge databases of thematic and geographically referenced data begin to realise the potential of the information hidden there. Environmental agencies are assessing the impact of changing climate conditions on land-use patterns by applying data mining techniques to these vast sources of information. The days of a Global Environmental Protection Agency (G-EPA) are here, and yes, these are the men in green!
Energy
Energy consumption, and with it the depletion of energy resources, is on the rise. While looking for renewable sources of energy is crucial, managing the energy we have today is equally important. There are large-scale inefficiencies across the whole process, from production to distribution to consumption. Today these areas are handled mainly by hand, with computers used only for data management and little computer-based intelligence. But, as with most of these problems, the complexity involved will make computer-based techniques like data mining crucial to an optimal solution [1].
In 2007, IBM formed a coalition of innovative utility companies to accelerate the use of smart grid technologies and move the industry through its most challenging transformation. The Global Intelligent Utility Network Coalition aims to change the way power is generated, distributed and used by adding digital intelligence to current systems to reduce outages and faults, manage demand, and integrate renewable energy sources such as wind and solar power. Smart grid technology is even being tested close to home at North Delhi Power Limited (NDPL), one of the biggest distributors of electrical power in Delhi, India.

Image Credit: http://my.reset.jp/~adachihayao/
Health
With the emergence of intelligent systems, we will soon see widespread use of machine intelligence in medicine and health. Work is already under way on using data mining to identify disease outbreaks. Public health services are searching for explanations of disease clusters by identifying the geographical, economic and social patterns the affected areas have in common. This will greatly help in detecting an outbreak in advance, managing drug supplies in a region, and identifying the preventive measures appropriate for that region.
Machine learning is increasingly used in developing drugs that will be most effective for a particular person, based on his or her DNA. In human genetics, an important goal is to understand how inter-individual variation in human DNA sequences maps to variability in disease susceptibility. In lay terms, it is to find out how changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer. This is very important for improving the diagnosis, prevention and treatment of these diseases. The data mining technique used for this task is known as multifactor dimensionality reduction [2].
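The text names multifactor dimensionality reduction (MDR) without saying how it works. Below is a minimal sketch of MDR's core step, assuming each SNP genotype is coded 0/1/2 and each person is labelled case (1) or control (0). For a pair of SNPs, people are pooled by their two-locus genotype. A cell is marked "high-risk" when its case-to-control ratio exceeds the overall ratio, which collapses the two factors into a single high/low-risk attribute, and that attribute is scored by balanced accuracy. The full method also uses cross-validation and can search larger combinations of factors; both are omitted here, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mdr_pair_accuracy(genotypes, labels, i, j):
    """Score SNP columns i and j with the core MDR step (no cross-validation)."""
    cases = defaultdict(int)
    controls = defaultdict(int)
    for row, y in zip(genotypes, labels):
        cell = (row[i], row[j])  # two-locus genotype combination
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # A cell is high-risk if its case:control ratio exceeds the overall ratio
    # (written as a product to avoid dividing by a zero control count).
    cells = set(cases) | set(controls)
    high_risk = {c for c in cells if cases[c] > threshold * controls[c]}

    # Predict "case" for anyone in a high-risk cell, then compute balanced accuracy.
    tp = tn = 0
    for row, y in zip(genotypes, labels):
        predicted = 1 if (row[i], row[j]) in high_risk else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
    return 0.5 * (tp / n_cases + tn / n_controls)


def mdr_best_pair(genotypes, labels):
    """Return (balanced accuracy, (i, j)) for the best-scoring SNP pair."""
    n_snps = len(genotypes[0])
    return max(
        (mdr_pair_accuracy(genotypes, labels, i, j), (i, j))
        for i, j in combinations(range(n_snps), 2)
    )


# Hypothetical toy data: disease status depends on an interaction between
# SNP 0 and SNP 1, while SNP 2 is irrelevant.
genotypes = [
    [0, 0, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1],
    [0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0],
]
labels = [0, 1, 1, 0, 0, 1, 1, 0]

best = mdr_best_pair(genotypes, labels)  # (1.0, (0, 1))
```

In this toy data neither SNP 0 nor SNP 1 predicts disease on its own, yet their combination does perfectly. That kind of interaction effect is what MDR is designed to expose.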
Education
Today, when students around the world make career decisions, such as which courses to take and which career path will suit their interests in the long run, those decisions are most often based on pure assumptions. There is a great deal of relevant information on the internet, but it exists as fragmented, unstructured data. Such assumption-based decisions often lead to wrong choices that can affect a person's career, and his or her behaviour, in the long term. What we need is an intelligent system that uses a student's past record, educational or otherwise, to assist in this decision-making. Machine learning can do this by taking records of a person's education and psychology as input and producing a more logical, less assumption-based recommendation [3].
Cyber Security
As our lives become ever more tightly integrated with the internet, cyber security will be of utmost importance in the future. Although some measures have been taken, loopholes remain that can pose a grave threat to individuals.
Some very innovative work is being done in data mining and machine learning to tackle cyber security. The Army High Performance Computing Research Center (AHPCRC) has developed an intrusion detection system called the Minnesota Intrusion Detection System (MINDS), which uses advanced data mining techniques to detect cyber threats. Its anomaly detection module detects novel attacks and intrusions by identifying them as deviations from “normal”, i.e. anomalous behavior. It does so by defining what is normal and treating anything with different characteristics as a deviation, and hence as a possible threat.
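To make "define normal, flag deviations" concrete, here is a minimal sketch using a single hypothetical traffic feature (bytes per connection). It is not MINDS's algorithm. It substitutes a much simpler z-score test: learn the mean and standard deviation of the feature from traffic assumed to be attack-free, then flag any new value more than `k` standard deviations away. The feature, the data and the default `k = 3` are illustrative assumptions.

```python
from statistics import mean, stdev


def fit_normal_profile(values):
    """Summarise 'normal' behaviour by the mean and sample standard deviation of one feature."""
    return mean(values), stdev(values)


def is_anomalous(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the normal mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma


# Hypothetical feature: bytes transferred per connection during a period
# assumed to contain no attacks.
normal_bytes = [510, 495, 530, 480, 505, 520, 490, 500]
profile = fit_normal_profile(normal_bytes)

new_connections = [515, 470, 9000]
flags = [is_anomalous(b, profile) for b in new_connections]  # [False, False, True]
```

Because nothing here depends on knowing what an attack looks like, a previously unseen attack can still be caught if it shifts the feature far enough from normal. Real systems combine many features and use more robust outlier scores.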
Space Exploration
Space exploration is gaining importance as we try to understand the laws that govern the universe and our origins, and ultimately to find other pools of resources in our solar system. Until the last century, the main problem in space exploration was how to gather more information. As space technology advances, processing the large amounts of data we gather will become increasingly complicated. The number of factors to study in this data is enormous, spanning physics, chemistry and perhaps biology too.

Photography: matricresearch.com
Data mining provides the necessary assistance in this learning process. Data mining techniques are being used to study the elemental differences between the planets of the solar system, and these studies help deduce the chronological order in which the solar system formed as well as the underlying reasons for the presence of life on Earth.
Security and counter-terrorism
Global terrorism is an issue that many countries of the world will face. One way to tackle it is to have strong knowledge of suspicious identities and their associations. Two plausible data mining techniques for combating terrorism are “pattern mining” and “subject-based data mining”.
In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: “Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise.”
“Subject-based data mining” is a data mining technique involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: “Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum.”
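The National Research Council definition does not prescribe an algorithm. Here is a minimal sketch of one plausible approach, assuming association records (shared accounts, transfers, meetings) are available as pairs of entity identifiers. Starting from the initiating datum, a breadth-first search collects every entity linked to it within a small number of hops. The record format, the `max_hops` limit and all identifiers are hypothetical.

```python
from collections import deque


def related_to(seed, links, max_hops=2):
    """Return {entity: hop distance} for everything within max_hops of seed."""
    # Build an undirected adjacency list from (a, b) association records.
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Breadth-first search outward from the initiating datum.
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    del distance[seed]  # report only the related entities, not the seed itself
    return distance


# Hypothetical association records.
links = [
    ("person_A", "account_1"),   # person_A holds account_1
    ("account_1", "person_B"),   # account_1 sent money to person_B
    ("person_B", "person_C"),    # person_B met person_C
    ("person_D", "person_E"),    # unrelated to person_A
]

result = related_to("person_A", links, max_hops=2)  # {'account_1': 1, 'person_B': 2}
```

The hop limit keeps the result focused: widening it to 3 would also pull in `person_C`, and in a large real-world graph each extra hop can add a very large number of only loosely related entities.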

Photography: wired.com
We will continue to face bigger and more complex challenges, and we will continue to make machines more intelligent so that they can serve as alternate brains, or cyber brains.
References:
[1] http://www.ibm.com/smarterplanet/us/en/smart_grid/ideas/?&re=spf
[2] http://en.wikipedia.org/wiki/Data_mining
[3] http://www.educationaldatamining.org/EDM2009/uploads/proceedings/vialardi.pdf
- Kartik Rustagi, SDE Intern.