Posts Tagged ‘Market Research’

Data Mining continues to aid Cyber Security

Friday, March 19th, 2010

Mr. Craig Shue, a cyber security research scientist at the Oak Ridge National Laboratory, said it is clear that a large fraction of the Internet address ranges at many ISPs are engaged in malicious activity. He added, “these [networks] may harbor malicious activity and should be investigated.”

This statement could serve as the abstract of new research that applies data mining to cyber security. According to the study, by researchers from Indiana University at Bloomington and the Oak Ridge National Laboratory in Oak Ridge, TN, tracking the organized criminal activity of cyber gangs across the web will now be much easier.

The research identifies dense clusters of ISPs that appear to be overly tolerant of malicious activity, based on data from anti-malware and anti-spam companies and from phishing blacklists. After comparing data from a variety of services that measure ISPs, the researchers state that such patterns were particularly evident in Eastern Europe and the Middle East. According to them, an ISP is classified as malicious if it harbored at least 2.5 percent of the malicious Internet addresses in a given data set, such as a list of phishing sites or malware-laced sites. They found 58 networks that each had more than 100,000 compromised hosts in their Internet address ranges, while another 255 networks had between 10,000 and 100,000 systems blacklisted.
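The 2.5 percent rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the (address, ISP) input format and the sample addresses are assumptions made for the example, not something taken from the study.

```python
from collections import Counter

def flag_malicious_isps(blacklisted, threshold=0.025):
    """Return {isp: share} for ISPs holding at least `threshold` of the
    blacklisted addresses in one data set.

    `blacklisted` is a list of (ip_address, isp_name) pairs; this input
    format is a hypothetical one chosen for the sketch.
    """
    counts = Counter(isp for _ip, isp in blacklisted)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {isp: n / total for isp, n in counts.items() if n / total >= threshold}

# Made-up data set of 100 blacklisted addresses spread over three ISPs.
sample = (
    [("198.51.100.%d" % i, "ISP-A") for i in range(60)]
    + [("203.0.113.%d" % i, "ISP-B") for i in range(39)]
    + [("192.0.2.1", "ISP-C")]
)
flagged = flag_malicious_isps(sample)
print(flagged)
```

Here ISP-A and ISP-B each hold well over 2.5 percent of the listed addresses and are flagged, while ISP-C, with a single address out of 100, is not.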

Measuring online threats largely depends on their geographic location and focus. The study includes information on phishing websites from Phishtank.com and the Anti-Phishing Working Group; botnet data from the Shadowserver Foundation; spam data from Indiana University, Spamhaus, SURBL, and Support Intelligence; and malware hosting statistics from organizations such as CleanMX, eSoft, and Malware Patrol.

Ukraine, Iran and Belarus stood out as alarming cases: they were home to ISPs (two, one and one respectively) with more than 80 percent of their Internet address ranges blacklisted for a combination of spam, phishing, and hosting malicious software. Turkey, on the other hand, captured the limelight when the data was mined for the prevalence of servers that criminals use to control botnets. These servers covered almost 9.11% of the total Internet addresses listed through a large broadband ISP.

Another strategy, which brought the United States to notice, identifies problem networks by the number of blacklisted addresses at a given ISP. This method usually points to the world’s largest ISPs.

One more approach proved quite successful in identifying zombie systems: finding ISPs and hosting providers that had a disproportionate number of malicious network peers. With the help of this approach, 22 networks were found to be purely malicious, while some 194 networks were found to be partially malicious.
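As a rough sketch of the peer-based idea, the snippet below computes, for each network, the fraction of its peers that already appear on a list of known-bad networks. The network names, the input format and the 0.5 cut-off are all assumptions made for illustration; the study's own criteria for "purely" and "partially" malicious networks may differ.

```python
def malicious_peer_fraction(peers, known_bad):
    """Map each network to the fraction of its peers that are known bad.

    `peers` maps a network name to the set of networks it peers with
    (hypothetical input format). Networks without peers are skipped.
    """
    return {
        net: len(nbrs & known_bad) / len(nbrs)
        for net, nbrs in peers.items()
        if nbrs
    }

def label_networks(fractions, cutoff=0.5):
    """Label networks by peer fraction; the cutoff is an assumed value."""
    labels = {}
    for net, frac in fractions.items():
        if frac == 1.0:
            labels[net] = "all peers flagged"
        elif frac >= cutoff:
            labels[net] = "many peers flagged"
        else:
            labels[net] = "unflagged"
    return labels

# Made-up peering data.
peers = {
    "AS1": {"AS2", "AS3"},
    "AS4": {"AS2", "AS5"},
    "AS6": {"AS5", "AS7"},
}
known_bad = {"AS2", "AS3"}
labels = label_networks(malicious_peer_fraction(peers, known_bad))
print(labels)
```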

This research will definitely be of great help to the development of Internet security and to law enforcement in this field.

For more details : http://www.csiir.ornl.gov/shue/research/infocommini10.pdf


SAS: Leader in Predictive Analytics

Saturday, February 6th, 2010

SAS is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market.  SAS predictive analytics and data mining solutions were evaluated by Forrester against 53 criteria in three categories through vendor surveys, product demonstration and vendor-reference interviews. SAS earned top overall ranking in all three categories — current offering, strategy and market presence — including perfect scores for functionality, professional services, licensing and cost, direction, and company financials criteria and has been named a leader among nine vendors in The Forrester Wave: Predictive Analytics and Data Mining Solutions, Q1 2010.

Today’s industries generate large volumes of data in sectors such as financial services, retail, factories, call centers, customer products, and so forth. SAS Analytics lets them realize the value within these growing volumes of data.

[Read more @ SAS]


The datamining journey ahead ..

Sunday, January 3rd, 2010

The data mining journey ahead is far and wide. As we move into the 21st century, the sheer volume of data on our planet will explode, with information being authored by billions of people and flowing from a trillion intelligent devices, sensors and instrumented objects that are becoming a part of everyday life. About 80% of this new data is unstructured content. Data mining and machine learning are the technologies that help capture all this data and turn it into actual intelligence.

Data mining and machine learning are already being used in various fields, be it the sciences, engineering or even entertainment. Data mining in customer relationship management applications can contribute significantly to the bottom line. Rather than randomly contacting a prospect or customer through a call centre or by mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer. Machine learning is enabling more realistic artificial intelligence in computer games. In bioinformatics, it is being used to detect patterns in DNA sequences, which is very important to the Human Genome Project.

Photography: infocusmagazine.org


In the future, data mining will help solve problems with far bigger impacts on mankind, at scales we have not seen until now. The complexity of these challenges will make the use of machines necessary, not just for data management but also as a source of intelligence. Data mining and machine learning can provide innovative, out-of-the-box and practical solutions in fields like climate change, energy, education and health. Here we look at some of the challenges of the 21st century and the ways data mining and machine learning can be of assistance.

Climate Change

Climate change is one of the biggest challenges we face and one that needs immediate attention. One of the major reasons for the situation we are in is the inappropriate use of fossil fuels. To this end we are trying alternative sources of energy. Data mining can play a crucial role here, not only by monitoring our usage but also by finding the global patterns that will help determine the best source of alternative energy for a region. We have been able to develop renewable sources such as solar energy, wind turbines, and turbines that harness the massive energy present in ocean currents. But these sources might not all be appropriate for every region, and we will need detailed region-wise studies to judge which sources are best used during the different seasons of the year so as to provide a continuous flow of energy. The complexity and the huge amount of data involved in such a study will make the use of computers indispensable. Data mining will help in figuring out climate patterns and in suggesting, on statistical grounds, which sources of energy are best for a region.

Photo Courtesy: Brian Hayes, USA


Machine learning will also help us counter some of the negative impacts of climate change, such as changes in land-use patterns. Data mining, which is the partially automated search for hidden patterns in large databases, offers great potential benefits for applied Geographic Information Systems-based decision-making. Recently, the task of integrating these two technologies has become critical, especially as various public and private sector organisations possessing huge databases of thematic and geographically referenced data begin to realise the potential of the information hidden there. Environmental agencies are assessing the impact of changing climate conditions on land-use patterns by applying data mining techniques to these vast sources of information. The days of the Global Environmental Protection Agency (G-EPA) are here, and yes, these are the men in green!

Energy

Energy consumption, and with it the depletion of energy sources, is on the rise. While looking for renewable sources of energy is crucial, energy management in the present is equally important. We have large-scale inefficiencies in the whole process, from energy production to energy distribution to energy consumption. Today these areas are mainly handled manually, with computers used just for data management and with minimal use of computer-based intelligence. But as with most such problems, the complexity involved will make computer-based techniques like data mining crucial for an optimal solution [1].

In 2007, IBM formed a coalition of innovative utility companies to accelerate the use of smart grid technologies and move the industry forward through its most challenging transformation. The Global Intelligent Utility Network Coalition wants to change the way power is generated, distributed and used by adding digital intelligence to the current systems to reduce outages and faults, manage demand, and integrate renewable energy sources such as wind power. Smart Grid is even being tested close to home at North Delhi Power Limited (NDPL), which is one of the biggest distributors of electrical power in Delhi, India.

Image Credit: http://my.reset.jp/~adachihayao/


Health

With the emergence of intelligent systems, we will soon see widespread use of machine intelligence in the fields of medicine and health. Work is already under way on using data mining to identify disease outbreaks. Public health services are searching for explanations of disease clusters by identifying the common geographical, economic and social patterns that exist there. This will greatly help in identifying a disease outbreak in advance, in managing drugs for a region, and in identifying the preventive measures appropriate for a region.

Machine learning is increasingly being used in developing drugs that will be most effective for a person based on his/her DNA pattern. In the study of human genetics, an important goal is to understand the mapping between inter-individual variation in human DNA sequences and variability in disease susceptibility. In lay terms, it is to find out how changes in an individual’s DNA sequence affect the risk of developing common diseases such as cancer. This is very important for improving the diagnosis, prevention and treatment of these diseases. The data mining technique used to perform this task is known as multifactor dimensionality reduction [2].
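The core pooling step of multifactor dimensionality reduction can be sketched as follows: every combination of genotypes at the chosen loci is labelled high-risk or low-risk according to its ratio of cases to controls, which collapses several genetic factors into a single two-valued attribute. The sketch leaves out the cross-validation and model selection a full analysis would need, and the locus names, genotype codes and sample records are invented for illustration.

```python
from collections import defaultdict

def mdr_pool(samples, loci, threshold=1.0):
    """Label each multi-locus genotype combination as 'high' or 'low' risk.

    `samples` is a list of (genotypes, is_case) pairs, where `genotypes`
    maps a locus name to a genotype code (hypothetical input format).
    A combination is 'high' when its case/control ratio is at least
    `threshold`.
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for genotypes, is_case in samples:
        cell = tuple(genotypes[locus] for locus in loci)
        if is_case:
            cases[cell] += 1
        else:
            controls[cell] += 1

    labels = {}
    for cell in set(cases) | set(controls):
        n_controls = controls[cell]
        # A cell with cases but no controls counts as high risk.
        ratio = cases[cell] / n_controls if n_controls else float("inf")
        labels[cell] = "high" if ratio >= threshold else "low"
    return labels

# Invented records: two loci, six individuals.
samples = [
    ({"snp1": "AA", "snp2": "Bb"}, True),
    ({"snp1": "AA", "snp2": "Bb"}, True),
    ({"snp1": "AA", "snp2": "Bb"}, False),
    ({"snp1": "Aa", "snp2": "BB"}, False),
    ({"snp1": "Aa", "snp2": "BB"}, False),
    ({"snp1": "Aa", "snp2": "BB"}, True),
]
risk = mdr_pool(samples, loci=["snp1", "snp2"])
print(risk)
```

With these records, the combination (AA, Bb) has two cases to one control and is pooled into the high-risk group, while (Aa, BB) has one case to two controls and is pooled into the low-risk group.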

Education

Today, when a student has to make career decisions, such as which courses to take and which career path will suit his/her interests in the long run, those decisions are most of the time based on pure assumptions, and this is true on a global scale. We do have large sources of relevant information on the internet, but this information is again in the form of scattered and unstructured data. Such assumption-based decisions often result in a wrong choice, which can affect one’s career as well as his/her behaviour in the long term. What we need is an intelligent system that uses the past record of a student, educational or otherwise, to assist him/her in making this decision. This can again be done with the help of machine learning, which will take records related to the education as well as the psychology of the person as input and give a more logical and less assumption-based decision [3].

Cyber Security

With the fast integration of our lives with the internet, cyber security will be of utmost importance in the future. Though we have been able to take some measures in this regard, there are still loopholes that can pose a grave threat to an individual.

Some very innovative work is being done in the field of data mining and machine learning to tackle the issue of cyber security. The Army High Performance Computing Research Centre (AHPCRC) has developed an intrusion detection system called the Minnesota Intrusion Detection System (MINDS), which uses advanced data mining techniques to detect cyber threats. Its anomaly detection system detects novel attacks and intrusions by identifying them as deviations from “normal”, i.e. as anomalous behavior. It does so by defining what is normal and treating anything with different characteristics as a deviation and hence as a possible threat.
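The general idea of learning what is normal and flagging deviations can be illustrated with a very simple statistical check. The sketch below is not the method MINDS uses; it swaps in a plain z-score test on a single made-up traffic feature (connections per minute), and the threshold of 3 standard deviations is an assumed value.

```python
from statistics import mean, stdev

def fit_normal(baseline):
    """Summarise 'normal' behaviour as (mean, standard deviation)."""
    return mean(baseline), stdev(baseline)

def is_anomalous(value, profile, max_z=3.0):
    """Flag a value lying more than `max_z` standard deviations from normal."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > max_z

# Made-up baseline: connections per minute observed during normal operation.
baseline = [100, 104, 98, 101, 97, 103, 99, 102]
profile = fit_normal(baseline)

print(is_anomalous(101, profile))  # close to the baseline
print(is_anomalous(250, profile))  # far outside the baseline
```

A real intrusion detection system would look at many features of each connection at once, but the structure is the same: build a profile from normal traffic, then score new observations against it.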

Space Exploration

Space exploration is being given more and more importance as we try to figure out the laws that govern the universe, our origin, and ultimately other pools of resources in our solar system. Until the last century, the main problem in space exploration was finding ways to gather more information. As we advance in space technology, processing the large amounts of data we gather will become more and more complicated. The number of factors one has to study in the data is enormous; they can relate to physics, chemistry and maybe biology too.

Photography: matricresearch.com


Data mining provides the necessary assistance in this learning process. Data mining techniques are being used to study the elemental differences between the planets of the solar system, which in turn are being used to deduce the chronological order in which the solar system formed as well as the core reasons for the presence of life on planet Earth.

Security and counter-terrorism

One of the issues that many countries of the world will face is global terrorism. One of the ways to tackle it is to have strong knowledge about suspicious identities and their associations. Two plausible data mining techniques in the context of combating terrorism are “pattern mining” and “subject-based data mining”.

In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: “Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise.”

“Subject-based data mining” is a data mining technique involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: “Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum.”
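One simple way to picture a subject-based search is as a walk over a graph of associations, starting from the initiating datum and collecting everything within a few links of it. The sketch below does this with a breadth-first search; the record names, the link list and the two-hop limit are invented for illustration and do not come from the National Research Council text.

```python
from collections import deque

def related_entities(links, start, max_hops=2):
    """Return {entity: hops} for everything within `max_hops` links of `start`.

    `links` is a list of (a, b) pairs meaning the two records are
    associated (hypothetical input format); links are treated as two-way.
    """
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)

    del hops[start]
    return hops

# Invented associations between people, accounts and movements.
links = [
    ("person_A", "account_1"),
    ("account_1", "person_B"),
    ("person_B", "flight_9"),
    ("person_C", "account_2"),
]
found = related_entities(links, "person_A")
print(found)
```

Starting from person_A, the search reaches account_1 in one hop and person_B in two; flight_9 is three hops away and person_C is not connected at all, so neither is returned.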

Photography: wired.com


We will continue to face bigger and more complex challenges, and we will continue to make machines more intelligent so that they can be used as alternate brains, or cyber brains.

References:

[1] http://www.ibm.com/smarterplanet/us/en/smart_grid/ideas/?&re=spf

[2] http://en.wikipedia.org/wiki/Data_mining

[3] http://www.educationaldatamining.org/EDM2009/uploads/proceedings/vialardi.pdf


- Kartik Rustagi, SDE Intern.


That’s one small step for robot, one giant leap for robotkind

Sunday, January 3rd, 2010

International Robot Exhibition 2009 finished with great success. The event was held from November 25 to November 28, 2009 at the Tokyo International Exhibition Center in Tokyo, Japan. The show is designed to provide a place to exhibit robots and related equipment in order to enhance market awareness of new technology. At the same time, it serves as a medium to promote new products and to develop new business by contributing to the promotion of new technology. Some of the highlights are illustrated below.

This is a clear indication of what we can expect in the near future. (Oh boy! Not another American Robot Idol, The Amazing Robot Race, America’s Next Top Robot Model, or Robot Factor.)

Read more about the event here and for sure visit the compiled photo gallery1 and gallery2 for more pictures.

The robot hand is capable of 24 movements and can be remote-operated with the CyberGlove Photograph: Kim Kyung-hoon/Reuters


Humanoid industrial robot 'Motoman-SDA5D', developed by the Yaskawa Electric Corporation, demonstrates its capabilities with Lego Photograph: Dai Kurokawa/EPA


A humanoid robot 'Manoi AT01', produced by Japan's toy robot maker Kyosho, performs a hip-hop dance Photograph: Yoshikazu Tsuno/AFP/Getty Images

