Archive for November, 2009

Adaptive websites: making the Internet friendly

Wednesday, November 25th, 2009

Adaptive Websites – Making the Internet a friendlier place.

There are two different kinds of web pages:

  1. Static.
  2. Dynamic.

Static web pages contain the same prebuilt content each time the page is loaded. They are standard HTML pages whose markup controls how the page is displayed, so the page looks the same every time it is loaded.

Dynamic web pages, on the other hand, are built on the fly to match the user's request or in response to user interaction. They generally contain server-side code such as PHP, ASP or JSP, which lets the server generate different content each time the page is loaded.

Even with the flexibility of dynamic web pages, there is still room to improve the user's interaction experience, the relevance of the content displayed, and so on.

There are websites such as iGoogle, MyYahoo, MyExcite and MyCNN that allow users to customize the site to their liking. But even then, users have to do the customization themselves by explicitly selecting each item, which requires an understanding of the website itself. Getting to know the many offerings of a site like MyYahoo can be a daunting task that requires browsing the site extensively.

So we look into a newer concept called adaptive websites (an active research topic over the last decade): websites that automatically improve their presentation, organization (of pages) and content based on users' access patterns, mined from web server logs or other data sources.

For example, consider a person who visits a news website regularly. Every time he visits, he follows the links related to computers, technology, science, electronics, literature and sports, and only occasionally reads articles on business, economics and politics. Based on his past reading activity, the website can detect his access patterns and learn his interests, and then restructure itself to include more links about technology and computers rather than politics or finance, making his visits more useful.
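A minimal sketch of the idea in this example: count which sections a reader has opened and move those sections up the navigation. The visit-log format, the section names and the function `reorder_links` are hypothetical; a real adaptive site would mine this history from its server logs rather than receive it as a ready-made list.

```python
from collections import Counter


def reorder_links(visit_log: list[str], links: list[str]) -> list[str]:
    """Order navigation links so the sections this reader visits most come first.

    visit_log: section of each article the reader opened (hypothetical format).
    links: the site's default navigation order.
    """
    counts = Counter(visit_log)  # missing sections count as 0
    # sorted() is stable, so sections with equal counts keep their default order.
    return sorted(links, key=lambda section: -counts[section])


default_nav = ["politics", "business", "economics", "sports", "technology", "science"]
history = ["technology", "science", "technology", "sports", "technology", "politics", "science"]
print(reorder_links(history, default_nav))
# ['technology', 'science', 'politics', 'sports', 'business', 'economics']
```

A reader with no history simply gets the default order back, so the adaptation only kicks in once there is something to learn from.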

So, an adaptive website analyzes user access patterns and mines them to find groups of people who share the same interests, then improves the website's design and organization for those users accordingly, so that they find it more useful than a one-size-fits-all site. Building adaptive websites is still largely a concept. Data mining, machine learning and collaborative filtering are among the techniques most closely related to making adaptive websites a reality.

First, using the user access data and data mining tools, we must detect groups (clusters) of users with similar access patterns. The clusters can be either flat or hierarchical.
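One common way to form flat clusters is k-means. The sketch below assumes each user is represented by a vector of visit counts per site section (a hypothetical feature choice) and uses plain Python rather than a data mining library. It illustrates flat clustering in general, not the specific algorithm used in adaptive website research; hierarchical clustering would instead merge the closest users or groups step by step into a tree.

```python
import math


def kmeans(points: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Flat clustering: return the cluster index assigned to each point."""
    # Seed with the first k points so the sketch is reproducible;
    # practical implementations usually use random or k-means++ seeding.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Rows: users; columns: visits to /technology, /science, /politics, /business.
users = {
    "u1": [5, 3, 0, 1],
    "u2": [0, 1, 6, 4],
    "u3": [4, 4, 1, 0],
    "u4": [1, 0, 5, 5],
}
print(dict(zip(users, kmeans(list(users.values()), k=2))))
# {'u1': 0, 'u2': 1, 'u3': 0, 'u4': 1}
```

Here u1 and u3 (technology and science readers) end up in one cluster and u2 and u4 (politics and business readers) in the other.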

Once the clustering is built, we turn to transformation. Transformation means creating a pattern for each cluster: searching, with the help of a scoring function, for the sequence of steps that redesigns the site from the present initial page to a goal page, using information gained by applying different mining techniques to users' access histories (generally web server logs). This is similar to planning, in which the path from the initial state to the goal is deduced step by step.
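The planning analogy can be illustrated with a greedy search. In this sketch a "page" is just the order of links on a cluster's front page, each step swaps two adjacent links, and the scoring function (cluster visit frequency divided by link position) is an assumption chosen for illustration, not the scoring function from the research. The search keeps applying the best-scoring step until no step helps, then returns both the goal layout and the plan that reaches it.

```python
def score(layout: list[str], freq: dict[str, int]) -> float:
    """Hypothetical scoring function: cluster visit frequency weighted by link position."""
    # Assumption: a link at position i gets 1 / (i + 1) of the attention of the top link.
    return sum(freq.get(link, 0) / (i + 1) for i, link in enumerate(layout))


def plan_transformation(
    layout: list[str], freq: dict[str, int]
) -> tuple[list[str], list[tuple[int, int]]]:
    """Greedy search from the initial layout towards a goal layout.

    Each step swaps two adjacent links; at every step the swap with the largest
    score gain is applied. Returns the goal layout and the list of swaps (the plan).
    """
    current = list(layout)
    plan: list[tuple[int, int]] = []
    while True:
        best_gain, best_swap = 0.0, None
        for i in range(len(current) - 1):
            candidate = current[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            gain = score(candidate, freq) - score(current, freq)
            if gain > best_gain:
                best_gain, best_swap = gain, (i, i + 1)
        if best_swap is None:
            return current, plan  # no step improves the score: goal reached
        i, j = best_swap
        current[i], current[j] = current[j], current[i]
        plan.append(best_swap)


# Visit frequencies of a technology-minded cluster (made-up numbers).
tech_cluster = {"technology": 9, "science": 6, "politics": 1, "business": 0}
goal, plan = plan_transformation(["politics", "business", "technology", "science"], tech_cluster)
print(goal)  # ['technology', 'science', 'politics', 'business']
print(plan)  # [(1, 2), (0, 1), (2, 3), (1, 2)]
```

With this particular score the search ends with links sorted by frequency. Richer transformations (adding shortcut links, merging or splitting pages) would need a richer state and step set, where a greedy search may only reach a local optimum.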

These transformations are stored on the web server, and when a user visits the website again, the server identifies him or her and applies the transformation to the web pages, providing an environment designed around his or her interests and habits. This makes the visit more fruitful and leaves the user more satisfied.
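On a repeat visit, the server has to map the visitor to one of the stored per-cluster transformations. This sketch assumes the visitor is recognized (for example by a login or a cookie) and that their per-section visit counts are available; the cluster names, centroids and layouts are made-up placeholders, and `layout_for_visitor` is a hypothetical helper. Unknown visitors fall back to the default layout.

```python
import math
from typing import Optional

# Hypothetical stored transformations: one precomputed front-page layout per
# cluster, plus the cluster's centroid in "visits per section" space.
SECTIONS = ["technology", "science", "politics", "business"]
CLUSTERS = {
    "tech_readers": {
        "centroid": [4.5, 3.5, 0.5, 0.5],
        "layout": ["technology", "science", "politics", "business"],
    },
    "policy_readers": {
        "centroid": [0.5, 0.5, 5.5, 4.5],
        "layout": ["politics", "business", "technology", "science"],
    },
}
DEFAULT_LAYOUT = ["business", "politics", "science", "technology"]


def layout_for_visitor(visits: Optional[dict[str, int]]) -> list[str]:
    """Return the stored layout of the nearest cluster, or the default for new visitors."""
    if not visits:
        return DEFAULT_LAYOUT  # no history yet: nothing to adapt to
    vector = [visits.get(section, 0) for section in SECTIONS]
    nearest = min(CLUSTERS, key=lambda name: math.dist(vector, CLUSTERS[name]["centroid"]))
    return CLUSTERS[nearest]["layout"]


print(layout_for_visitor({"technology": 6, "science": 2}))
print(layout_for_visitor(None))
```

Keeping the expensive work (clustering and planning) offline and doing only a nearest-centroid lookup per request keeps the cost of adapting each page view small.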

Hence, if and when adaptive websites become a reality, the ever-growing Internet, which is becoming a bigger part of our lives day by day, will become a more pleasurable, informative and relevant experience for each one of us.

Read more here.

– Venkatesh M. [Student Intern]

Links Compilation #3

Sunday, November 15th, 2009

Premier data mining journals:

  • IEEE Transactions on Knowledge and Data Engineering: IEEE Transactions on Knowledge and Data Engineering (TKDE) is an archival journal published monthly designed to inform researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.
  • IEEE Transactions on Pattern Analysis and Machine Intelligence: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is a scholarly archival journal published monthly. This journal covers traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence.
  • Knowledge and Information Systems
  • Data Mining and Knowledge Discovery: The premier technical publication in the field, Data Mining and Knowledge Discovery is a resource collecting relevant common methods and techniques and a forum for unifying the diverse constituent research communities. The journal publishes original technical papers in both the research and practice of data mining and knowledge discovery, surveys and tutorials of important areas and techniques, and detailed descriptions of significant applications.

Industry publications:

Read the older Links Compilation #2

Expressive thoughts when mined express a lot more: Welcome to Sentiment Mining

Wednesday, November 11th, 2009

The Internet revolution has caused a huge increase in the number of social networking sites, blogs, forums and other forms of online expression: reviews of products, the latest trends, questions and answers, and recommendations. All this data relates to the personal interests, opinions and feelings of the online community. This mammoth amount of data can be used to understand people's personal interests, likes and dislikes, moods and behavioral patterns, user satisfaction with a certain service or product, and how certain events affect their activity.

This data thus opens the door to a new type of data mining: SENTIMENT ANALYSIS. Mining the data available from social online media, such as microblogging sites like Twitter, blogs and forums, RSS and Atom feeds, sites like Mashable, and social networking sites like Facebook and MySpace, can produce information that is extremely useful in areas like targeted product marketing, gauging how a new product is faring in the market, how certain actions by corporations may affect their popularity or cause their stock prices to rise or fall, which brands are becoming trendsetters, which competitors' products are surviving in the market and how to get an edge over them, and so on.
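To make the basic principle concrete, here is a deliberately simple lexicon-based classifier that counts positive and negative words in a post. The word lists are tiny hypothetical examples; the commercial services described in this article use their own, far more sophisticated methods (NLP, entity extraction, trained models), so treat this only as an illustration.

```python
import re

# Tiny hypothetical word lists; real systems use large lexicons or trained models.
POSITIVE = {"good", "great", "love", "excellent", "happy", "fast"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow", "broken"}


def sentiment(text: str) -> str:
    """Label a post positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "Love the new phone, the camera is excellent",
    "Battery life is terrible and the app is slow",
    "Picked up the phone today",
]
print([sentiment(p) for p in posts])  # ['positive', 'negative', 'neutral']
```

Aggregated over thousands of posts that mention a brand, the share of positive labels gives a rough signal of how a product is faring. Simple word counting misses negation ("not good") and sarcasm, which is one reason the field leans on more advanced NLP.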

Many new companies have already stepped into this lucrative field of data mining, aiming to develop applications and services that help company management better understand what is influencing their customers, their competitors and the present state of the market, by monitoring and analyzing (mining) the enormous amount of information that comes from a growing number of sources.

Scout Labs, a sentiment analysis company, provides a powerful web-based application that tracks social media and finds signals in the noise, helping its numerous clients, such as HP, Deloitte and SonyBMG, create better products and stronger customer relationships. Its products include Sentments and Trendspotter.

Jodango, which bills itself as “The world’s first Opinion Utility”, provides its clients with analytics through context matching, extracting patterns from the client's own site and other sites using its “Article Focused Approach”, which uses articles as a source of data.

Nielsen, the giant best known for TV ratings, also provides web metrics. It took over BuzzMetrics to provide sentiment analysis and has proved its worth against its competitors.

Techrigy, now acquired by Alterian, provides sentiment analysis through its proprietary SM2 technology, aimed at marketing professionals, which returns results as charts of comparisons, demographics, geo-location and sentiment, along with drill-down reports.

Sysomos provides two flagship products, MAP (Media Analytics Platform) and Heartbeat, which provide real-time snapshots of online feeds and content through a graphical interface.

TNS Cymfony's flagship product, the Maestro Platform, crawls the web for forums and blogs, categorizes the content and displays it in a user-friendly graphical manner.

Various other companies providing sentiment analysis are listed below:

Lexalytics provides its text analytics software Salience4, which performs sentiment analysis through entity extraction, entity relationships, document summarization, and sentiment or tone extraction.

Biz360 has designed a proprietary technology that uses natural language processing (NLP) to aggregate and analyze vast amounts of media and market information, giving clients information that helps them better understand their customers, reach out to them and succeed.

Evri, a semantic search engine, has released a new web API that, it claims, understands the way the web feels. Developers can use this API to build tools and services focused on market research, sports and entertainment, brand management and so on.

Various tools are also available for individual users who want to run sentiment analysis themselves.

Examples include sites like Tweetfeel (which searches tweets using a sentiment analyzer), Twitrratr (which gives a positive or negative rating for any product or topic by mining tweets) and Twendz (which highlights the conversation themes and sentiment of tweets about topics you are interested in, and updates dynamically as conversations change by evaluating about 70 tweets at a time).
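Re-evaluating a fixed number of recent tweets, as described for Twendz, amounts to a sliding window. A minimal sketch, assuming each tweet has already been labelled positive, negative or neutral by some classifier; the class `RollingSentiment` is hypothetical, and its default window of 70 only mirrors the figure quoted for Twendz, not Twendz's actual implementation.

```python
from collections import deque


class RollingSentiment:
    """Sentiment summary over only the most recent tweets (a sliding window)."""

    def __init__(self, window: int = 70) -> None:
        # A deque with maxlen silently drops the oldest label when full,
        # so the summary follows the conversation as it changes.
        self.labels: deque = deque(maxlen=window)

    def add(self, label: str) -> None:
        self.labels.append(label)

    def positive_share(self) -> float:
        """Fraction of labels in the current window that are 'positive'."""
        if not self.labels:
            return 0.0
        return sum(label == "positive" for label in self.labels) / len(self.labels)


tracker = RollingSentiment(window=3)
for label in ["negative", "negative", "positive", "positive", "positive"]:
    tracker.add(label)
print(tracker.positive_share())  # 1.0: the two early negatives have left the window
```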

BackType is a conversational search engine that crawls and indexes the web to show, in real time, what people all around the world are saying about a topic of interest.

Newssift, a tool from The Financial Times, returns search results by matching relevant content, identifying relationships and deducing web sentiment (positive or negative).

StumbleUpon provides add-ons that collect pages people find interesting and want to recommend to others. It uses a “Recommendation Engine” that takes various user ratings, calculates an overall rating and presents it.

Mashable, the blog that focuses on social media news and keeps reviewing new websites and services, posts the latest buzz and news about what is new on the web and popular among bloggers and Twitter and Facebook users.

Various other services, products and companies continue to emerge, lured by the prospect of providing insight into people's social online life by mining their activities and opinions, thereby developing a deeper understanding of the masses and potential customers and providing a better knowledge source for developing business.

So finally, the entire world cares about your OPINIONS and FEELINGS, so you can feel proud to speak out to the world through social networks and enrich the online community. Feel all you want, and the world shall know about it. Hope a SENTIMENT ANALYSIS of this article returns a POSITIVE result. Keep mining.

-Venkatesh, Student Intern.

Web scraping, spamming bot buster is here

Thursday, November 5th, 2009

Bots are out there, continuously evolving and becoming smarter day by day, learning to be more human-like. They’re finding faster ways to hack your system, steal your money and bring down your network. They are spammers, scammers, bots and fraudsters. But the race to fight back has already begun.

Pramana’s main product, HumanPresent, detects automated bots that, for example, enter spam into Web-based forms or register for free e-mail accounts to be used for spam. You can request a trial here.

HumanPresent can detect bots by noticing differences between the way a human normally interacts with a Web page and the way bots behave. It looks at more than 30 metrics, such as keystrokes, mouse clicks and the timing of those actions.
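As a toy illustration of one such timing metric (not Pramana's method, which is proprietary and combines more than 30 signals): scripted input tends to arrive at very regular intervals, while people type and click unevenly. The function `looks_automated` and its threshold of 0.2 are hypothetical; it flags a session when the coefficient of variation (standard deviation divided by mean) of the gaps between events falls below the threshold.

```python
import statistics


def looks_automated(event_times_ms: list[float], min_cv: float = 0.2) -> bool:
    """Flag a form session whose keystroke/click timing is suspiciously regular.

    event_times_ms: timestamps of key presses and clicks in ascending order (ms).
    min_cv: hypothetical threshold on the coefficient of variation of the gaps.
    """
    if len(event_times_ms) < 3:
        return True  # almost no interaction before submitting: typical of scripts
    gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return True  # all events at the same instant: scripted input
    return statistics.stdev(gaps) / mean_gap < min_cv


print(looks_automated([0, 50, 100, 150, 200]))  # True: perfectly even 50 ms gaps
print(looks_automated([0, 180, 260, 610, 700, 1150]))  # False: uneven, human-like gaps
```

A single metric like this is easy for a bot to defeat by adding random delays, which is why a real detector combines many signals.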

Pramana has now developed a module called “data mining and screen scraping prevention” for HumanPresent. It works on many of the same principles as its main product but has been modified for data-mining scenarios, said David Crowder, Pramana’s CEO.

Finally, we are working towards putting a check on the bot problem. But could such technology end up acting as a bridge, pushing bot-based scraping, mining and spamming technology to its next stage and making bots smarter and more human-like? We will have to wait and watch.

Read more at Pramana

About Pramana:

After years of extensive research in the field of botnet behavior, a small group of individuals from the Georgia Institute of Technology’s College of Computing founded Pramana in 2007. They rapidly launched the company to meet the growing market need to combat hackers, and today Pramana’s HumanPresent™ solution is deployed in several large Internet Service Provider and Fortune 1000 environments. Since its inception, Pramana has gained acclaim and accolades from customers, media and research analysts by offering a more innovative, elegant, user-friendly alternative to the standard CAPTCHA solution.