
Weka – An Introduction


Introduction
This tutorial is a basic introduction to Weka and presents only its most basic features. Weka can also be used to develop applications through its API. This tutorial is not a substitute for a complete guide to Weka.

Weka features:

  • Developed at the University of Waikato, New Zealand
  • The name stands for Waikato Environment for Knowledge Analysis
  • Written in Java
  • Provides implementations of learning algorithms, including (among others):
    • Classification
    • Clustering
    • Association Rule Mining
    • Attribute Selection

Usage:

  • Apply a learning method to a dataset & analyze the result
  • Use a learned model to make predictions on new instances
  • Apply different learners to a dataset & compare results

Interfaces:

  • Explorer
  • Knowledge Flow
  • Experimenter
  • Command line Interface (CLI)

Weka can be downloaded from:   http://www.cs.waikato.ac.nz/~ml/weka/

After installing and starting Weka, you can choose among several interfaces, including the “Simple CLI” (command-line interface), the “Explorer”, and the “Experimenter”. The Experimenter is the more powerful interface for manipulating, tracking, and analyzing experiments, but quick-and-dirty experiments are easier to run in the Explorer. Click “Explorer” to bring up the Weka Knowledge Explorer interface.

Weka Explorer
The Explorer interface has a number of tabs. The ones we’re interested in are the “Preprocess”, “Classify”, and “Visualize” tabs. They are used, respectively, to load and filter the data for an experiment, to build and test a classification model on that data, and to view the data graphically.

 


Figure 1: GUI Preprocess Tab



Figure 2: GUI Preprocess Tab

 


Figure 3: GUI Preprocess Tab

 


Figure 4: Filters for attribute selection in Preprocess tab.

Weka Data Format:
Weka accepts input data sets in several file formats, such as CSV (comma-separated values, *.csv) and Binary Serialized Instances (*.bsi). However, the preferred and most convenient input format is the Attribute-Relation File Format (ARFF, *.arff). The first step in Weka is therefore usually to make sure that the input file is in ARFF.

A typical ARFF data set looks like this:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

Figure 5: Data viewer
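
To make the structure of the ARFF listing above concrete, here is a minimal sketch of a reader for it. It is plain Python using only the standard library, not Weka's own parser or API, and the function name parse_arff is hypothetical. It assumes only the subset of ARFF used in the weather example: no quoting, no missing values, no sparse data, and no string or date attributes.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    Assumption: only nominal ({a, b, c}) and numeric attributes occur,
    values are unquoted, and no value is missing.
    """
    relation = None
    attributes = []   # (name, kind); kind is "numeric" or a list of labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                row[name] = float(value) if kind == "numeric" else value
            rows.append(row)
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):
                # Nominal attribute: the braces list the allowed labels.
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:
                # Anything else (for example "real") is treated as numeric here.
                attributes.append((name, "numeric"))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


SAMPLE = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(rows[0])
```

The header lines (@relation, @attribute) describe the columns, and everything after @data is one comma-separated instance per line, which is why the reader switches mode once it sees @data.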


Convert CSV to ARFF: Weka can convert a .csv file into ARFF on the fly.
The following is an example CSV file:
outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

When converted to ARFF, it looks as follows:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
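
The conversion above can be sketched in a few lines. This is an illustration in plain Python (standard library only), not Weka's converter; the function names csv_to_arff and is_number are hypothetical. It assumes that a column is numeric when every value parses as a number and nominal otherwise, and it lists nominal labels in order of first appearance, so the label order may differ from the listing above (for example {FALSE, TRUE} instead of {TRUE, FALSE}).

```python
import csv
import io


def is_number(value):
    """Return True if the string can be read as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, records = rows[0], rows[1:]
    lines = ["@relation " + relation]
    for index, name in enumerate(header):
        column = [record[index] for record in records]
        if all(is_number(value) for value in column):
            lines.append("@attribute %s real" % name)
        else:
            # Nominal attribute: distinct values in order of first appearance.
            labels = list(dict.fromkeys(column))
            lines.append("@attribute %s {%s}" % (name, ", ".join(labels)))
    lines.append("@data")
    lines.extend(",".join(record) for record in records)
    return "\n".join(lines) + "\n"


CSV_TEXT = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,65,70,TRUE,no
"""

print(csv_to_arff(CSV_TEXT, "weather"))
```

The data rows are copied unchanged; only the header row is replaced by the @relation and @attribute declarations.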

Classification
Now we’ll take a look at the Classify tab. First, take note of the drop-down selection, which should
currently contain “(Nom) class” for the iris dataset. This lets us select the attribute to use as
the classification target, i.e., the class to predict. Since we’ll be using the various flower
measurements to predict the “class” of iris, we can leave the current selection as it is. In the case
of the arabicwords relation, we’d want to use entity-type as the class.
Clicking on the box labeled “Classifier” lets us select a model to use for classification; select
the weka.classifiers.DecisionTable classifier. Each classifier can have properties that either
tune its operation or modify its output. For the DecisionTable classifier, set the displayRules
property to “True”, then click OK to go back to the Classify tab. Note that the “Classifier”
box now contains the classifier name and a set of arguments that correspond to the selected
properties. Click the “Start” button to run the classifier, leaving the other options at their defaults.

Figure 6: Classification using OneR


Figure 7: Classification using decision tree.


Figure 8: Model evaluation using cross validation.
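
Figure 6 refers to the OneR classifier. As a rough illustration of the one-rule idea (pick the single attribute whose values best predict the class), here is a sketch in plain Python. It is not Weka's implementation, the function name one_r is hypothetical, and it assumes nominal attributes only, so it uses just the outlook and windy columns of the weather data shown earlier.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (best_attribute, rule, errors) for a one-rule classifier."""
    best = None
    for attribute in attributes:
        # Count class labels separately for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attribute]][row[target]] += 1
        # For each attribute value, predict its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[attribute]] != row[target])
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# (outlook, windy, play) taken from the 14 weather instances above.
WEATHER = [
    ("sunny", "FALSE", "no"), ("sunny", "TRUE", "no"),
    ("overcast", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("rainy", "FALSE", "yes"), ("rainy", "TRUE", "no"),
    ("overcast", "TRUE", "yes"), ("sunny", "FALSE", "no"),
    ("sunny", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("sunny", "TRUE", "yes"), ("overcast", "TRUE", "yes"),
    ("overcast", "FALSE", "yes"), ("rainy", "TRUE", "no"),
]
ROWS = [dict(zip(("outlook", "windy", "play"), r)) for r in WEATHER]

attribute, rule, errors = one_r(ROWS, ["outlook", "windy"], "play")
# outlook is chosen: it misclassifies 4 of the 14 training rows, windy 5.
print(attribute, rule, errors)
```

The error count here is measured on the training data itself, which is why a separate evaluation step such as cross-validation is still needed to judge the model.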

The classifier uses the attribute values of each input record to predict that record’s class,
and Weka compares each prediction with the record’s actual class value. After the
classification is complete, Weka outputs summary statistics, for example the percentage of
Correctly Classified Instances and of Incorrectly Classified Instances. For tree-based
classifiers, the classifier output can also be viewed in the form of a tree, which may give
a clearer idea of the classifier model and algorithm.

Figure 9: Model evaluation using contingency matrix.
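
The statistics mentioned above (correctly and incorrectly classified instances, the contingency matrix, and cross-validation) can be sketched as follows. This is plain Python, not Weka's evaluation code, and all function names are hypothetical. Two simplifying assumptions: the learner is a baseline that always predicts the most frequent training class, and folds are assigned by instance index rather than by random, stratified sampling.

```python
from collections import Counter


def majority_class(train_labels):
    """Baseline learner: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate(labels, folds):
    """Return a contingency matrix {(actual, predicted): count} from k-fold CV."""
    matrix = Counter()
    for fold in range(folds):
        # Instance i is held out for testing in fold i % folds.
        test_idx = [i for i in range(len(labels)) if i % folds == fold]
        train = [labels[i] for i in range(len(labels)) if i % folds != fold]
        predicted = majority_class(train)
        for i in test_idx:
            matrix[(labels[i], predicted)] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(n for (actual, predicted), n in matrix.items()
                  if actual == predicted)
    return correct / sum(matrix.values())


# The play column of the 14 weather instances above.
PLAY = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

matrix = cross_validate(PLAY, folds=7)
for (actual, predicted), count in sorted(matrix.items()):
    print("actual=%s predicted=%s count=%d" % (actual, predicted, count))
print("correctly classified: %.1f%%" % (100 * accuracy(matrix)))
```

Each instance is tested exactly once, by a model trained without it, so the diagonal of the matrix (actual equal to predicted) gives the correctly classified instances and everything off the diagonal gives the incorrectly classified ones.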

The classification task we’re ultimately interested in is to take a number of
“features” (aka “attributes”) extracted from text (e.g., word length, suffix/prefix presence,
part of speech, etc.) and use them to build a model that predicts the entity type (class) of a
given word.
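
As a small illustration of turning a word into such features, here is a sketch in plain Python. The function name word_features and the prefix and suffix lists are hypothetical placeholders chosen for English examples, not the features of any particular data set. Part of speech is left out because it would need an external tagger.

```python
def word_features(word, prefixes=("un", "re"), suffixes=("ing", "ed", "s")):
    """Map a word to a dict of simple features usable as classifier attributes.

    Assumption: the prefix and suffix lists are illustrative only.
    """
    lower = word.lower()
    features = {
        "length": len(word),
        "capitalized": word[:1].isupper(),
    }
    for prefix in prefixes:
        features["prefix_" + prefix] = lower.startswith(prefix)
    for suffix in suffixes:
        features["suffix_" + suffix] = lower.endswith(suffix)
    return features


for word in ["Running", "rebuilt", "cats"]:
    print(word, word_features(word))
```

Each key of the returned dict would become one attribute in the data set, with the entity type added as the class attribute.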

 

Visualization



Figure 10: Visualize tab

Features:

Scatter plots: A scatter plot reveals relationships or associations between two variables.
How a scatter plot helps:
It helps answer questions about how the two variables involved, X and Y, relate to each other.
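
One question a scatter plot helps with is whether X and Y tend to rise and fall together. A numeric companion to the plot is the Pearson correlation coefficient, sketched below in plain Python. This is an illustration only, not something the Visualize tab is stated to compute; it uses the temperature and humidity columns of the weather data from earlier as X and Y, and the function name pearson is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Temperature and humidity of the 14 weather instances.
TEMPERATURE = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
HUMIDITY = [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91]

print(round(pearson(TEMPERATURE, HUMIDITY), 3))
```

A value near +1 or -1 corresponds to points lying close to a rising or falling straight line in the scatter plot, while a value near 0 means no linear relationship is visible.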

 

Conclusion
Weka helps in realizing the goals of data mining, in this case by predicting missing values and validating that the predicted values are indeed correct. Weka’s algorithms can also be used via its API to build custom tools, applications, and algorithms. This article serves as an introduction only, not as a comprehensive guide. The references used to compile this article are listed below.

References
[1] Witten, Ian H. and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 1999.
[2] Han, Jiawei and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
[3] http://www.cs.waikato.ac.nz/ml/weka/