Data mining normally involves four kinds of tasks:
* Classification - the task of generalizing known structure so that it can be applied to new data.
* Clustering - the task of finding groups and structures in the data that are similar in some way, without using known structures in the data.
* Association rule learning - looks for relationships between variables.
* Regression - aims to find a function that models the data with the least error.
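To make the regression task a little more concrete, here is a minimal sketch in plain Python that fits a straight line by ordinary least squares, which is one common way to "model the data with the least error". The function names `fit_line` and `squared_error` and the sample points are made up for illustration; they do not come from any of the tools listed below, and real data mining software offers far more capable regression methods.

```python
# Illustrative sketch only: ordinary least squares for a line y = slope*x + intercept.
# fit_line, squared_error, and the sample points are hypothetical.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared differences between the observed and predicted y values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)
print("error:", squared_error(xs, ys, slope, intercept))
```

Any other line drawn through these points gives a larger value from `squared_error`, which is what "least error" means for this particular error measure.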
For those of you looking for data mining tools, here are five of the best open-source data mining packages that you can get for free:
Orange
Orange is a component-based data mining and machine learning software suite that features a friendly yet powerful, fast, and versatile visual programming front end for exploratory data analysis and visualization, along with Python bindings and libraries for scripting. It contains a complete set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is written in C++ and Python, and its graphical user interface is based on the cross-platform Qt framework.
RapidMiner
RapidMiner, formerly called YALE (Yet Another Learning Environment), is an environment for machine learning and data mining experiments that is used for both research and real-world data mining tasks. It allows experiments to be built from a large number of arbitrarily nestable operators, which are described in XML files and assembled with RapidMiner's graphical user interface. RapidMiner provides more than 500 operators for all main machine learning procedures, and it also incorporates the learning schemes and attribute evaluators of the Weka learning environment. It is available as a stand-alone tool for data analysis and as a data mining engine that can be integrated into your own products.
Weka
Written in Java, Weka (Waikato Environment for Knowledge Analysis) is a well-known suite of machine learning software that supports several typical data mining tasks, particularly data preprocessing, clustering, classification, regression, visualization, and feature selection. Its techniques are based on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes. Weka provides access to SQL databases through Java Database Connectivity (JDBC) and can process the result returned by a database query. Its main user interface is the Explorer, but the same functionality can be accessed from the command line or through the component-based Knowledge Flow interface.
JHepWork
Designed for scientists, engineers, and students, jHepWork is a free and open-source data analysis framework. It was created as an attempt to build a data analysis environment from open-source packages with an easy-to-understand user interface, and to offer a tool that can compete with commercial programs. It is made especially for interactive scientific plots in 2D and 3D and contains numerical scientific libraries implemented in Java for mathematical functions, random numbers, and other data mining algorithms. jHepWork is based on Jython, a high-level programming language, but Java code can also be used to call the jHepWork numerical and graphical libraries.
KNIME
KNIME (Konstanz Information Miner) is a user-friendly, easy-to-understand, and comprehensive open-source platform for data integration, processing, analysis, and exploration. It gives users the ability to visually create data flows or pipelines, selectively execute some or all analysis steps, and later study the results, models, and interactive views. KNIME is written in Java and based on Eclipse, and it uses Eclipse's extension mechanism to support plugins that provide additional functionality. Through plugins, users can add modules for text, image, and time series processing, and integrate various other open-source projects, such as the R programming language, Weka, the Chemistry Development Kit, and LibSVM.
If you know of other free and open-source data mining software, please share them with us via comment.
 