Context of Data Mining in Business Information
The demand for more sophisticated and intelligent BI solutions is constantly growing because storage capacity grows twice as fast as processing power. Over time, this imbalance makes data processing tasks more time consuming when traditional BI solutions are used. Data Mining (DM) offers a variety of advanced data processing techniques that can be applied beneficially for BI purposes. This often requires customizing the DM algorithm for a given BI purpose.
The comprehensive process of applying DM to a business problem is referred to as the Knowledge Discovery in Databases (KDD) process and is vital for successful DM implementations with BI in mind. BI can be applied in many interesting ways, with one important thing in common: aiding the user in analyzing extensive quantities of information. However, the complexity of individual BI solutions varies considerably, and it can be used to distinguish the solutions in terms of how automatic and intelligent they are.
To generalize, BI solutions can be divided into two groups of analysis types.
- Query-Reporting-Analysis – This type of analysis is often query based and is normally used for determining “What happened?” in a business over a given period. Because queries are used, the user must already know what kind of information to search for. In addition, BI solutions of this kind are generally operated manually and are therefore time consuming.
- Data Mining – Whereas query-based analysis provides answers to questions of “What happened?”, Data Mining utilizes clever algorithms for a much deeper and more intelligent analysis of data.
BI solutions using Data Mining techniques are then capable of handling “What will happen?” and “How/why did this happen?” matters. All this is done in a semi- or fully automatic process, which saves both time and resources. This is exemplified by comparing two different cases of Business Intelligence: OLAP and Data Mining. OLAP is used manually, and the user has to know what to look for, i.e., analytic queries of a dimensional nature. The OLAP cubes make it easy to slice and dice the multiple data dimensions in order to investigate a certain data relation.
However, this can be a difficult and time-consuming task when working with large amounts of high-dimensional data; it is similar to finding a needle in a haystack. OLAP also provides the user only with a low-level data analysis, able to handle “What has happened?” queries. Compared to OLAP, Data Mining operates very differently and offers a much more powerful and deeper data analysis. The user does not have to locate the interesting patterns and relations manually. Instead, the Data Mining algorithms “mine” the multidimensional data intelligently in a semi- or fully automatic process and extract the findings.
Furthermore, Data Mining can be used in a wide range of complex scenarios, such as those of the “What will happen?” or “How/why did this happen?” kind.
Knowledge discovery in databases
Data Mining is also popularly known in the world of intelligent data processing as Knowledge Discovery in Databases (KDD). Fayyad defines KDD as “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” [Usama Fayyad, 1997]. The terms Knowledge Discovery in Databases and Data Mining are often believed to hold the same meaning.
However, this is not true: Data Mining is the name of a group of intelligent BI methods, whereas KDD describes the entire process of extracting information from a data warehouse. The Data Mining task is thus one part of the KDD process. According to Fayyad [Usama Fayyad, 1997], the KDD process can be divided into the following steps (Fayyad's Knowledge Discovery in Databases process):
- Selecting target data from a data warehouse – A data warehouse often contains many databases, each of which contains large amounts of data. In order to save resources, only relevant target data are selected from the data warehouse.
- Cleaning and preprocessing the target data – The raw data normally needs cleaning, and a strategy for handling the factors that make it unclean (for example, noise and missing values) should be decided on.
- Transformation and reduction of the preprocessed data – In this step, useful features for representing the data, depending on the goal of the given task, should be found. In addition, dimensionality reduction or transformation can reduce the effective number of variables under consideration.
- Applying Data Mining to the transformed data – Once the data has been transformed, a proper Data Mining technique is applied in order to intelligently process the data for patterns and other information.
- Evaluation/visualization of Data Mining results – The results of the Data Mining step are not always easy to interpret. Using visualization in the evaluation process can prove to be of great advantage.
All of the steps in the KDD process are essential to ensure that useful models and patterns are extracted from a given data set. Simply applying Data Mining methods to data sets, regardless of the other KDD steps, often results in the discovery of misleading models and patterns and is therefore a risky activity. As shown in Figure 1, the KDD process is iterative, involving numerous steps with many decisions made by the user.
Finally, if useful knowledge is extracted in the KDD process, it should be implemented in the respective company’s business model in order to optimize important factors such as turnover and profit.
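The five KDD steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny in-memory "warehouse", the field names, the choice to drop records with missing values, and the deliberately trivial "mining" step (totals per product), which stands in for a real Data Mining technique.

```python
from collections import Counter

# Hypothetical "warehouse": a handful of made-up sales records.
warehouse = [
    {"region": "north", "product": "tea", "amount": 12.0},
    {"region": "north", "product": "coffee", "amount": None},  # missing value
    {"region": "south", "product": "tea", "amount": 8.0},
    {"region": "north", "product": "tea", "amount": 10.0},
    {"region": "north", "product": "coffee", "amount": 30.0},
]


def select(records, region):
    """Step 1: select only the target data relevant to the question."""
    return [r for r in records if r["region"] == region]


def clean(records):
    """Step 2: apply a chosen strategy for missing values (here: drop them)."""
    return [r for r in records if r["amount"] is not None]


def transform(records):
    """Step 3: reduce each record to the features needed for mining."""
    return [(r["product"], r["amount"]) for r in records]


def mine(pairs):
    """Step 4: placeholder 'mining' step: total amount per product."""
    totals = Counter()
    for product, amount in pairs:
        totals[product] += amount
    return totals


def evaluate(totals):
    """Step 5: present the findings in an interpretable order, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


result = evaluate(mine(transform(clean(select(warehouse, "north")))))
print(result)  # [('coffee', 30.0), ('tea', 22.0)]
```

Keeping each step as its own function mirrors the iterative nature of the process: a decision made in one step (for example, how to treat missing values) can be revised without touching the others.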
The data mining process in BI
Data mining activities constitute an iterative process aimed at the analysis of large databases, with the purpose of extracting information and knowledge that may prove accurate and potentially useful for knowledge workers engaged in decision making and problem solving [Carlo Vercellis, 2009]. The analysis process is iterative in nature since there are distinct phases that might require feedback and subsequent revisions. Usually such a process involves cooperation between application domain experts and data analysts, who use mathematical models for inductive learning. Past experience indicates that a data mining study requires frequent interventions by the analysts across the different investigation phases and thus cannot easily be automated. The knowledge extracted must be accurate, in the sense that it must be confirmed by the data and must not lead to misleading conclusions.
Data mining activities can be subdivided into two major investigation streams, according to the main purpose of the analysis: interpretation and prediction.
Interpretation. The purpose of interpretation is to identify regular patterns in the data and to express them through rules and criteria that can be easily understood by domain experts. The rules must be non-trivial in order to actually increase the level of knowledge and understanding of the system of interest.
Prediction. The purpose of prediction is to anticipate the value that a random variable will assume in the future or to estimate the likelihood of future events.
Definition of objectives
Data mining analyses are carried out in specific application domains and are intended to provide decision makers with useful knowledge. Thus, intuition and competence are required of the domain experts in order to formulate plausible and well-defined investigation objectives. The definition of the objectives benefits from close cooperation between the application domain experts and the data mining analysts, which makes it possible to define the problem and the goals of the investigation.
Data gathering and integration
Once the objectives of the investigation have been identified, the gathering of data begins. Data may come from various sources and may therefore require integration. Data sources may be internal, external or a combination of both. In some instances, data sources are already structured in data warehouses and data marts for OLAP analyses and, more generally, for decision support activities. These are favorable situations in which it is sufficient to select the attributes deemed relevant for the purpose of a data mining analysis.
However, there is a risk that, in order to limit memory usage, the information stored in a data warehouse has been aggregated and consolidated to such an extent as to render it useless for any subsequent analysis. For example, if a company in the retail industry stores for each customer the total amount of every receipt, without keeping track of each individual purchased item, a future data mining analysis aimed at investigating actual purchasing behavior may be compromised. In such situations, the process of data gathering and integration becomes more arduous and therefore more prone to errors.
Exploratory analysis
In the third phase of the data mining process, a preliminary analysis of the data is carried out in order to get acquainted with the available information and to carry out data cleansing. Usually, the data stored in a data warehouse are processed at loading time in such a way as to remove any syntactical inconsistencies. In the data mining process, data cleansing occurs at a semantic level. First of all, the distribution of the values for each attribute is studied, using histograms for categorical attributes and basic summary statistics for numerical variables.
In this way, any abnormal values/outliers and missing values are also highlighted. These are studied by the application domain experts who may consider excluding the corresponding records from the investigation.
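A minimal sketch of such a first pass over the data follows. The records and attribute names are made up, and the 1.5 × IQR rule used to flag outliers is one common convention chosen here for illustration; the text does not prescribe a particular rule.

```python
from collections import Counter
from statistics import mean, median, quantiles

# Hypothetical customer records; None marks a missing value.
records = [
    {"segment": "retail", "spend": 20.0},
    {"segment": "retail", "spend": 22.0},
    {"segment": "corporate", "spend": 25.0},
    {"segment": "retail", "spend": None},
    {"segment": "retail", "spend": 21.0},
    {"segment": "corporate", "spend": 24.0},
    {"segment": "retail", "spend": 23.0},
    {"segment": "retail", "spend": 400.0},
]

# Histogram (value counts) for the categorical attribute.
segment_counts = Counter(r["segment"] for r in records)

# Positions of records with a missing numerical value.
missing = [i for i, r in enumerate(records) if r["spend"] is None]

# Basic summary statistics for the numerical attribute.
spend = [r["spend"] for r in records if r["spend"] is not None]
summary = {
    "min": min(spend),
    "max": max(spend),
    "mean": mean(spend),
    "median": median(spend),
}

# Flag values outside 1.5 * IQR of the quartiles (an assumed convention).
q1, _, q3 = quantiles(spend, n=4)
iqr = q3 - q1
outliers = [x for x in spend if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(segment_counts, summary, missing, outliers)
```

The flagged records are only candidates: whether a value such as 400.0 is an error or a genuine large customer is a decision for the domain experts.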
Attribute selection
In this phase, the relevance of the different attributes is evaluated, and those that prove to be of little use are removed in order to cleanse irrelevant information from the dataset. Furthermore, new attributes obtained from the original variables through appropriate transformations are added to the dataset.
Exploratory analysis and attribute selection are critical and often challenging stages of the data mining process, and they may greatly influence the success of the subsequent stages.
Model development and validation
Once a high-quality dataset has been assembled and possibly enriched with newly defined attributes, pattern recognition and predictive models can be developed. Usually the training of the models is carried out using a sample of records extracted from the original dataset. More precisely, the available dataset is split into two subsets: a training set, used to build the model, and a test set, used to assess the model’s accuracy and validity.
Prediction and interpretation
Upon conclusion of the data mining process, the model selected among those generated during the development phase should be implemented and used to achieve the goals that were originally identified. It is also incorporated into the procedures supporting decision-making processes so that knowledge workers can use it to draw predictions and acquire a more in-depth knowledge of the phenomenon of interest.
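The two-subset split used during model development can be sketched as follows. The data, the 25% test fraction, the fixed seed and the one-parameter threshold "model" are all assumptions made for illustration; any real learning method could take the place of `fit_threshold`.

```python
import random
from statistics import mean


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Hypothetical model: the midpoint between the two class means."""
    low = mean(x for x, label in train if label == 0)
    high = mean(x for x, label in train if label == 1)
    return (low + high) / 2


def accuracy(threshold, data):
    """Share of records whose predicted label matches the true label."""
    hits = sum(1 for x, label in data if int(x > threshold) == label)
    return hits / len(data)


# Made-up records: (value, label), where the label is 1 when value >= 50.
records = [(x, int(x >= 50)) for x in range(0, 100, 5)]

train, test = train_test_split(records)
threshold = fit_threshold(train)
print(len(train), len(test), accuracy(threshold, test))
```

The model never sees the test records while it is being fitted, so the accuracy measured on them is a fairer estimate of how it would behave on new data than the accuracy on the training records.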
Tasks and methods of data mining in BI
As mentioned earlier, Data Mining is the task of extracting patterns and other interesting relations from large volumes of data.
This nontrivial task is accomplished by the use of complex algorithms in a semi- or fully automatic manner. Before applying Data Mining, it is crucial that the data has been properly reduced and transformed to avoid unnecessary complications. The preparation of the data also depends on what kind of Data Mining is wanted. Several Data Mining tasks exist, so it is vital to decide which kind of task to use when defining the goals and purposes of the Knowledge Discovery in Databases process. Some of the data mining tasks are described below:
- Classification is arguably the most popular Data Mining task, considering its broad application domain. Its main purpose is to classify one or more data samples that may consist of few or many features (dimensions).
- Estimation is algorithmically similar to classification. However, estimation does not deal with determining a class for a particular data sample. Instead, it tries to predict a certain measure for a given data sample.
- Segmentation deals with the task of grouping a given dataset into a few main groups (clusters). Segmentation is beneficial for describing a large multidimensional dataset, and many types of algorithm can be used in segmentation systems.
- Forecasting is a popular task often performed using simple statistical methods. However, forecasting done in the Data Mining domain uses advanced (learning) methods (e.g., Neural Networks, Hidden Markov Models) that in many cases are more accurate and informative than standard statistical methods (e.g., moving averages).
- Association deals with the task of locating events that frequently occur together and benefiting from this knowledge. One of the most popular examples of association is probably the Amazon.com web shop, which is able to recommend related products to customers.
- Text Analysis has several purposes and is often used for finding key terms and phrases in pieces of text. In this way, text analysis can convert unstructured text into useful structured data that can be further processed by other Data Mining tasks (e.g., classification, segmentation, association).
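As an illustration of the association task, the following minimal sketch counts how often pairs of items appear together in a set of purchases. The baskets, the `frequent_pairs` helper and the support threshold are hypothetical, and the sketch is not a description of how any particular web shop builds its recommendations.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "bread"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. (bread, butter).
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(baskets, min_support=0.4)
print(pairs)  # {('bread', 'butter'): 0.6, ('bread', 'milk'): 0.4}
```

Counting every pair is only practical for small item sets; dedicated association-rule algorithms prune the candidates, but the notion of support is the same.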
Applications of data mining for BI
Data mining methodologies can be applied to a variety of domains, from marketing and manufacturing process control to the study of risk factors in medical diagnosis, and from the evaluation of the effectiveness of new drugs to fraud detection.
Fraud detection. Fraud detection is a relevant field of application of data mining. Fraud may affect different industries such as telephony, insurance (false claims) and banking (illegal use of credit cards and bank checks; illegal monetary transactions).
Risk evaluation. The purpose of risk analysis is to estimate the risk connected with future decisions. For example, using the past observations available, a bank may develop a predictive model to establish whether it is appropriate to grant a monetary loan or a home loan, based on the characteristics of the applicant.
Text mining. Data mining can be applied to different kinds of texts, which represent unstructured data, in order to classify articles, books, documents, emails and web pages. Examples are web search engines and the automatic classification of press releases for storage purposes. Other text mining applications include the generation of filters for email messages and newsgroups.
Image recognition. The treatment and classification of digital images, both static and dynamic, is an exciting topic because of its theoretical interest and the great number of applications it offers. It is useful for recognizing written characters, comparing and identifying human faces, applying correction filters to photographic equipment and detecting suspicious behavior through surveillance video cameras.
Web mining. Web mining applications are intended for the analysis of the sequences of pages visited and the choices made by a web surfer. This may prove useful for analyzing e-commerce sites, for offering flexible and customized pages, and for e-learning training courses.
Conclusion
In conclusion, the rapid proliferation of the Internet and related technologies has created an unprecedented opportunity for enterprises to collect massive amounts of data regarding customers and all aspects of their business operations. Yet the reality is that most organizations today are “data rich” but “information and knowledge poor”. By not harnessing the full potential of their data, which is perhaps the second most important asset after human capital, they incur many unforeseen losses.
Internet-based applications such as social media, website usage tracking and online reviews, as well as more traditional technology applications like RFID, Supply Chain Management (SCM), Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM), provide access to vast amounts of data regarding customers, suppliers and competitors, as well as a firm’s own activities and business processes. Unlocking the insights and knowledge trapped in such raw data constitutes a key lever for competitive advantage in hypercompetitive business environments.