Data Mining: An Overview
What is data mining?
"The mining data involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining is more than collecting and data management, but also includes analysis and prediction. "
"Data mining can be performed on the data represented in quantitative terms, text, multimedia or shapes. Data mining applications can use a variety of parameters to examine the data. They include the association (Patterns where one event is connected to another event, such as purchasing a pen and purchasing paper), sequence or path analysis (the patterns in one event leads to another event, such as the birth of a child and purchasing diapers), classification (identification of new models such as the similarities between the purchases of duct tape and plastic sheeting purchases), clustering (finding and visually documenting groups of previously unknown facts, such as geographic location and brand preferences), and forecasting (discovering patterns that one can make reasonable predictions regarding future activities, such as predicting that individuals who join an athletic club may take exercise classes). "
Reflecting this conceptualization of data mining, some observers consider data mining to be just one step in a larger process known as knowledge discovery in databases (KDD). Other steps in the KDD process, in progressive order, include data cleansing, data integration, data selection, data transformation, pattern assessment, and knowledge presentation.
A series of advances in technology and business processes have contributed to a growing interest in data mining in both the public and private sectors. Some of these changes include the growth of computer networks, which can be used to connect databases; the development of enhanced search-related techniques such as neural networks and advanced algorithms; the spread of the client/server computing model, allowing users to access centralized data resources from the desktop; and a greater ability to combine data from different sources into a single searchable source.
Data mining has become increasingly common in both the public and private sectors. Organizations use data mining as a tool to examine customer information, reduce fraud and waste, and assist in medical research. However, the proliferation of data mining has also raised some implementation and oversight issues. These include concerns about the quality of the data being analyzed, the interoperability of databases and software between agencies, and potential infringements on privacy.
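As a rough illustration of how the KDD steps fit together, the sketch below chains one small Python function per step. The records, source names, and function names are all hypothetical, and the mining step (counting item pairs) is just one simple choice among many.

```python
from collections import Counter

# Hypothetical raw records from two invented sources.
store_sales = [
    {"customer": "a1", "item": "Pen "},
    {"customer": "a1", "item": "paper"},
    {"customer": "b2", "item": None},  # missing value, dropped during cleansing
    {"customer": "b2", "item": "PAPER"},
]
web_sales = [
    {"customer": "c3", "item": "pen"},
    {"customer": "c3", "item": "paper"},
    {"customer": "c3", "item": "gift card"},
]


def cleanse(records):
    """Data cleansing: drop records with a missing item."""
    return [r for r in records if r["item"]]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: normalize item names, one basket per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"].strip().lower())
    return baskets


def mine(baskets):
    """Data mining: count how many baskets contain each pair of items."""
    pairs = Counter()
    for items in baskets.values():
        ordered = sorted(items)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs[(first, second)] += 1
    return pairs


def assess(pairs, min_count):
    """Pattern assessment: keep pairs seen in at least min_count baskets."""
    return {pair: n for pair, n in pairs.items() if n >= min_count}


def present(patterns):
    """Knowledge presentation: render the surviving patterns as text."""
    return [f"{a} + {b}: {n} customers" for (a, b), n in sorted(patterns.items())]


records = integrate(cleanse(store_sales), cleanse(web_sales))
baskets = transform(select(records, ["customer", "item"]))
report = present(assess(mine(baskets), min_count=2))
print(report)
```

In this arrangement the mining function is a small part of the whole, which matches the view of data mining as one step within KDD.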
Limitations of Data Mining
"While the data mining products can be very powerful tools, they are not alone applications. To be successful, data mining requires skilled technical specialists analytical and can structure the analysis and interpretation of the output is created. Consequently, the limitations of data mining are primarily data or related personnel, rather than "related to technology.
"Although data mining can help reveal patterns and relationships, do not tell the user the value or importance of these patterns. Such determinations must be made by the user. Similarly, the validity of discovered patterns depends on how compared with the "real world" circumstances. For example, to assess the validity of a data mining application designed to identify possible suspects of terrorism in a large number of people, the user can test the model using data that includes information about known terrorists. However, while reaffirming possibly a certain profile, does not necessarily mean that the application to identify a suspect whose behavior significantly deviates from the original model. "
"Another limitation of data mining is that while it can identify connections between behaviors and / or variables, not necessarily identified a causal relationship. For example, an application that can identify a pattern of behavior, such as the propensity to buy airline tickets shortly before departure is scheduled to depart, is related to characteristics such as income, education level and Internet use. However, this does not necessarily indicate that the ticket purchasing behavior is caused by one or more of these variables. In fact, the behavior of individual might be affected by any additional variable (s), such as occupation (The need to travel the short term), family status (a sick relative needing care), or a hobby (taking advantage of last minute discounts to visit new destinations).
Data Mining Applications
"Data mining is used for a variety of purposes, both in terms public and private. Industries such as banking, insurance, medicine, retail and frequently used data mining to reduce costs, enhance research, and increase sales. For example, insurance and banking using data mining applications to detect fraud and assist in the evaluation of risks (for example, the score credit.) From customer data collected over several years, companies can develop models that predict whether a customer is a good credit risk, or if a accident claim may be fraudulent and should be investigated more closely. The medical community sometimes uses data mining to help predict the efficacy of a procedure or medicine. Pharmaceutical companies use data mining of chemical compounds and genetic material to help guide research on new treatments for diseases. Retailers can use information collected through affinity programs (eg, cards buyers' club, frequent flyer points, contests) to assess the effectiveness of product selection and placement decisions, coupon offers, and what products are often purchased together. Companies such as telephone service providers and music clubs can use data mining to create a churn analysis "," to assess which customers are likely to remain as subscribers and which are likely to switch to a competitor. "
"In the public sector, data mining applications initially were used as a means to detect fraud and waste, but also grown to be used for purposes such as measuring and improving program performance. It has been reported that data mining has helped the federal government to recover million in fraudulent Medicare payments. The Justice Department has been able to use data mining to assess crime patterns and adjust allocations resources accordingly. Similarly, the Department of Veterans Affairs has used data mining to help predict demographic changes in the electoral district that can best be used to estimate its budgetary needs. Another example is the Federal Aviation Administration, which uses data mining to review the data crash recognize common defects and recommend precautionary measures. "
Recently, data mining has been increasingly cited as an important tool for homeland security efforts. Some observers suggest that data mining should be used as a means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records. Two initiatives that have attracted considerable attention are the now-discontinued Terrorism Information Awareness (TIA) project conducted by the Defense Advanced Research Projects Agency (DARPA) and the now-canceled Computer-Assisted Passenger Prescreening System II (CAPPS II) that was being developed by the Transportation Security Administration (TSA). CAPPS II is being replaced by a new program called Secure Flight.
About the Author
Vineet Pandit
M.Tech (Software Systems)