What is data mining?
Many different terms describe the various kinds of data mining. Some people use them interchangeably, while others prefer to draw clear distinctions between the terms in their own field so they are not confused with related ones.
Main types of data mining
- Data Mining: This is the broadest and most general type. It is what we do when we search data for patterns that meet certain criteria (for example, products that are frequently bought together).
- Hierarchical clustering: This type of data mining uses a set of metrics to group observations into a tree of nested clusters. It shows where records are concentrated, as opposed to spread out across the more general population.
- Regression Analysis: This type of data mining models the relationship between an outcome variable and one or more explanatory variables (often including earlier measurements of the same quantity). It shows how that relationship changes over time or across subgroups (for example, by gender or age).
- Clustering analysis: This type of analysis groups records with similar characteristics (for example, customers with similar demographics or behavior) without using predefined labels. You can then compare the resulting groups to understand why they differ and how to adjust your approach to each one for better results.
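To make clustering concrete, here is a minimal k-means sketch in plain Python. The article does not name a specific algorithm; k-means is simply one common choice. The customer data (age, monthly spend) and the function name `kmeans` are hypothetical. The initial centroids are just the first k points. This keeps the example deterministic, but it is not how production libraries usually initialize.

```python
from math import dist

def kmeans(points, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm): assign each point to its nearest
    centroid, move each centroid to the mean of its points, repeat until stable."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (age, monthly spend)
customers = [(22, 30), (25, 35), (23, 32), (48, 120), (52, 110), (50, 115)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

Because k-means relies on raw distances, features on very different scales (like age and spend) are usually standardized first. Libraries such as scikit-learn provide tuned implementations: `sklearn.cluster.KMeans`, and `AgglomerativeClustering` for hierarchical clustering.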
You will often see these terms used interchangeably when you start your journey into this hobby, especially online. Don't worry too much about the exact differences at first; learn one approach well and stick with it!
What is Big Data?
Big data refers to datasets that are too large, too fast-changing, or too unstructured (text, images, logs) to handle easily with traditional tools. Data mining is the task of extracting meaningful patterns from data like this, whether structured or unstructured. It has many applications, including marketing, finance, and healthcare. The term is typically used for any process that explores a large amount of data and extracts patterns and insights from it.
For example, suppose you want to know what types of people are talking about your product on Twitter. To find out, you will need to collect a lot of tweets (possibly thousands) and work with the raw data.
You might think that running standard text analysis tools over the raw text is the most straightforward approach. However, those tools often struggle with short, informal, unstructured text like tweets. So instead we look for patterns in the raw data that can be used for prediction, by asking a few basic questions:
• Who? Who is talking about what?
• What kind of people? What kind of things are they talking about?
• How often? How frequently, and for how long, has each person been tweeting about your product? Someone who tweets about it often is probably already familiar with it. One way to find out who those people are is to count each person's tweets over more than one time window:
• If someone has tweeted many times in recent days but rarely over recent weeks, the activity may be a short burst (or they may have been away, for instance on vacation) rather than a habit.
• If someone has tweeted steadily every week since well before last week, we can reasonably assume they tweet about the product regularly.
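Here is a minimal sketch of the frequency check described in the list above. It assumes the tweets have already been collected as (username, timestamp) pairs; collecting them is out of scope here. The function names, the window lengths, and the two-tweets-per-week threshold are all hypothetical choices for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def tweets_per_user(tweets, now, days):
    """Count each user's tweets posted within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return Counter(user for user, posted_at in tweets if posted_at >= cutoff)

def classify_users(tweets, now, short_days=7, long_days=30, min_per_week=2):
    """Label each user seen in the long window using hypothetical rules."""
    recent = tweets_per_user(tweets, now, short_days)
    longer = tweets_per_user(tweets, now, long_days)
    weeks = long_days / 7
    labels = {}
    for user, count in longer.items():
        if count / weeks >= min_per_week:
            labels[user] = "regular"       # steady average across the long window
        elif recent[user] >= min_per_week:
            labels[user] = "recent burst"  # active lately, but not on average
        else:
            labels[user] = "occasional"
    return labels

# Hypothetical data: (username, time a tweet mentioning the product was posted)
now = datetime(2024, 1, 31)
tweets = (
    [("alice", now - timedelta(days=d)) for d in range(0, 30, 3)]  # every 3 days
    + [("bob", now - timedelta(days=d)) for d in (0, 1, 2)]        # 3 tweets this week
    + [("carol", now - timedelta(days=20))]                         # one older tweet
)
print(classify_users(tweets, now))
# -> {'alice': 'regular', 'bob': 'recent burst', 'carol': 'occasional'}
```

Comparing a short window with a longer one is what separates a habit from a burst. With a single window, Alice and Bob would look similar.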
This kind of pattern discovery (identifying recurring themes or patterns within unstructured data) also requires understanding some basic statistical principles, such as correlation and regression analysis. It also requires some machine learning concepts, such as decision trees and classification algorithms.
More advanced techniques, such as k-nearest neighbor classifiers, support vector machines, and neural networks, are also useful for these tasks. However, most of them need labeled training data and tuning before they perform well. Training time also varies widely with the model and the size of the data. For instance, k-nearest neighbors does almost no work at training time but can be slow at prediction time on large datasets. These algorithms can also use other kinds of information besides unstructured text, such as user profiles or timestamps.
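Here is a minimal k-nearest neighbor classifier in plain Python. It also shows why k-NN is cheap to train but costly to query. The "model" is just the stored labeled examples, and every prediction compares the query against all of them. The features (tweets per week, share of tweets mentioning the product), the labels, and the function name `knn_predict` are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest labeled examples.

    `train` is a list of (features, label) pairs; there is no fitting step."""
    neighbors = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical users: (tweets per week, share of tweets about the product) -> label
train = [
    ((10, 0.8), "fan"), ((12, 0.7), "fan"), ((9, 0.9), "fan"),
    ((2, 0.1), "casual"), ((1, 0.2), "casual"), ((3, 0.1), "casual"),
]
print(knn_predict(train, (11, 0.75)))  # -> fan
print(knn_predict(train, (2, 0.15)))   # -> casual
```

In practice you would scale the features first so that one of them (here, tweets per week) does not dominate the distance.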
What is a data warehouse?
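A data warehouse is a central repository that integrates data from many sources (sales systems, web logs, customer records, and so on) into a consistent structure designed for querying and analysis. It is often the main source of data for data mining, which is the process of extracting useful information from data. Data mining in turn supports predictive analytics, helping organizations make better decisions and predictions.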
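Data mining overlaps heavily with statistics. However, it typically relies on automated, often machine-learning-driven searches through large datasets. Classical statistical analysis more often starts from a hypothesis that a human expert formulates and then tests against the data.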
Data mining gives businesses and organizations powerful new methods for selecting the most relevant information in the data they already have. Companies can use machine learning techniques, including natural language processing (NLP) for text, to automate analyses that would otherwise require their own human experts. On some tasks, the resulting systems can be as accurate as those experts, or even more so.
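A data warehouse commonly organizes data in a star schema. A central fact table records events (such as sales) and links to dimension tables that describe them (such as products). The sketch below builds a tiny, hypothetical version in an in-memory SQLite database using Python's built-in `sqlite3` module. It then runs the kind of aggregate query that analysis and data mining typically start from. Real warehouses run on dedicated systems, and the table and column names here are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: descriptive attributes of each product (hypothetical schema)
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)")
# Fact table: one row per sale, pointing at the dimension table by key
cur.execute(
    "CREATE TABLE fact_sales ("
    "sale_id INTEGER PRIMARY KEY, "
    "product_id INTEGER REFERENCES dim_product(product_id), "
    "sale_date TEXT, amount REAL)"
)

cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "books"), (2, "games"), (3, "books")])
cur.executemany(
    "INSERT INTO fact_sales (product_id, sale_date, amount) VALUES (?, ?, ?)",
    [(1, "2024-01-05", 12.0), (2, "2024-01-06", 60.0),
     (3, "2024-01-06", 8.0), (2, "2024-02-01", 55.0)],
)

# Typical analytical query: join facts to a dimension and aggregate
rows = cur.execute("""
    SELECT p.category, SUM(s.amount) AS revenue
    FROM fact_sales AS s
    JOIN dim_product AS p ON p.product_id = s.product_id
    GROUP BY p.category
    ORDER BY revenue DESC
""").fetchall()
conn.close()
print(rows)  # -> [('games', 115.0), ('books', 20.0)]
```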
What are the most important things to consider when building a data mining project?
- Data mining is generally understood as the process of discovering patterns in data, but these days we can also use it to gain insight into what is happening with our users, not just our data.
- Data mining has become very popular lately, especially due to the rise of big data and all kinds of analytics available through the Internet. The scope for data mining has also expanded beyond its traditional usage, and now includes many areas like financial and health information gathering, social media monitoring, and security analysis.
- When building a project or system that uses data mining, consider the following points:
- Scope and goals of your project: What types of data are you going to use? Which domain does the project serve (for example, criminal justice or social science)? What important questions will the project answer? Where should the project or application add value (for example, in education or in business)? Which problems are most important to solve?
- Type of analysis: Do you need high-level trend analyses across very large datasets (using big-data methods)? Or do you need more detailed real-world analyses that involve more manual work (using classical statistical techniques)?
- Time frame: Over what period do you need to find patterns and trends? You might only be interested in short periods (perhaps per day), in longer ones (for example, how well-known players perform week by week), or in approaches that work across different time scales. The answer also affects how long the analysis will take.
- Technology: Big data tools were mentioned earlier, but what else will the analysis involve? In particular, if you use machine learning, are there special considerations, such as the need for labeled data, computing resources, or ongoing model maintenance?
- Combining approaches: Could machine learning be used for these tasks alongside traditional statistical approaches? Can the two be combined (for example, using deep learning on large datasets and simpler statistical models elsewhere)?
- Training and deployment: How much training and validation does a model need before it is ready for real-world use? Can a newly trained model go straight into your application, or does it need further evaluation and processing first? Alternatively, could you reuse existing models that were already trained on carefully preprocessed datasets? Adapting such a model usually requires far less training than building one from scratch. Each option needs careful consideration.
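One common answer to the training question is a holdout check. You set aside part of the data, evaluate the trained model on it, and deploy only if the model clears an agreed accuracy bar. The sketch below shows that workflow. A deliberately simple one-feature threshold rule stands in for a real model. The data, the 0.8 bar, and all function names are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

def fit_threshold(train):
    """Stand-in for model training: pick the cutoff t (predict True when x >= t)
    that gives the best accuracy on the training rows."""
    def train_accuracy(t):
        return sum((x >= t) == label for x, label in train) / len(train)
    return max(sorted({x for x, _ in train}), key=train_accuracy)

def accuracy(t, rows):
    """Share of rows the threshold rule classifies correctly."""
    return sum((x >= t) == label for x, label in rows) / len(rows)

# Hypothetical data: (tweets per week, is_regular_user)
data = [(1, False), (2, False), (3, False), (2, False), (4, False),
        (8, True), (9, True), (12, True), (7, True), (10, True)]

train, test = train_test_split(data)
t = fit_threshold(train)
test_acc = accuracy(t, test)  # measured on rows the model never saw
MIN_ACCURACY = 0.8            # hypothetical bar for going to production
ready_to_deploy = test_acc >= MIN_ACCURACY
print(f"threshold={t}, held-out accuracy={test_acc:.2f}, deploy={ready_to_deploy}")
```

Libraries such as scikit-learn provide an equivalent `train_test_split` in `sklearn.model_selection`. They also offer cross-validation utilities that give more reliable estimates on small datasets.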
How can we integrate data mining projects into business processes, and what issues should be considered when doing so?
Data mining is a very useful tool for organizations, but it is only a tool. It can identify patterns in data and help assess the quality of the data being used. The problem is that it often doesn't tell you anything you don't already know, and when it does surprise you, the result still needs careful interpretation.
For example, the most surprising finding in my recent survey was that “the number of hours spent on Facebook doesn’t correlate with how happy users are.” We had not expected this, since we had long believed that people who spend more time on Facebook are generally happier users. If anything, the relationship may run the other way, with heavier users less happy than those who spend less time there, although the survey itself showed no clear correlation.
Also surprising was that “the number of likes won’t predict how likely someone will be to recommend your product.” One possible reason is that, as on any social network, whether something gets liked depends on how close you are to your friends (and perhaps even on whether they have liked your product).
Furthermore, a recent study found that “the size of each data set — either in terms of size or density — doesn’t predict any measure of their predictive power. The only thing that matters when using digital data is whether or not you have access to it and why you want it.”
Finally, new research shows us that our perceptions about digital objects often aren’t accurate: “When we see an object as having a certain shape or color … what we perceive varies from what it really looks like! Our eyes can fool us into thinking specific things when they don’t match up with reality!
Digital objects can seem bigger than they actually are! When we see something as white, black, or gray … this isn’t always the case! Our eyes can trick us into thinking what we think we are seeing isn’t what’s really there! We all know this from looking at static images on our computer screens, but these same principles apply to digital objects too! So if you think an image looks like a mountain … then try walking up toward it and see if you still think so after all.”