Dummy classifiers using sklearn library in Python
Fellow coders, in this tutorial we will learn about dummy classifiers using the scikit-learn library in Python. Scikit-learn is a Python library that provides a range of supervised and unsupervised learning algorithms and is built on top of Python's numerical and scientific libraries, NumPy and SciPy. Its functionality includes regression, classification, clustering, model selection and preprocessing.
What are dummy classifiers in sklearn:
A DummyClassifier is a classifier in the sklearn library that makes predictions using simple rules and does not generate any valuable insights about the data. As the name suggests, dummy classifiers serve as a baseline against which real classifiers are compared, so we must not use them for actual problems. A real classifier is expected to beat the dummy classifier on the same dataset; if it does not, the model or the features probably need another look. The dummy classifier never looks at the feature values. At most it uses the class labels seen during fitting, and it predicts according to one of several strategies. With most_frequent it always predicts the most common training label, with stratified it draws labels at random following the training class distribution, with uniform it picks every class with equal probability, and with constant it always predicts a label supplied by the user. We will implement these four strategies in our code below and check out the results.
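To make the baseline idea concrete, here is a minimal sketch that compares a dummy classifier with a real one. The iris dataset, the decision tree, the 70/30 split and the seed value are all assumptions chosen for illustration; any dataset and any real classifier would serve the same purpose.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset and split (assumptions, not part of the tutorial's own code)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Baseline: always predicts the most common label seen during fit
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# A real classifier that actually uses the features
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)

print("baseline accuracy:", baseline_acc)
print("decision tree accuracy:", tree_acc)
```

If the real model's accuracy is not clearly above the baseline's, that is usually a sign that it has learned little from the features.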
Working with the code:
Let us implement dummy classifiers using the sklearn library:
Create a new Python file and import all the required libraries:
from sklearn.dummy import DummyClassifier
import numpy as np
Now, let’s start writing our code for implementing dummy classifiers:
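# a holds the feature values (ignored by the dummy classifier), b holds the class labels
a = np.array([-1, 1, 1, 1])
b = np.array([0, 1, 1, 1])
strat = ["most_frequent", "stratified", "constant", "uniform"]
for s in strat:
    if s == "constant":
        # the constant strategy needs to be told which label to predict
        dummy_clf = DummyClassifier(strategy=s, random_state=None, constant=1)
    else:
        dummy_clf = DummyClassifier(strategy=s, random_state=None)
    # print() is required in a script; bare expressions are only echoed in an interactive session
    print(dummy_clf.fit(a, b))
    print(s)
    print(repr(dummy_clf.predict(a)))
    print(dummy_clf.score(a, b))
    print("----------------------xxxxxxx----------------------")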
After running the code, here is the output:
DummyClassifier(constant=None, random_state=None, strategy='most_frequent')
most_frequent
array([1, 1, 1, 1])
0.75
----------------------xxxxxxx----------------------
DummyClassifier(constant=None, random_state=None, strategy='stratified')
stratified
array([1, 1, 0, 1])
0.25
----------------------xxxxxxx----------------------
DummyClassifier(constant=1, random_state=None, strategy='constant')
constant
array([1, 1, 1, 1])
0.75
----------------------xxxxxxx----------------------
DummyClassifier(constant=None, random_state=None, strategy='uniform')
uniform
array([0, 0, 1, 0])
1.0
----------------------xxxxxxx----------------------
Note that the stratified and uniform strategies are random, so your numbers will differ from run to run. The score() method calls predict() again internally, which is why the score printed for these two strategies does not match the array printed just above it. Newer versions of sklearn also print a shorter DummyClassifier(...) line that lists only the non-default parameters.
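If repeatable results are needed for the random strategies, an integer random_state can be passed instead of None. The sketch below reuses the same arrays. The seed value 0 is an arbitrary assumption, and the manual accuracy calculation is only there to show that score() now agrees with the printed predictions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

a = np.array([-1, 1, 1, 1])
b = np.array([0, 1, 1, 1])

results = {}
for s in ["stratified", "uniform"]:
    # random_state=0 is an arbitrary seed chosen for illustration
    clf = DummyClassifier(strategy=s, random_state=0)
    clf.fit(a, b)
    first = clf.predict(a)
    second = clf.predict(a)  # same seed, so the same labels are drawn again
    manual_acc = float(np.mean(first == b))  # accuracy computed by hand
    results[s] = (first, second, clf.score(a, b), manual_acc)
    print(s, first, second, clf.score(a, b), manual_acc)
```

With a fixed seed, repeated calls to predict() return the same labels, so the value from score() corresponds to the predictions you see.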