We will use the Hugging Face Datasets library to demonstrate text classification. You can check out this course for more details on how to use Hugging Face.
For this course, we use the “emotion” dataset from Hugging Face:
from datasets import load_dataset
emotions = load_dataset("emotion")
The emotions object is an instance of the DatasetDict class. It behaves like a Python dictionary whose keys are the names of the dataset splits.
To access an individual split, index the DatasetDict with the split name:
train_ds = emotions["train"]
validation_ds = emotions["validation"]
test_ds = emotions["test"]
Each of these splits is an instance of the Dataset class.
To list the column names of a split, use its column_names attribute:
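train_ds.column_names
Although Dataset provides many methods for data manipulation, most practitioners are more familiar with pandas, so it is common to convert a Dataset into a DataFrame. Calling set_format(type="pandas") on the DatasetDict makes indexing return pandas objects, so slicing the entire train split yields a DataFrame: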
import pandas as pd
emotions.set_format(type="pandas")
df = emotions["train"][:]