quapy.classification package¶
Submodules¶
quapy.classification.methods module¶
- class quapy.classification.methods.LowRankLogisticRegression(n_components=100, **kwargs)¶
Bases:
sklearn.base.BaseEstimator
An example of a classification method (i.e., an object that implements fit, predict, and predict_proba) that also generates embedded inputs (i.e., that implements transform), as required by quapy.method.neural.QuaNet. This is a mock method that makes it easy to instantiate quapy.method.neural.QuaNet on array-like real-valued instances. The transformation consists of applying sklearn.decomposition.TruncatedSVD, while classification is performed using sklearn.linear_model.LogisticRegression on the low-rank space.
- Parameters
n_components – the number of principal components to retain
kwargs – parameters for the Logistic Regression classifier
- fit(X, y)¶
Fit the model according to the given training data. The fit consists of fitting TruncatedSVD and then LogisticRegression on the low-rank representation.
- Parameters
X – array-like of shape (n_samples, n_features) with the instances
y – array-like of shape (n_samples,) with the class labels
- Returns
self
- get_params()¶
Get hyper-parameters for this estimator.
- Returns
a dictionary with parameter names mapped to their values
- predict(X)¶
Predicts labels for the instances X embedded into the low-rank space.
- Parameters
X – array-like of shape (n_samples, n_features) instances to classify
- Returns
a numpy array of length n containing the label predictions, where n is the number of instances in X
- predict_proba(X)¶
Predicts posterior probabilities for the instances X embedded into the low-rank space.
- Parameters
X – array-like of shape (n_samples, n_features) instances to classify
- Returns
array-like of shape (n_samples, n_classes) with the posterior probabilities
- set_params(**params)¶
Set the parameters of this estimator.
- Parameters
params – a **kwargs dictionary with the estimator parameters for Logistic Regression and, optionally, n_components for TruncatedSVD
- transform(X)¶
Returns the low-rank approximation of X with n_components dimensions, or X unaltered if n_components >= X.shape[1].
- Parameters
X – array-like of shape (n_samples, n_features) instances to embed
- Returns
array-like of shape (n_samples, n_components) with the embedded instances
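The two-stage behaviour described for this class (project with TruncatedSVD, then classify with LogisticRegression on the projected data) can be sketched with scikit-learn directly. This is a minimal illustration of the idea, not the class's actual implementation; the synthetic data, the variable names, and the choice of 10 components are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

# Synthetic real-valued data (an assumption for this sketch):
# 200 instances, 50 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_components = 10

# Step 1: fit the low-rank projection and embed the training instances.
svd = TruncatedSVD(n_components=n_components, random_state=0)
X_low = svd.fit_transform(X)  # shape (200, 10)

# Step 2: fit the classifier on the embedded instances.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_low, y)

# At prediction time, new instances go through the same projection first.
X_new = rng.normal(size=(5, 50))
embedded = svd.transform(X_new)           # plays the role of transform(X)
labels = clf.predict(embedded)            # plays the role of predict(X)
posteriors = clf.predict_proba(embedded)  # plays the role of predict_proba(X)
```

The embedded matrix has one row per instance and n_components columns, which is the kind of fixed-size representation that transform is documented to return.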
quapy.classification.neural module¶
- class quapy.classification.neural.CNNnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, kernel_heights=[3, 5, 7], stride=1, padding=0, drop_p=0.5)¶
Bases:
quapy.classification.neural.TextClassifierNet
An implementation of quapy.classification.neural.TextClassifierNet based on Convolutional Neural Networks.
- Parameters
vocabulary_size – the size of the vocabulary
n_classes – number of target classes
embedding_size – the dimensionality of the word embeddings space (default 100)
hidden_size – the dimensionality of the hidden space (default 256)
repr_size – the dimensionality of the document embeddings space (default 100)
kernel_heights – list of kernel lengths (default [3,5,7]), i.e., the number of consecutive tokens that each kernel covers
stride – convolutional stride (default 1)
padding – convolutional padding (default 0)
drop_p – drop probability for dropout (default 0.5)
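To make the interaction between kernel_heights, stride, and padding concrete, the sketch below applies the standard convolution output-length formula along the token axis. It assumes a dilation of 1 and that stride and padding act on the token axis; the helper name conv_output_length and the 300-token document are hypothetical and chosen only for illustration.

```python
def conv_output_length(seq_len, kernel_height, stride=1, padding=0):
    """Number of positions a convolution produces along the token axis.

    Standard formula for dilation 1:
    floor((seq_len + 2 * padding - kernel_height) / stride) + 1
    """
    return (seq_len + 2 * padding - kernel_height) // stride + 1


# A document padded to 300 tokens, with the default kernel heights.
seq_len = 300
lengths = {k: conv_output_length(seq_len, k) for k in [3, 5, 7]}
print(lengths)  # {3: 298, 5: 296, 7: 294}
```

Each kernel height therefore yields a feature map of a different length, which is why convolutional text classifiers typically pool over the token axis before concatenating the per-kernel features.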
- document_embedding(input)¶
Embeds documents (i.e., performs the forward pass up to the next-to-last layer).
- Parameters
input – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
- Returns
a torch tensor of shape (n_samples, n_dimensions), where n_samples is the number of documents, and n_dimensions is the dimensionality of the embedding
- get_params()¶
Get hyper-parameters for this estimator
- Returns
a dictionary with parameter names mapped to their values
- property vocabulary_size¶
Return the size of the vocabulary
- Returns
integer
- class quapy.classification.neural.LSTMnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, lstm_class_nlayers=1, drop_p=0.5)¶
Bases:
quapy.classification.neural.TextClassifierNet
An implementation of quapy.classification.neural.TextClassifierNet based on Long Short Term Memory networks.
- Parameters
vocabulary_size – the size of the vocabulary
n_classes – number of target classes
embedding_size – the dimensionality of the word embeddings space (default 100)
hidden_size – the dimensionality of the hidden space (default 256)
repr_size – the dimensionality of the document embeddings space (default 100)
lstm_class_nlayers – number of LSTM layers (default 1)
drop_p – drop probability for dropout (default 0.5)
- document_embedding(x)¶
Embeds documents (i.e., performs the forward pass up to the next-to-last layer).
- Parameters
x – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
- Returns
a torch tensor of shape (n_samples, n_dimensions), where n_samples is the number of documents, and n_dimensions is the dimensionality of the embedding
- get_params()¶
Get hyper-parameters for this estimator
- Returns
a dictionary with parameter names mapped to their values
- property vocabulary_size¶
Return the size of the vocabulary
- Returns
integer
- class quapy.classification.neural.NeuralClassifierTrainer(net: quapy.classification.neural.TextClassifierNet, lr=0.001, weight_decay=0, patience=10, epochs=200, batch_size=64, batch_size_test=512, padding_length=300, device='cpu', checkpointpath='../checkpoint/classifier_net.dat')¶
Bases:
object
Trains a neural network for text classification.
- Parameters
net – an instance of TextClassifierNet implementing the forward pass
lr – learning rate (default 1e-3)
weight_decay – weight decay (default 0)
patience – number of epochs that do not show any improvement in validation to wait before applying early stop (default 10)
epochs – maximum number of training epochs (default 200)
batch_size – batch size for training (default 64)
batch_size_test – batch size for test (default 512)
padding_length – maximum number of tokens to consider in a document (default 300)
device – specify ‘cpu’ (default) or ‘cuda’ for enabling gpu
checkpointpath – where to store the parameters of the best model found so far according to the evaluation in the held-out validation split (default ‘../checkpoint/classifier_net.dat’)
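The patience parameter describes a common early-stopping rule, which can be sketched in plain Python as follows. The function name, the "higher is better" validation score, and the strict-improvement test are assumptions made for this illustration; the source does not state which validation measure the trainer monitors.

```python
def early_stopping_epochs(val_scores, patience=10):
    """Return (best_epoch, last_epoch_run) for per-epoch validation scores.

    Scores are assumed to be "higher is better". Training stops once
    `patience` consecutive epochs fail to improve on the best score.
    """
    best_score = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # A real trainer would store a checkpoint of the model here.
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_scores) - 1


scores = [0.60, 0.65, 0.70, 0.69, 0.68, 0.70, 0.66]
result = early_stopping_epochs(scores, patience=3)
print(result)  # (2, 5): best model at epoch 2, training stopped at epoch 5
```

In this sketch the best epoch is the one whose parameters would be kept, which matches the documented role of checkpointpath as the place where the best model found so far is stored.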
- property device¶
Gets the device in which the network is allocated
- Returns
device
- fit(instances, labels, val_split=0.3)¶
Fits the model according to the given training data.
- Parameters
instances – list of lists of indexed tokens
labels – array-like of shape (n_samples,) with the class labels
val_split – proportion of training documents to be taken as the validation set (default 0.3)
- Returns
self
- get_params()¶
Get hyper-parameters for this estimator
- Returns
a dictionary with parameter names mapped to their values
- predict(instances)¶
Predicts labels for the instances
- Parameters
instances – list of lists of indexed tokens
- Returns
a numpy array of length n containing the label predictions, where n is the number of instances
- predict_proba(instances)¶
Predicts posterior probabilities for the instances
- Parameters
instances – list of lists of indexed tokens
- Returns
array-like of shape (n_samples, n_classes) with the posterior probabilities
- reset_net_params(vocab_size, n_classes)¶
Reinitialize the network parameters
- Parameters
vocab_size – the size of the vocabulary
n_classes – the number of target classes
- set_params(**params)¶
Set the parameters of this trainer and the learner it is training. In the current version, parameter names for the trainer and the learner should be disjoint.
- Parameters
params – a **kwargs dictionary with the parameters
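The requirement that trainer and learner parameter names be disjoint suggests that each name is routed to exactly one of the two objects. The sketch below shows one way such routing could work; split_params is a hypothetical helper, not part of QuaPy, and the name sets are taken from the NeuralClassifierTrainer and LSTMnet signatures documented on this page.

```python
def split_params(params, trainer_names, net_names):
    """Route each parameter to the trainer or to the network by name."""
    overlap = set(trainer_names) & set(net_names)
    if overlap:
        raise ValueError(f"ambiguous parameter names: {sorted(overlap)}")
    trainer_params, net_params = {}, {}
    for name, value in params.items():
        if name in trainer_names:
            trainer_params[name] = value
        elif name in net_names:
            net_params[name] = value
        else:
            raise ValueError(f"unknown parameter: {name!r}")
    return trainer_params, net_params


TRAINER_NAMES = {"lr", "weight_decay", "patience", "epochs", "batch_size",
                 "batch_size_test", "padding_length", "device", "checkpointpath"}
NET_NAMES = {"embedding_size", "hidden_size", "repr_size",
             "lstm_class_nlayers", "drop_p"}

trainer_params, net_params = split_params(
    {"lr": 1e-4, "hidden_size": 128}, TRAINER_NAMES, NET_NAMES
)
print(trainer_params, net_params)  # {'lr': 0.0001} {'hidden_size': 128}
```

If a name appeared in both sets, it would be impossible to tell which object a value was meant for, which is why the sketch rejects overlapping names up front.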
- transform(instances)¶
Returns the embeddings of the instances
- Parameters
instances – list of lists of indexed tokens
- Returns
array-like of shape (n_samples, embed_size) with the embedded instances, where embed_size is defined by the classification network
- class quapy.classification.neural.TextClassifierNet¶
Bases:
torch.nn.modules.module.Module
Abstract Text classifier (torch.nn.Module)
- dimensions()¶
Gets the number of dimensions of the embedding space
- Returns
integer
- abstract document_embedding(x)¶
Embeds documents (i.e., performs the forward pass up to the next-to-last layer).
- Parameters
x – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
- Returns
a torch tensor of shape (n_samples, n_dimensions), where n_samples is the number of documents, and n_dimensions is the dimensionality of the embedding
- forward(x)¶
Performs the forward pass.
- Parameters
x – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
- Returns
a tensor of shape (n_instances, n_classes) with the decision scores for each of the instances and classes
- abstract get_params()¶
Get hyper-parameters for this estimator
- Returns
a dictionary with parameter names mapped to their values
- predict_proba(x)¶
Predicts posterior probabilities for the instances in x
- Parameters
x – a torch tensor of indexed tokens with shape (n_instances, pad_length), where n_instances is the number of instances in the batch and pad_length is the padded length of the documents in the batch
- Returns
array-like of shape (n_samples, n_classes) with the posterior probabilities
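The documentation states that forward returns decision scores and predict_proba returns posterior probabilities. A common way to get from one to the other is a softmax over the class axis, sketched below in plain Python. Whether the network uses exactly this mapping is an assumption; the scores are made up for the example.

```python
import math


def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two instances, three classes (hypothetical decision scores).
decision_scores = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
posteriors = [softmax(row) for row in decision_scores]
```

Each row of posteriors sums to 1, and the class with the largest decision score also receives the largest posterior, so the predicted label is unchanged by the transformation.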
- property vocabulary_size¶
Return the size of the vocabulary
- Returns
integer
- xavier_uniform()¶
Performs Xavier initialization of the network parameters
- class quapy.classification.neural.TorchDataset(instances, labels=None)¶
Bases:
torch.utils.data.dataset.Dataset
Transforms labelled instances into a torch.utils.data.DataLoader object.
- Parameters
instances – list of lists of indexed tokens
labels – array-like of shape (n_samples,) with the class labels
- asDataloader(batch_size, shuffle, pad_length, device)¶
Converts the labelled collection into a Torch DataLoader with dynamic padding for the batch
- Parameters
batch_size – batch size
shuffle – whether or not to shuffle instances
pad_length – the maximum length for the list of tokens (dynamic padding is applied, meaning that if the longest document in the batch is shorter than pad_length, then the batch is padded up to its length, and not to pad_length)
device – whether to allocate tensors in cpu or in cuda
- Returns
a torch.utils.data.DataLoader object
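The dynamic padding rule described for pad_length can be illustrated without torch. In the sketch below, the batch width is the smaller of pad_length and the length of the longest document in the batch. The helper name pad_batch, the use of index 0 as the padding token, right-side padding, and truncation of longer documents are all assumptions made for the example.

```python
def pad_batch(batch, pad_length, pad_index=0):
    """Pad (or truncate) every document in the batch to a common width."""
    # Never wider than pad_length, never wider than the longest document.
    width = min(pad_length, max(len(doc) for doc in batch))
    padded = []
    for doc in batch:
        doc = doc[:width]  # truncate documents longer than the width
        padded.append(doc + [pad_index] * (width - len(doc)))
    return padded


batch = [[5, 8, 2], [7, 1], [4, 9, 3, 6, 2]]

# The longest document has 5 tokens, so the batch is padded to 5, not 300.
short = pad_batch(batch, pad_length=300)
print(short)  # [[5, 8, 2, 0, 0], [7, 1, 0, 0, 0], [4, 9, 3, 6, 2]]

# With pad_length=4 the width is capped at 4.
cut = pad_batch(batch, pad_length=4)
print(cut)  # [[5, 8, 2, 0], [7, 1, 0, 0], [4, 9, 3, 6]]
```

Padding each batch only up to its own longest document avoids spending computation on padding tokens when a batch happens to contain only short documents.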
quapy.classification.svmperf module¶
- class quapy.classification.svmperf.SVMperf(svmperf_base, C=0.01, verbose=False, loss='01')¶
Bases:
sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
A wrapper for the SVM-perf package by Thorsten Joachims. When using losses for quantification, the source code has to be patched. See the installation documentation for further details.
- Parameters
svmperf_base – path to directory containing the binary files svm_perf_learn and svm_perf_classify
C – trade-off between training error and margin (default 0.01)
verbose – set to True to print svm-perf std outputs
loss – the loss to optimize for. Available losses are “01”, “f1”, “kld”, “nkld”, “q”, “qacc”, “qf1”, “qgm”, “mae”, “mrae”.
- decision_function(X, y=None)¶
Evaluate the decision function for the samples in X.
- Parameters
X – array-like of shape (n_samples, n_features) containing the instances to classify
y – unused
- Returns
array-like of shape (n_samples,) containing the decision scores of the instances
- fit(X, y)¶
Trains the SVM for the multivariate performance loss
- Parameters
X – training instances
y – a binary vector of labels
- Returns
self
- predict(X)¶
Predicts labels for the instances X.
- Parameters
X – array-like of shape (n_samples, n_features) instances to classify
- Returns
a numpy array of length n containing the label predictions, where n is the number of instances in X
- set_params(**parameters)¶
Set the hyper-parameters for svm-perf. Currently, only the C parameter is supported
- Parameters
parameters – a **kwargs dictionary {‘C’: <float>}
- valid_losses = {'01': 0, 'f1': 1, 'kld': 12, 'mae': 26, 'mrae': 27, 'nkld': 13, 'q': 22, 'qacc': 23, 'qf1': 24, 'qgm': 25}¶
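The valid_losses attribute maps each loss name accepted by the loss parameter to an integer code. The sketch below copies that mapping and shows how a loss name could be validated before use. The helper loss_code is hypothetical, and the reading of the integers as identifiers understood by the svm-perf binaries is an assumption.

```python
# Mapping copied from the valid_losses attribute documented above.
VALID_LOSSES = {'01': 0, 'f1': 1, 'kld': 12, 'mae': 26, 'mrae': 27,
                'nkld': 13, 'q': 22, 'qacc': 23, 'qf1': 24, 'qgm': 25}


def loss_code(loss):
    """Return the integer code for a loss name, or fail with a clear message."""
    if loss not in VALID_LOSSES:
        raise ValueError(
            f"unsupported loss {loss!r}; valid losses are {sorted(VALID_LOSSES)}"
        )
    return VALID_LOSSES[loss]


print(loss_code('kld'))  # 12
```

Note that the loss names are strings, so the zero-one loss is written '01' rather than the integer 1.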