quapy.classification package

Submodules

quapy.classification.methods module

class quapy.classification.methods.LowRankLogisticRegression(n_components=100, **kwargs)

Bases: sklearn.base.BaseEstimator

An example of a classification method (i.e., an object that implements fit, predict, and predict_proba) that also generates embedded inputs (i.e., that implements transform), such as those required by quapy.method.neural.QuaNet. This is a mock method that makes it easy to instantiate quapy.method.neural.QuaNet on array-like real-valued instances. The transformation consists of applying sklearn.decomposition.TruncatedSVD, while classification is performed by sklearn.linear_model.LogisticRegression on the low-rank space.

Parameters

n_components – the number of components to retain (i.e., the dimensionality of the low-rank space)
kwargs – parameters for the Logistic Regression classifier
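The following is a minimal sketch of the technique described above, written directly against scikit-learn rather than against QuaPy. The class name LowRankLRSketch is hypothetical; the pass-through rule mirrors the behaviour documented for transform, and QuaPy's actual implementation may differ in its details.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression


class LowRankLRSketch:
    """Hypothetical stand-in: TruncatedSVD embedding + LogisticRegression."""

    def __init__(self, n_components=100, **kwargs):
        self.n_components = n_components
        self.classifier = LogisticRegression(**kwargs)  # kwargs go to the classifier
        self.svd = None  # stays None when X is passed through unaltered

    def fit(self, X, y):
        # Only reduce dimensionality when there are more features than components
        if X.shape[1] > self.n_components:
            self.svd = TruncatedSVD(n_components=self.n_components).fit(X)
        else:
            self.svd = None
        self.classifier.fit(self.transform(X), y)
        return self

    def transform(self, X):
        # Low-rank embedding, or X itself if no SVD was fitted
        return X if self.svd is None else self.svd.transform(X)

    def predict(self, X):
        return self.classifier.predict(self.transform(X))

    def predict_proba(self, X):
        return self.classifier.predict_proba(self.transform(X))


# Usage on random data: 60 instances with 20 features, reduced to 5 dimensions
rng = np.random.default_rng(0)
X = rng.random((60, 20))
y = np.array([0, 1] * 30)
model = LowRankLRSketch(n_components=5, max_iter=500).fit(X, y)
print(model.transform(X).shape)  # (60, 5)
```

Keeping the SVD optional means the same object works whether or not the input is wider than n_components, which is the behaviour described for transform.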

fit(X, y)

Fit the model according to the given training data. The fit consists of fitting TruncatedSVD and then LogisticRegression on the low-rank representation.

Parameters

X – array-like of shape (n_samples, n_features) with the instances
y – array-like of shape (n_samples,) with the class labels

Returns

self

get_params()

Get hyper-parameters for this estimator.

Returns: a dictionary with parameter names mapped to their values

predict(X)

Predicts labels for the instances X embedded into the low-rank space.

Parameters: X – array-like of shape (n_samples, n_features) instances to classify
Returns: a numpy array of length n containing the label predictions, where n is the number of instances in X

predict_proba(X)

Predicts posterior probabilities for the instances X embedded into the low-rank space.

Parameters: X – array-like of shape (n_samples, n_features) instances to classify
Returns: array-like of shape (n_samples, n_classes) with the posterior probabilities

set_params(**params)

Set the parameters of this estimator.

Parameters: params – a **kwargs dictionary with the parameters for the Logistic Regression classifier and, optionally, n_components for TruncatedSVD

transform(X)

Returns the low-rank approximation of X with n_components dimensions, or X unaltered if n_components >= X.shape[1].

Parameters: X – array-like of shape (n_samples, n_features) instances to embed
Returns: array-like of shape (n_samples, n_components) with the embedded instances

quapy.classification.neural module

class quapy.classification.neural.CNNnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, kernel_heights=[3, 5, 7], stride=1, padding=0, drop_p=0.5)

Bases: quapy.classification.neural.TextClassifierNet

An implementation of quapy.classification.neural.TextClassifierNet based on Convolutional Neural Networks.

Parameters

vocabulary_size – the size of the vocabulary
n_classes – number of target classes
embedding_size – the dimensionality of the word embeddings space (default 100)
hidden_size – the dimensionality of the hidden space (default 256)
repr_size – the dimensionality of the document embeddings space (default 100)
kernel_heights – list of kernel lengths (default [3,5,7]), i.e., the number of consecutive tokens that each kernel covers
stride – convolutional stride (default 1)
padding – convolutional padding (default 0)
drop_p – drop probability for dropout (default 0.5)
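To see how kernel_heights, stride and padding interact, the sketch below applies the usual convolution arithmetic along the token axis: a kernel covering k tokens yields ⌊(L + 2·padding − k)/stride⌋ + 1 positions for a document of L tokens. We assume CNNnet follows PyTorch's convention here; the helper names are hypothetical and not part of QuaPy.

```python
def conv_output_length(n_tokens, kernel_height, stride=1, padding=0):
    """Positions produced along the token axis by one convolutional kernel."""
    span = n_tokens + 2 * padding
    if span < kernel_height:
        # The kernel does not fit into the (padded) document
        raise ValueError("document is shorter than the kernel height")
    # Standard convolution arithmetic: floor((L + 2p - k) / s) + 1
    return (span - kernel_height) // stride + 1


def feature_map_lengths(n_tokens, kernel_heights=(3, 5, 7), stride=1, padding=0):
    """Feature-map length for every kernel height (one map per height)."""
    return {k: conv_output_length(n_tokens, k, stride, padding) for k in kernel_heights}


print(feature_map_lengths(300))            # {3: 298, 5: 296, 7: 294}
print(feature_map_lengths(300, stride=2))  # {3: 149, 5: 148, 7: 147}
```

A document shorter than the largest kernel height produces no valid position for that kernel, which is one reason why very short documents deserve attention with this architecture.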

document_embedding(input)

Embeds documents (i.e., performs the forward pass up to the next-to-last layer).

Parameters: input – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
Returns: a torch tensor of shape (n_samples, n_dimensions), where n_samples is the number of documents, and n_dimensions is the dimensionality of the embedding

get_params()

Get hyper-parameters for this estimator.

Returns: a dictionary with parameter names mapped to their values

property vocabulary_size

Return the size of the vocabulary.

Returns: integer

class quapy.classification.neural.LSTMnet(vocabulary_size, n_classes, embedding_size=100, hidden_size=256, repr_size=100, lstm_class_nlayers=1, drop_p=0.5)

Bases: quapy.classification.neural.TextClassifierNet

An implementation of quapy.classification.neural.TextClassifierNet based on Long Short-Term Memory networks.

Parameters

vocabulary_size – the size of the vocabulary
n_classes – number of target classes
embedding_size – the dimensionality of the word embeddings space (default 100)
hidden_size – the dimensionality of the hidden space (default 256)
repr_size – the dimensionality of the document embeddings space (default 100)
lstm_class_nlayers – number of LSTM layers (default 1)
drop_p – drop probability for dropout (default 0.5)

document_embedding(x)

Embeds documents (i.e., performs the forward pass up to the next-to-last layer).

Parameters: x – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
Returns: a torch tensor of shape (n_samples, n_dimensions), where n_samples is the number of documents, and n_dimensions is the dimensionality of the embedding

get_params()

Get hyper-parameters for this estimator.

Returns: a dictionary with parameter names mapped to their values

property vocabulary_size

Return the size of the vocabulary.

Returns: integer

class quapy.classification.neural.NeuralClassifierTrainer(net: quapy.classification.neural.TextClassifierNet, lr=0.001, weight_decay=0, patience=10, epochs=200, batch_size=64, batch_size_test=512, padding_length=300, device='cpu', checkpointpath='../checkpoint/classifier_net.dat')

Bases: object

Trains a neural network for text classification.

Parameters

net – an instance of TextClassifierNet implementing the forward pass
lr – learning rate (default 1e-3)
weight_decay – weight decay (default 0)
patience – number of consecutive epochs without improvement on the validation set to wait before early stopping (default 10)
epochs – maximum number of training epochs (default 200)
batch_size – batch size for training (default 64)
batch_size_test – batch size for test (default 512)
padding_length – maximum number of tokens to consider in a document (default 300)
device – ‘cpu’ (default), or ‘cuda’ to enable the GPU
checkpointpath – where to store the parameters of the best model found so far according to the evaluation in the held-out validation split (default ‘../checkpoint/classifier_net.dat’)
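The interplay between patience, epochs and checkpointpath can be illustrated with a small, framework-free sketch. It assumes that a higher validation score is better and that the best model is checkpointed whenever the score improves; the function name and the scoring metric are hypothetical, and QuaPy's trainer may measure improvement differently.

```python
def early_stopping_epochs(val_scores, patience=10):
    """Return (best_epoch, stop_epoch), both 1-based, for per-epoch validation scores.

    len(val_scores) plays the role of the maximum number of epochs.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0  # here the model would be saved to checkpointpath
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # early stop; reload the checkpointed model
    return best_epoch, len(val_scores)


print(early_stopping_epochs([0.50, 0.60, 0.65, 0.64, 0.63, 0.62], patience=3))  # (3, 6)
```

Training stops at epoch 6, but the parameters kept are those of epoch 3, which is why the best model is written to disk rather than simply taken from the last epoch.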

property device

Gets the device on which the network is allocated.

Returns: device

fit(instances, labels, val_split=0.3)

Fits the model according to the given training data.

Parameters

instances – list of lists of indexed tokens
labels – array-like of shape (n_samples,) with the class labels
val_split – proportion of training documents to be taken as the validation set (default 0.3)

Returns

get_params()

Get hyper-parameters for this estimator.

Returns: a dictionary with parameter names mapped to their values

predict(instances)

Predicts labels for the instances.

Parameters: instances – list of lists of indexed tokens
Returns: a numpy array of length n containing the label predictions, where n is the number of instances

predict_proba(instances)

Predicts posterior probabilities for the instances.

Parameters: instances – list of lists of indexed tokens
Returns: array-like of shape (n_samples, n_classes) with the posterior probabilities

reset_net_params(vocab_size, n_classes)

Reinitialize the network parameters.

Parameters

vocab_size – the size of the vocabulary
n_classes – the number of target classes

set_params(**params)

Set the parameters of this trainer and of the learner it trains. In the current version, the parameter names of the trainer and of the learner should be disjoint.

Parameters: params – a **kwargs dictionary with the parameters
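A sketch of why the two sets of names should be disjoint: with a flat **kwargs dictionary, each name has to be routed either to the trainer or to the network, and a name present in both could not be assigned unambiguously. The key sets below are copied from the NeuralClassifierTrainer and CNNnet signatures; the routing function itself is hypothetical and need not match QuaPy's code.

```python
TRAINER_KEYS = {'lr', 'weight_decay', 'patience', 'epochs', 'batch_size',
                'batch_size_test', 'padding_length', 'device', 'checkpointpath'}
CNN_KEYS = {'vocabulary_size', 'n_classes', 'embedding_size', 'hidden_size',
            'repr_size', 'kernel_heights', 'stride', 'padding', 'drop_p'}


def route_params(params, trainer_keys, net_keys):
    """Split a flat parameter dict into (trainer_params, net_params)."""
    overlap = trainer_keys & net_keys
    if overlap:
        # A shared name could not be assigned unambiguously to one side
        raise ValueError(f'ambiguous parameter names: {sorted(overlap)}')
    unknown = set(params) - trainer_keys - net_keys
    if unknown:
        raise ValueError(f'unknown parameter names: {sorted(unknown)}')
    trainer_params = {k: v for k, v in params.items() if k in trainer_keys}
    net_params = {k: v for k, v in params.items() if k in net_keys}
    return trainer_params, net_params


print(route_params({'lr': 1e-4, 'drop_p': 0.3}, TRAINER_KEYS, CNN_KEYS))
# ({'lr': 0.0001}, {'drop_p': 0.3})
```

Note that padding (a CNNnet parameter) and padding_length (a trainer parameter) are distinct names, so they can coexist in the same dictionary.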

transform(instances)

Returns the embeddings of the instances.

Parameters: instances – list of lists of indexed tokens
Returns: array-like of shape (n_samples, embed_size) with the embedded instances, where embed_size is defined by the classification network

class quapy.classification.neural.TextClassifierNet

Bases: torch.nn.modules.module.Module

Abstract text classifier (a torch.nn.Module).

dimensions()

Gets the number of dimensions of the embedding space.

Returns: integer

abstract document_embedding(x)

Embeds documents (i.e., performs the forward pass up to the next-to-last layer).

Parameters: x – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
Returns: a torch tensor of shape (n_samples, n_dimensions), where n_samples is the number of documents, and n_dimensions is the dimensionality of the embedding

forward(x)

Performs the forward pass.

Parameters: x – a batch of instances, typically generated by a torch DataLoader instance (see quapy.classification.neural.TorchDataset)
Returns: a tensor of shape (n_instances, n_classes) with the decision scores for each of the instances and classes

abstract get_params()

Get hyper-parameters for this estimator.

Returns: a dictionary with parameter names mapped to their values

predict_proba(x)

Predicts posterior probabilities for the instances in x.

Parameters: x – a torch tensor of indexed tokens with shape (n_instances, pad_length), where n_instances is the number of instances in the batch and pad_length is the padded length of the documents in the batch
Returns: array-like of shape (n_samples, n_classes) with the posterior probabilities
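This method returns posteriors, while forward returns raw decision scores. A common way of turning the latter into the former is a row-wise softmax; the pure-Python sketch below assumes that is what predict_proba does (the text does not say so explicitly) and uses the usual max-subtraction trick for numerical stability. The function name is hypothetical.

```python
import math


def softmax_rows(scores):
    """Turn per-class decision scores (one row per instance) into posteriors."""
    posteriors = []
    for row in scores:
        m = max(row)  # subtracting the row maximum avoids overflow in exp
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        posteriors.append([e / total for e in exps])
    return posteriors


print(softmax_rows([[0.0, 0.0], [1.0, 1.0]]))  # [[0.5, 0.5], [0.5, 0.5]]
```

Softmax preserves the ordering of the scores, so the class with the highest decision score also receives the highest posterior probability.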

property vocabulary_size

Return the size of the vocabulary.

Returns: integer

xavier_uniform()

Performs Xavier initialization of the network parameters.
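Xavier (Glorot) uniform initialization draws each weight from U(−a, a) with a = gain · sqrt(6 / (fan_in + fan_out)). In PyTorch this is available as torch.nn.init.xavier_uniform_; we assume, without the text confirming it, that this method applies it to the network's weight matrices. The dependency-free sketch below only spells out the sampling rule, and its function names and the example layer are hypothetical.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Half-width a of the sampling interval U(-a, a)."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform_matrix(fan_out, fan_in, gain=1.0, seed=None):
    """Sample a fan_out x fan_in weight matrix (the layout of a torch Linear weight)."""
    rng = random.Random(seed)
    a = xavier_uniform_bound(fan_in, fan_out, gain)
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


# A hypothetical layer mapping a 256-dimensional hidden space to 100 dimensions
W = xavier_uniform_matrix(fan_out=100, fan_in=256, seed=0)
print(len(W), len(W[0]))  # 100 256
```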

class quapy.classification.neural.TorchDataset(instances, labels=None)

Bases: torch.utils.data.dataset.Dataset

Wraps labelled instances in a torch Dataset that can be converted into a torch.utils.data.DataLoader object (see asDataloader)

Parameters

instances – list of lists of indexed tokens
labels – array-like of shape (n_samples,) with the class labels

asDataloader(batch_size, shuffle, pad_length, device)

Converts the labelled collection into a Torch DataLoader with dynamic padding for the batch.

Parameters

batch_size – batch size
shuffle – whether or not to shuffle instances
pad_length – the maximum length for the list of tokens (dynamic padding is applied, meaning that if the longest document in the batch is shorter than pad_length, then the batch is padded up to the length of that document, and not to pad_length)
device – the device on which to allocate the tensors (‘cpu’ or ‘cuda’)

Returns

a torch.utils.data.DataLoader object
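Dynamic padding, as described for pad_length, can be sketched in plain Python: each batch is padded to the length of its longest document, capped at pad_length, and longer documents are truncated. The padding index (0), padding on the right and truncation of the tail are assumptions made for illustration; TorchDataset may choose differently, and the function name is hypothetical.

```python
def pad_batch(batch, pad_length, pad_index=0):
    """Pad a batch of token-index lists to min(pad_length, longest document)."""
    target = min(pad_length, max((len(doc) for doc in batch), default=0))
    padded = []
    for doc in batch:
        doc = doc[:target]  # truncate documents longer than the target length
        padded.append(doc + [pad_index] * (target - len(doc)))
    return padded


print(pad_batch([[5, 8], [3, 1, 4, 1]], pad_length=300))  # [[5, 8, 0, 0], [3, 1, 4, 1]]
print(pad_batch([[1, 2, 3, 4, 5]], pad_length=3))         # [[1, 2, 3]]
```

Padding per batch rather than always to pad_length keeps tensors small for batches of short documents, which saves computation in the forward pass.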

quapy.classification.svmperf module

class quapy.classification.svmperf.SVMperf(svmperf_base, C=0.01, verbose=False, loss='01')

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

A wrapper for the SVM-perf package by Thorsten Joachims. When using losses for quantification, the source code has to be patched. See the installation documentation for further details.

References:

Esuli et al. 2015

Barranquero et al. 2015

Parameters

svmperf_base – path to directory containing the binary files svm_perf_learn and svm_perf_classify
C – trade-off between training error and margin (default 0.01)
verbose – set to True to print the standard output of svm-perf
loss – the loss to optimize for. Available losses are “01”, “f1”, “kld”, “nkld”, “q”, “qacc”, “qf1”, “qgm”, “mae”, “mrae”.
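How a loss name might be translated into a call to the svm_perf_learn binary: svm_perf_learn takes -c for the trade-off parameter and -l for the loss function, and the numeric codes can be taken from valid_losses. The function below only builds the argument list without running it; its name and the argument order are assumptions, and codes such as 12 (kld) are only understood by the patched binary required for quantification losses.

```python
import os

# Copied from SVMperf.valid_losses
VALID_LOSSES = {'01': 0, 'f1': 1, 'kld': 12, 'mae': 26, 'mrae': 27, 'nkld': 13,
                'q': 22, 'qacc': 23, 'qf1': 24, 'qgm': 25}


def svmperf_learn_command(svmperf_base, train_file, model_file, C=0.01, loss='01'):
    """Build (but do not run) a hypothetical svm_perf_learn invocation."""
    if loss not in VALID_LOSSES:
        raise ValueError(f'unsupported loss {loss!r}; valid: {sorted(VALID_LOSSES)}')
    return [os.path.join(svmperf_base, 'svm_perf_learn'),
            '-c', str(C),                    # trade-off between error and margin
            '-l', str(VALID_LOSSES[loss]),   # numeric loss code
            train_file, model_file]


print(svmperf_learn_command('svm_perf', 'train.dat', 'model.dat', C=0.1, loss='kld'))
```

Validating the loss name before building the command surfaces typos early, instead of leaving them to an error message from the external binary.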

decision_function(X, y=None)

Evaluate the decision function for the samples in X.

Parameters

X – array-like of shape (n_samples, n_features) containing the instances to classify
y – unused

Returns

array-like of shape (n_samples,) containing the decision scores of the instances

fit(X, y)

Trains the SVM for the multivariate performance loss.

Parameters

X – training instances
y – a binary vector of labels

Returns

self
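The svm_perf_learn binary reads its training data from a file in SVMlight format: one instance per line, a target such as +1 or -1, followed by index:value pairs with 1-based feature indices. A minimal serializer for dense inputs is sketched below; the helper name is hypothetical, labels are assumed to be in {0, 1}, and we do not claim this is how SVMperf.fit writes its temporary files.

```python
def to_svmlight_lines(X, y):
    """Serialize dense instances and binary {0, 1} labels as SVMlight lines."""
    lines = []
    for row, label in zip(X, y):
        target = '+1' if label == 1 else '-1'  # map {0, 1} to {-1, +1}
        # Feature indices are 1-based; zero-valued features are omitted
        feats = ' '.join(f'{j}:{v}' for j, v in enumerate(row, start=1) if v != 0)
        lines.append(f'{target} {feats}'.rstrip())
    return lines


print(to_svmlight_lines([[0.5, 0.0, 2.0], [0.0, 0.0, 0.0]], [1, 0]))
# ['+1 1:0.5 3:2.0', '-1']
```

For sparse matrices, scikit-learn's sklearn.datasets.dump_svmlight_file with zero_based=False also writes 1-based feature indices and avoids the Python-level loop.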

predict(X)

Predicts labels for the instances X.

Parameters: X – array-like of shape (n_samples, n_features) instances to classify
Returns: a numpy array of length n containing the label predictions, where n is the number of instances in X
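The relation between decision_function and predict can be sketched as thresholding: for a binary SVM, a positive score maps to the positive class. The sketch also parses a predictions file containing one decision value per line, as SVMlight's svm_classify produces; the threshold of 0, the {0, 1} label encoding and the helper names are assumptions.

```python
def read_scores(text):
    """Parse one decision value per non-empty line."""
    return [float(line) for line in text.splitlines() if line.strip()]


def labels_from_scores(scores, threshold=0.0):
    """Binary labels: 1 if the score exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


scores = read_scores("0.73\n-1.20\n0.00\n")
print(labels_from_scores(scores))  # [1, 0, 0]
```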

set_params(**parameters)

Set the hyper-parameters for svm-perf. Currently, only the C parameter is supported.

Parameters: parameters – a **kwargs dictionary {‘C’: <float>}

valid_losses = {'01': 0, 'f1': 1, 'kld': 12, 'mae': 26, 'mrae': 27, 'nkld': 13, 'q': 22, 'qacc': 23, 'qf1': 24, 'qgm': 25}
