changing remote

This commit is contained in:
Alejandro Moreo Fernandez 2021-02-01 11:25:13 +01:00
parent acb38d4aae
commit 908fc2d6da
2 changed files with 20 additions and 24 deletions


@ -1,30 +1,26 @@
Things to clarify:
maybe I have to review the validation of the sav-loss; since it is batched, it might always be checking the same
submatrices for alignment, and those may be mostly positive or mostly near an identity? (a shuffled-validation sketch follows)
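something like this could work for the validation pass (phi_val, y_val and batch_size are just illustrative names): shuffle the indices before slicing, so each pass checks different, label-mixed submatrices:

import torch

def sav_validation_batches(phi_val, y_val, batch_size, generator=None):
    # shuffle indices so each pass checks different submatrices of the gram matrix
    n = phi_val.shape[0]
    perm = torch.randperm(n, generator=generator)
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        K = phi_val[idx] @ phi_val[idx].T                   # batch gram matrix
        Y = (y_val[idx, None] == y_val[None, idx]).float()  # same-author target matrix
        yield K, Y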
maybe the sav-loss is something that makes sense to impose, as a regularization, across several of the last layers, and not
only the last one?
about the network:
==================
remove the .to() calls inside the Module and use self.on_cpu instead
process datasets and leave it as a generic parameter
padding could start at any random point in [0, length_i - pad_length], i.e., take a random window of the document (see the sketch after this list)
- in training, pad to the shortest
- in test, pad to the largest
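a minimal sketch of the random-window idea (doc_ids, pad_length and pad_index are illustrative names; pad_length would be the shortest length in training and the largest in test):

import numpy as np

def crop_or_pad(doc_ids, pad_length, pad_index=0, training=True):
    # longer documents: take a window starting at a random offset in [0, len - pad_length]
    # (a fixed left-aligned window at test time); shorter documents: right-pad
    n = len(doc_ids)
    if n > pad_length:
        offset = np.random.randint(0, n - pad_length + 1) if training else 0
        return doc_ids[offset:offset + pad_length]
    return list(doc_ids) + [pad_index] * (pad_length - n)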
save and restore checkpoints
should the phi(x) be normalized? if so:
- better at the last step of phi?
- better outside phi, prior to the gram matrix computation? (see the sketch below)
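a sketch of the "outside phi" option (phi_x stands for a batch of intermediate representations): L2-normalizing the rows bounds the kernel values in [-1, 1], which would also simplify the SAV threshold question below:

import torch.nn.functional as F

def gram_matrix(phi_x, normalize=True):
    # K = phi(X) phi(X)^T; with L2-normalized rows the entries are cosine similarities in [-1, 1]
    if normalize:
        phi_x = F.normalize(phi_x, p=2, dim=-1)
    return phi_x @ phi_x.T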
should the single-label classifier have some sort of non-linearity from phi(x) to the labels?
about the loss and the KTA:
===========================
not clear whether we should define the loss as in "On kernel target alignment", i.e., with <K,Y>_F in the numerator (and
the sign flipped to minimize), or as the ||K-Y||_F norm. What about the denominator (currently, the normalization factor is n**2)? (both options are sketched below)
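both candidate formulations, sketched for clarity (K is the batch gram matrix, Y the binary same-author target matrix; the eps only avoids division by zero):

import torch

def neg_alignment_loss(K, Y, eps=1e-8):
    # alignment as in "On kernel target alignment": <K,Y>_F / (||K||_F ||Y||_F), sign flipped to minimize
    return -(K * Y).sum() / (torch.norm(K) * torch.norm(Y) + eps)

def frobenius_loss(K, Y):
    # alternative: ||K - Y||_F^2 normalized by n**2 (the number of entries)
    n = K.shape[0]
    return ((K - Y) ** 2).sum() / (n ** 2)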
are the contributions of the two losses comparable, or does one contribute far more than the other?
is the TwoClassBatch the best way?
SAV: how should the range of k(xi,xj) be interpreted? how to decide the value threshold for returning -1 or +1?
I guess the best thing to do is to learn a simple threshold, a 1-to-1 feed-forward layer (see the sketch below)
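a sketch of that learned threshold (k_values would be a tensor of kernel scores k(xi,xj) for the candidate pairs): a single 1-to-1 linear layer plus a sigmoid, so the decision boundary and the scale are learned rather than fixed:

import torch
from torch import nn

class SAVThreshold(nn.Module):
    # learned decision threshold on top of the kernel value: sigmoid(w*k + b), >0.5 means same author
    def __init__(self):
        super().__init__()
        self.ff = nn.Linear(1, 1)

    def forward(self, k_values):
        return torch.sigmoid(self.ff(k_values.view(-1, 1))).squeeze(-1)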
what is the best representation for inputs? char-based? n-gram-based? word-based? or a multichannel one?
I think this is irrelevant for the paper
not clear whether the single-label classifier should work out a feed-forward layer on top of the intermediate representation, or
instead work directly on the representations with one simple linear projection; not clear either whether the kernel
should be computed on any further elaboration of the intermediate representation... the thing is that <phi(xi),phi(xj)>
imposes unimodality (documents from the same author should point in a single direction), while working out another
representation for the single-label classifier could instead relax this and attribute to the same author vectors that
come from a multimodal distribution. No... this "unimodality" should exist anyway in the last layer. Indeed, I am starting
to think that the optimum for any classifier should already impose something similar to the KTA criterion in the
last layer... Is this redundant?


@ -33,6 +33,7 @@ class AuthorshipAttributionClassifier(nn.Module):
X, Xval, y, yval = train_test_split(X, y, test_size=val_prop, stratify=y)
with open(log, 'wt') as foo:
print()
foo.write('epoch\ttr-loss\tval-loss\tval-acc\tval-Mf1\tval-mf1\n')
tr_loss, val_loss = -1, -1
pbar = tqdm(range(1, batcher.n_epochs+1))
@ -160,7 +161,6 @@ def choose_sav_pairs(y, npairs):
idx2 = np.concatenate([posj, negj])
savlabels = np.array([1]*len(posi) + [0]*len(negi))
print(f'generated {len(posi)} pos and {len(negi)}')
return idx1, idx2, savlabels