CHAPTER 13. LINEAR FACTOR MODELS
Many variants of ICA are not generative models in the sense that we use the
phrase. In this book, a generative model either represents p(x) or can draw samples
from it. Many variants of ICA only know how to transform between x and h, but
do not have any way of representing p(h), and thus do not impose a distribution
over p(x). For example, many ICA variants aim to increase the sample kurtosis of
h = W⁻¹x, because high kurtosis indicates that p(h) is non-Gaussian, but this is
accomplished without explicitly representing p(h). This is because ICA is more
often used as an analysis tool for separating signals, rather than for generating
data or estimating its density.
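As a concrete illustration of the kurtosis-based objective mentioned above, the following sketch mixes two heavy-tailed sources and shows that unmixing them raises the sample kurtosis of the recovered signals. The mixing matrix, source distribution, and helper function here are illustrative choices, not part of the text; a real ICA algorithm would learn W rather than use the known inverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup: two non-Gaussian (Laplace, hence high-kurtosis)
# sources mixed by a known matrix A. Real ICA would not know A.
s = rng.laplace(size=(2, 100000))       # independent heavy-tailed sources
A = np.array([[2.0, 1.0], [1.0, 1.5]])  # mixing matrix
x = A @ s                               # observed mixed signals

def excess_kurtosis(z):
    """Sample excess kurtosis per row: 0 for a Gaussian, positive for Laplace."""
    z = z - z.mean(axis=1, keepdims=True)
    m2 = (z ** 2).mean(axis=1)
    m4 = (z ** 4).mean(axis=1)
    return m4 / m2 ** 2 - 3.0

# Unmixing with W = A^{-1} recovers h = Wx. The components of h are more
# kurtotic than the mixtures in x, which mixing pushed toward Gaussianity.
W = np.linalg.inv(A)
h = W @ x
print(excess_kurtosis(x))  # closer to 0
print(excess_kurtosis(h))  # clearly positive, near the Laplace value of 3
```

Note that this measures kurtosis directly from samples; nowhere is p(h) itself represented, which is exactly the point made in the text.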
Just as PCA can be generalized to the nonlinear autoencoders described in
chapter 14, ICA can be generalized to a nonlinear generative model, in which
we use a nonlinear function
f
to generate the observed data. See Hyvärinen and
Pajunen (1999) for the initial work on nonlinear ICA and its successful use with
ensemble learning by Roberts and Everson (2001) and Lappalainen et al. (2000).
Another nonlinear extension of ICA is the approach of nonlinear independent
components estimation, or NICE (Dinh et al., 2014), which stacks a series
of invertible transformations (encoder stages) that have the property that the
determinant of the Jacobian of each transformation can be computed efficiently.
This makes it possible to compute the likelihood exactly and, like ICA, attempts
to transform the data into a space where it has a factorized marginal distribution,
but is more likely to succeed thanks to the nonlinear encoder. Because the encoder
is associated with a decoder that is its perfect inverse, it is straightforward to
generate samples from the model (by first sampling from p(h) and then applying
the decoder).
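The invertible transformations used by NICE can be sketched with an additive coupling layer. In this minimal sketch (the coupling function m and the input values are arbitrary illustrative choices), half of the variables are shifted by a nonlinear function of the other half: the map is trivially invertible, and its Jacobian is triangular with unit diagonal, so its log-determinant is zero, which is what keeps the exact likelihood tractable.

```python
import numpy as np

def m(y):
    # Arbitrary coupling function; NICE allows any function here because
    # it never needs to be inverted or differentiated for the Jacobian.
    return np.tanh(y)

def forward(x):
    # Split the input in two halves; shift the second by m(first half).
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 + m(x1)])

def inverse(h):
    # Exact inverse: subtract the same shift.
    h1, h2 = np.split(h, 2)
    return np.concatenate([h1, h2 - m(h1)])

x = np.array([0.5, -1.0, 2.0, 0.3])
h = forward(x)
x_rec = inverse(h)  # recovers x exactly

# Sampling as described in the text: draw h from a factorized p(h),
# then apply the decoder (the inverse transformation).
sample = inverse(np.random.default_rng(0).normal(size=4))
```

A full NICE model stacks several such layers, alternating which half is transformed, so that every dimension is eventually modified.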
Another generalization of ICA is to learn groups of features, with statistical
dependence allowed within a group but discouraged between groups (Hyvärinen and
Hoyer, 1999; Hyvärinen et al., 2001b). When the groups of related units are chosen
to be non-overlapping, this is called independent subspace analysis. It is also
possible to assign spatial coordinates to each hidden unit and form overlapping
groups of spatially neighboring units. This encourages nearby units to learn similar
features. When applied to natural images, this topographic ICA approach learns
Gabor filters, such that neighboring features have similar orientation, location, or
frequency. Many different phase offsets of similar Gabor functions occur within
each region, so that pooling over small regions yields translation invariance.
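The pooling-for-invariance effect described above can be sketched with a single feature group. In this illustrative example (the 1-D signal, frequency, and filters are assumptions, not from the text), two filters with the same frequency but quadrature phase offsets form one subspace; pooling their squared responses gives an energy that is unchanged when the input is translated, i.e. phase-shifted.

```python
import numpy as np

# One feature group: two filters of identical frequency, phases 0 and pi/2.
# The frequency divides the window length exactly, so cross terms cancel.
t = np.arange(64)
f1 = np.cos(2 * np.pi * t / 16)  # phase 0
f2 = np.sin(2 * np.pi * t / 16)  # phase offset of pi/2

def pooled_response(signal):
    # Pool over the group: the L2 norm of the within-group responses.
    return np.hypot(signal @ f1, signal @ f2)

x = np.cos(2 * np.pi * t / 16 + 0.3)          # input at some phase
x_shifted = np.cos(2 * np.pi * t / 16 + 1.1)  # translated (phase-shifted) input

# The individual filter responses change with the shift, but the pooled
# energy is the same for both inputs.
print(pooled_response(x), pooled_response(x_shifted))
```

This is the same mechanism at work when many phase offsets of similar Gabor functions fall within one pooling region: the individual responses are phase-sensitive, but the pooled response is not.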
13.3 Slow Feature Analysis
Slow feature analysis (SFA) is a linear factor model that uses information from