Non-Local Manifold Tangent Learning
View Online Non-Local Manifold Tangent Learning PDF
Source: books.nips.cc
File size: 117.4 KB
Pages: 8 page(s)
View Online Download
Content Inside

A central issue of generalization is how information from the training examples can be used to make predictions about new examples, and without strong prior assumptions, i.e. in non-parametric models, this may be fundamentally dif?cult as illustrated by the curse of dimensionality. There has been in recent years a lot of work on unsupervised learn- ing based on characterizing a possibly non-linear manifold near which the data would lie, such as Locally Linear Embedding (LLE) (Roweis and Saul, 2000), Isomap (Tenenbaum, ¨ de Silva and Langford, 2000), kernel Principal Components Analysis (PCA) (Sch olkopf, Smola and M¨ ller, 1998), Laplacian Eigenmaps (Belkin and Niyogi, 2003), and Mani- u fold Charting (Brand, 2003). These are all essentially non-parametric methods that can be shown to be kernel methods with an adaptive kernel (Bengio et al., 2004), and which rep- resent the manifold on the basis of local neighborhood relations, very often constructed us- ing the nearest neighbors graph (the graph with one vertex per observed example, and arcs between near neighbors). The above methods characterize the manifold through an em- bedding which associates each training example (an input object) with a low-dimensional coordinate vector (the coordinates on the manifold). Other closely related methods char- acterize the manifold as well as “noise” around it. Most of these methods consider the density as a mixture of ?attened Gaussians, e.g. mixtures of factor analyzers (Ghahramani and Hinton, 1996), Manifold Parzen windows (Vincent and Bengio, 2003), and other local PCA models such as mixtures of probabilistic PCA (Tipping and Bishop, 1999). This is not an exhaustive list, and recent work also combines modeling through a mixture density and dimensionality reduction (Teh and Roweis, 2003; Brand, 2003). By “local manifold learning”, we mean a method that derives information about the local structure of the manifold (i.e. implicitly its tangent directions) at x based mostly on the training examples “around” x. There is a large group of manifold learning methods (as well as the spectral clustering methods) that share several characteristics, and can be seen as data-dependent kernel PCA (Bengio et al., 2004). These include LLE (Roweis and Saul, ¨ 2000), Isomap (Tenenbaum, de Silva and Langford, 2000), kernel PCA (Sch olkopf, Smola and M¨ ller, 1998) and Laplacian Eigenmaps (Belkin and Niyogi, 2003). They ?rst build a u data-dependent Gram matrix M with n × n entries KD (xi , xj ) where D = {x1 , . . . , xn } is the training set and KD is a data-dependent kernel, and compute the eigenvector- eigenvalue pairs {(vk , ?k )} of M . The embedding of the training set is obtained directly from the principal eigenvectors vk of M (the i-th element of vk gives the k-th coordi- ? nate of xi ’s embedding, possibly scaled by ?k , i.e. ek (xi ) = vik ) and the embedding n ¨ for a new example can be estimated using the Nystrom formula (Bengio et al., 2004): ?n 1 ek (x) = ?k i=1 vki KD (x, xi ) for the k-th coordinate of x, where ?k is the k-th eigen- ? value of M (the optional scaling by ?k would also apply). The above equation says n that the embedding for a new example x is a local interpolation of the manifold coor- (x,x dinates of its neighbors xi , with interpolating weights given by KD?k i ) . To see more clearly how the tangent plane may depend only on the neighbors of x, consider the re- lation between the tangent plane and the embedding function: the tangent plane at x is k (x) simply the subspace spanned by the vectors ?e?x . In the case of very “local” kernels like that of LLE, spectral clustering with Gaussian kernel, Laplacian Eigenmaps or kernel PCA with Gaussian kernel, that derivative only depends signi?cantly on the near neigh- k (x) bors of x. Consider for example kernel PCA with a Gaussian kernel: then ?e?x can be closely approximated by a linear combination of the difference vectors (x ? x j ) for xj near x. The weights of that combination may depend on the whole data set, but if the ambiant space has many more dimensions then the number of such “near” neighbors of x, this is a very strong locally determined constraint on the shape of the manifold. The case of Isomap is less obvious but we show below that it is also local. Let D(a, b) denote the graph geodesic distance going only through a, b and points from the training set. It is easy to imagine at least four failure causes for local manifold learning methods, and combining them will create even greater problems: • Noise around the manifold: data are not exactly lying on the manifold. In the case of non-linear manifolds, the presence of noise means that more data around each pancake region will be needed to properly estimate the tangent directions of the manifold in that region. • High curvature of the manifold. Local manifold learning methods basically approxi- mate the manifold by the union of many locally linear patches. For this to work, there must be at least d close enough examples in each patch (more with noise). With a high curvature manifold, more – smaller – patches will be needed, and the number of required patches will grow exponentially with the dimensionality of the manifold. Consider for example the manifold of translations of a high-contrast image.The tangent direction corresponds to the change in image due a small translation, i.e. it is non-zero only at edges in the image. After a one-pixel translation, the edges have moved by one pixel, and may not overlap much with the edges of the original image if it had high contrast. This is indeed a very high curvature manifold. • High intrinsic dimension of the manifold. We have already seen that high manifold dimensionality d is hurtful because O(d) examples are required in each patch and O(r d ) patches (for some r depending on curvature and noise) are necessary to span the manifold. In the translation example, if the image resolution is increased, then many more training images will be needed to capture the curvature around the translation manifold with locally linear patches. Yet the physical phenomenon responsible for translation is expressed by a simple equation, which does not get more complicated with increasing resolution. • Presence of many manifolds with little data per manifold. In many real-world con- texts there is not just one global manifold but a large number of manifolds which however share something about their structure. A simple example is the manifold of transforma- tions (view-point, position, lighting,...) of 3D objects in 2D images. There is one manifold per object instance (corresponding to the successive application of small amounts of all of these transformations). If there are only a few examples for each such class then it is almost impossible to learn the manifold structures using only local manifold learning. However, if the manifold structures are generated by a common underlying phenomenon then a non-local manifold learning method could potentially learn all of these manifolds and even generalize to manifolds for which a single instance is observed, as demonstrated in the experiments below.

More »
Similar Ebook PDF
Simulations, Games and Experiential Learning Techniques A TERMINAL KEYBOARD EXPERIENCE IN EXECUTIVE GAMING
sbaweb.wayne.edu - 94.93 KB - 8 page(s)

Simulations, Games and Experiential Learning Techniques:, Volume 1, 1974 A TERMINAL KEYBOARD EXPERIENCE IN EXECUTIVE GAMING Ralph H. Sprague, Jr., ...

Organizing Instruction and Study to Improve Student Learning a Practice Guide
ies.ed.gov - 946.57 KB - 63 page(s)

a pathway that represents a more or less coherent and comprehensive approach to a multifaceted problem. Practice guides are similar to the products of typical expert consensus panels in reflecting ...

The Use of Computer and Video Games for Learning
gmedia.glos.ac.uk - 264.73 KB - 93 page(s)

ISBN 1-85338-904-8 This publication was supported by the Learning and Skills Council as part of a grant to the Learning and Skills Development Agency for a programme ...

Social Bookmarking Tools as Facilitators of Learning and Research Collaborative Processes
ijello.org - 359.49 KB - 17 page(s)

web sites, blogs, pictures, wikis, videos and podcasts. Also emphasized are the advantages for learning and collaborative research that SBS produce. In this paper, Diigo will be specifically studied for its ...

The Depression Anxiety Stress Scales (DASS):Normative Data and Latent Structure in a Large non-Clinical Sample
abdn.ac.uk - 267.19 KB - 21 page(s)

analysis. Correlational analysis was used to determine the influence of demographic variables on DASS scores. The convergent and discriminant validity of the measure was examined through correlating the measure with ...