NonNegativeMatrixFactorization-

Non-Negative Matrix FactorizationMarshall Tappen6.8991Problem StatementGiven a set of images:1.Create a set of basis images that can be linearly combined to create new images2.Find the set of weights to reproduce every input image from the basis imagesOne set of weights for each input image 23 ways to do this discussedVector QuantizationPrincipal Components AnalysisNon-negative Matrix FactorizationEach method optimizes a different aspect3Vector QuantizationThe reconstructed image is the basis image that is closest to the input image.4Whats wrong with VQ?Limited by the number of basis imagesNot very useful for analysis5PCAFind a set of orthogonal basis imagesThe reconstructed image is a linear combination of the basis images 6What dont we like about PCA?PCA involves adding up some basis images then subtracting othersBasis images arent physically intuitiveSubtracting doesnt make sense in context of some applicationsHow do you subtract a face?What does subtraction mean in the context of document classification?7Non-negative Matrix FactorizationLike PCA, except the coefficients in the linear combination cannot be negative8NMF Basis ImagesOnly allowing adding of basis images makes intuitive senseHas physical analogue in neuronsForcing the reconstruction coefficients to be positive leads to nice basis imagesTo reconstruct images, all you can do is add in more basis imagesThis leads to basis images that represent parts910PCA vs NMFPCADesigned for producing optimal (in some sense) basis imagesJust because its optimal doesnt mean its good for your applicationNMFDesigned for producing coefficients with a specific propertyForcing coefficients to behave induces “nice” basis imagesNo SI unit for “nice”11The cool ideaBy constraining the weights, we can control how the basis images wind upIn this case, constraining the weights leads to “parts-based” basis images12Objective functionLet the value of a pixel in an original input image be V. Let (WH)i be the reconstructed pixel.If we consider V to be a noisy version of (WH)i , then the PDF of V isNow we will maximize the log probability of this PDF over W and H, leaving the relevant objective function to be:13How do we derive the update rules?This is in the NIPS paper.(Im going to change the error function toto match the NIPS paper)Use gradient descent to find a local minimumThe gradient descent update rule is: 14Deriving Update RulesGradient Descent Rule:Set The update rule becomes 15Whats significant about this?This is a multiplicative update instead of an additive update.If the initial values of W and H are all non-negative, then the W and H can never become negative.This lets us produce a non-negative factorization(See NIPS Paper for full proof that this will converge)16How do we know that this will converge?If F is the objective function, let be G be an auxiliary functionIf G is an auxiliary function of F, then F is non-increasing under the update17Auxiliary Function18How do we know that this will converge?Let the auxiliary function be Then the update isWhich results in the update rule:19Main ContributionsIdea that representations which allow negative weights do not make sense in some applicationsA simple system for producing basis images with non-negative weightsPoints out that this leads to basis images that are based on partsA larger point here is that by constraining the problem in new ways, we can induce nice properties20Mels CommentaryMost significant point:“NMFs non-negativity constraint is well-matched to our intuitive ideas about decomposition into parts”Second: Basis images are betterThird: Simple System21Mels CaveatsComparison of NMF to PCA is subjectiveBasis images dont necessarily correspond to parts as we think of them.Subtraction may actually occur in the brainSome neurons are “negative” versions of others22A Quick Review of Linear AlgebraEvery vector can be expressed as the linear combination of basis vectorsCan think of images as big vectors (Raster scan image into vector)This means we can express an image as the linear combination of a set of basis images23Where do those update rules come from?How this is a matrix problemBasis Images (W)XReconstruction Weights (H)Weights used to combine basis imagesInto reconstructed pixelThe basis pixels that will be combined into the reconstructed pixel24