Countless times, James Bond had guided my thoughts and taught me lessons I would never forget. However, I take pride in the fact that, on one occasion, I managed to help him out of a rather intricate situation.
I happened upon James standing in the middle of a busy road, completely oblivious to the torrent of cars swerving dangerously around him.
“What are you doing there?” I called out, alarmed.
“Hello,” he replied absently, “I’m reading a book by the Greek philosopher Zeno.”
“Please, get out of the traffic! Take care of yourself!”
“That’s impossible,” he said calmly, his gaze fixed on the book. “Listen! To traverse a path, you must first cover half of it. To do that, you must first cover a quarter of the way, and before that an eighth, and so on, infinitely. A person cannot perform an infinite number of actions, and thus, movement itself is impossible.”
“Let’s discuss this after you’ve moved from that dangerous spot!” I urged.
“But I can’t find any mathematical contradictions in Zeno’s argument.”
At that point, I lost my patience and shouted more forcefully than I intended, “Are you crazy? Get out of there!”
“It seems you’re asking for the impossible, Sir,” James replied with a calm dignity.
I quickly changed my approach and appealed to his common sense. “I can confidently assure you that movement is possible. I’ve experienced it a number of times, and you can too. Just take a step!”
With evident caution, he moved forward slowly, carefully stepping off the road.
“I’m used to trusting mathematical predictions rather than feelings or intuition,” James said, a hint of embarrassment in his voice.
“I think we still need to balance abstract concepts with common sense,” I said, giving him a reassuring pat on the shoulder.
“Right,” he acknowledged thoughtfully. “By the way, the PCA method we discussed so intensively often yields quite counterintuitive results that must be significantly reinterpreted with a dose of common sense…”
From Abstraction to Common Sense
We continued our conversation at my place, where James began to speculate.
“Suppose we have a mixture of substance A and substance B, each with its own unique spectral signature. To keep things simple, let’s assume A and B are distributed randomly in equal proportions. As usual, we apply PCA—in this case, the uncentered version—and reduce the dimensionality of the dataset down to two. That is, we extract two spectral components that can describe any data distribution in this mixture.”
“Since there are only two variables, we can easily plot the data distribution over these two components. The data variation would form a straight line, as the content changes linearly from A to B. But what does this 2D plot mean?”
James quickly sketched the results of the PCA.
“This is a space where each point represents a spectral signature, and all of these signatures are linear combinations of just two ‘basic’ spectra—the signatures of the two principal components.”
“Now, look more closely at this 2D set of spectral signatures. Most of them are physically impossible, as a spectral feature cannot be negative. Even the spectral signature of the second principal component falls into the forbidden domain.”
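For readers who want to reproduce James’s sketch, here is a minimal simulation of the setup he describes. It is only a sketch under stated assumptions: the two Gaussian peaks standing in for the signatures of A and B, the noise level, and all variable names are hypothetical and are not taken from the original text. Uncentered PCA is computed here as a plain SVD of the raw data matrix, without subtracting the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth signatures: two non-negative Gaussian peaks
channels = np.linspace(0.0, 1.0, 200)
spec_a = np.exp(-0.5 * ((channels - 0.35) / 0.05) ** 2)
spec_b = np.exp(-0.5 * ((channels - 0.65) / 0.05) ** 2)

# Random content of A in each data point; B fills the remainder
frac_a = rng.uniform(0.0, 1.0, size=500)
data = np.outer(frac_a, spec_a) + np.outer(1.0 - frac_a, spec_b)
data += rng.normal(scale=0.01, size=data.shape)  # assumed small noise

# Uncentered PCA: SVD of the raw data, no mean subtraction
u, s, vt = np.linalg.svd(data, full_matrices=False)
components = vt[:2]          # the two "basic" spectra
scores = u[:, :2] * s[:2]    # coordinates of each data point in the 2D plot

# The second component is orthogonal to the first, so it must change sign
second_changes_sign = components[1].min() < 0 < components[1].max()
print("second component changes sign:", second_changes_sign)

# How much of the data the two components fail to describe
residual = np.linalg.norm(data - scores @ components) / np.linalg.norm(data)
print("relative residual of the 2-component model:", residual)
```

Plotting `scores[:, 0]` against `scores[:, 1]` should show the data strung along a straight segment, while plotting `components[1]` against `channels` shows the sign change that makes the second component physically impossible as a spectrum.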
“This is a catastrophe! How can we use PCA after that?” I exclaimed.
“Steady! We still can,” James replied. “We can always find a linear combination of principal components that adheres to all physical constraints. In this particular case, we can simply look at the data distribution in the 2D plot and manually select two points at the edges of the data spread. Let’s call them endmembers. Since all points in our 2D space are linear combinations of one another, we can express them in terms of these two reference points, instead of the abstract principal components. These new basis points would correspond very closely to the ground truth we set in the simulations.”
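The manual selection James describes can be imitated in a few lines. The sketch below is an assumption-laden illustration, not the author’s code: it regenerates the same hypothetical two-peak mixture, picks the two data points at the extremes of the data spread in the score plane as endmembers, and re-expresses every point in terms of them. Picking the extremes automatically is a stand-in for choosing them by eye.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical mixture as before (assumed signatures and noise level)
channels = np.linspace(0.0, 1.0, 200)
spec_a = np.exp(-0.5 * ((channels - 0.35) / 0.05) ** 2)
spec_b = np.exp(-0.5 * ((channels - 0.65) / 0.05) ** 2)
frac_a = rng.uniform(0.0, 1.0, size=500)
data = np.outer(frac_a, spec_a) + np.outer(1.0 - frac_a, spec_b)
data += rng.normal(scale=0.01, size=data.shape)

# Uncentered PCA with two components
u, s, vt = np.linalg.svd(data, full_matrices=False)
components = vt[:2]
scores = u[:, :2] * s[:2]

# Find the direction of the data spread in the score plane
centered = scores - scores.mean(axis=0)
_, _, spread = np.linalg.svd(centered, full_matrices=False)
position = centered @ spread[0]

# Endmembers: the two data points at the edges of the spread
idx = [int(np.argmin(position)), int(np.argmax(position))]
end_scores = scores[idx]                 # 2 x 2, endmembers in the score plane
endmembers = end_scores @ components     # 2 x channels, endmember spectra

# Re-express every point as a combination of the two endmembers:
# scores = fractions @ end_scores
fractions = scores @ np.linalg.inv(end_scores)

print("fractions sum to about one:", fractions.sum(axis=1)[:5])
```

Because the endmembers are actual (denoised) data points, their spectra stay close to the simulated signatures, up to the residual noise and to how close the most extreme data points come to the pure substances.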
“Great! But why do we need principal components if we end up recasting everything into other quantities anyway?”
“Ah… that’s more of a technical issue. But we can take another approach. Let’s start by searching for two reference points that 1) satisfy physical constraints, such as non-negativity, and 2) ensure non-negative fractions in all available data points. This method is called Non-negative Matrix Factorization (NMF).”
Bond typed something into the computer, and after a brief moment, a satisfied smile crossed his face.
“The results are quite similar to what I obtained with PCA followed by endmembering. However, the algorithms behind NMF are more complex and less robust than PCA. So, using good old PCA might not be such a bad idea after all. I should also warn you that the NMF solution might not be unique.”
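The story does not say which NMF algorithm Bond typed in. As one possible illustration, the sketch below uses the classic multiplicative updates of Lee and Seung, written in plain NumPy, on the same hypothetical two-peak mixture. The clipping of the noisy data at zero, the number of iterations, and the random initialization are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Same hypothetical mixture as before (assumed signatures and noise level)
channels = np.linspace(0.0, 1.0, 200)
spec_a = np.exp(-0.5 * ((channels - 0.35) / 0.05) ** 2)
spec_b = np.exp(-0.5 * ((channels - 0.65) / 0.05) ** 2)
frac_a = rng.uniform(0.0, 1.0, size=500)
data = np.outer(frac_a, spec_a) + np.outer(1.0 - frac_a, spec_b)
data += rng.normal(scale=0.01, size=data.shape)

# Multiplicative updates require non-negative input (assumption: clip at zero)
data = np.clip(data, 0.0, None)

k = 2          # number of reference spectra
eps = 1e-12    # guards against division by zero
W = rng.uniform(0.1, 1.0, size=(data.shape[0], k))   # non-negative fractions
H = rng.uniform(0.1, 1.0, size=(k, data.shape[1]))   # non-negative spectra

# Lee-Seung multiplicative updates for the Frobenius-norm objective
for _ in range(500):
    H *= (W.T @ data) / (W.T @ W @ H + eps)
    W *= (data @ H.T) / (W @ H @ H.T + eps)

rel_error = np.linalg.norm(data - W @ H) / np.linalg.norm(data)
print("relative reconstruction error:", rel_error)
print("smallest fraction:", W.min(), "smallest spectral value:", H.min())
```

Rerunning with a different seed changes the starting point and may change the scaling and ordering of the rows of `H`, which is one practical face of the non-uniqueness James warns about.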
“Doesn’t matter!” I said enthusiastically. “From now on, I will adopt this wonderful approach: search only for solutions that are non-negative.”
Back to Abstraction
“Wait a moment!” James said thoughtfully. After a brief pause, he added, “I think the restriction of strictly non-negative fractions in each data point could, at least partially, be relaxed.”
“Negative content?” I asked, trying to be as gentle as possible. “James, are you starting to feel unwell again, like when you were crossing that street?”
“Not at all,” Bond smiled. “And I’m going to show you that this idea is consistent with both common sense and mathematics.”
“Imagine that substance A is absent over a large portion of the dataset. The content there must be zero, and indeed, NMF would produce zero. However, if the data is corrupted by significant noise, the output can’t be exactly zero. Of course, PCA, NMF, and other dimensionality reduction methods reduce noise, but they never eliminate it completely. In the region we’re discussing, the content shouldn’t be zero but should instead fluctuate randomly around zero.”
“If we insist on strict non-negativity of the content, the position of reference point A must be shifted from the ground truth. The bias increases with the noise level. Moreover, for the same noise level, the bias would vary randomly depending on the largest outlier produced by the noise.”
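James’s point about noise can be checked numerically. The sketch below is a simplified illustration under assumptions that are not in the original text: the true signatures are taken as known, the region contains pure B plus Gaussian noise, and the content of A is estimated by ordinary unconstrained least squares. The function name `content_of_a` and the noise levels are hypothetical.

```python
import numpy as np

# Hypothetical ground-truth signatures, as in the earlier sketches
channels = np.linspace(0.0, 1.0, 200)
spec_a = np.exp(-0.5 * ((channels - 0.35) / 0.05) ** 2)
spec_b = np.exp(-0.5 * ((channels - 0.65) / 0.05) ** 2)
basis = np.stack([spec_a, spec_b])   # 2 x channels


def content_of_a(noise_level, seed, n_points=2000):
    """Unconstrained least-squares content of A where A is truly absent."""
    rng = np.random.default_rng(seed)
    # Region without A: pure B in every data point, plus noise
    data = np.tile(spec_b, (n_points, 1))
    data += rng.normal(scale=noise_level, size=data.shape)
    # Solve basis.T @ coef = data.T for the two contents of every point
    coef, *_ = np.linalg.lstsq(basis.T, data.T, rcond=None)
    return coef[0]


for noise_level in (0.05, 0.2):
    for seed in (0, 1, 2):
        est = content_of_a(noise_level, seed)
        print(
            f"noise={noise_level}, seed={seed}: "
            f"mean={est.mean():+.4f}, "
            f"negative share={np.mean(est < 0):.2f}, "
            f"most negative={est.min():+.4f}"
        )
```

The estimated content scatters around zero, with roughly half of the values negative. The most negative value grows with the noise level and differs from one noise realization to the next, and that outlier is what a strictly non-negative model has to absorb by moving a reference point.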
“Are there methods to account for noise in this situation?”
“There are,” James replied. “I’ll need to head back to headquarters to check it. You know, my colleagues are unmatched experts in creating noise and thriving in noisy environments…”
The Python code can be found in the PDF version of this document: Full Text with Codes.