Each principal component accounts for a certain percentage of the total variation in the dataset. When many variables correlate with one another, they all contribute strongly to the same principal component. In this way, you transform a set of x correlated variables over y samples into a set of p uncorrelated principal components over the same samples.
PCA is a type of linear transformation on a given dataset that has values for a certain number of variables (coordinates) for a certain number of samples. This transformation fits the dataset to a new coordinate system in such a way that the most significant variance is found on the first coordinate, and each subsequent coordinate is orthogonal to the previous ones and has a lesser variance. By searching for the eigenvectors with high eigenvalues, we can hopefully reduce the dimensionality of the dataset.
A PCA looks for correlations among the columns by searching for vectors (eigenvectors) that correlate strongly with the data in the columns (high eigenvalues). PCA techniques are very useful for data exploration when the dataset is ‘wide’, meaning it has many columns relative to its number of rows of data points. I am setting up a notebook for how to run principal component analyses.
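The steps above can be sketched directly with NumPy. This is a minimal, hypothetical example (the synthetic data, variable counts, and names are my own assumptions, not from the notebook): three correlated columns plus one independent column, so the first principal component should capture most of the total variance.

```python
import numpy as np

# Hypothetical data: 50 samples of 4 variables, where the first
# three columns are strongly correlated and the fourth is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack(
    [base + 0.1 * rng.normal(size=(50, 1)) for _ in range(3)]
    + [rng.normal(size=(50, 1))]
)

# Centre the data, then eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order

# Sort descending so the first principal component has the largest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of total variance accounted for by each component.
explained = eigvals / eigvals.sum()
print(explained)

# Project onto the top two components: 4 correlated variables
# become 2 uncorrelated coordinates over the same 50 samples.
scores = Xc @ eigvecs[:, :2]
print(scores.shape)
```

Because the first three columns share the same underlying signal, they load onto one eigenvector with a high eigenvalue, which is exactly the correlation-hunting behaviour described above.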