I'm working on a project in C# that uses principal component analysis to apply feature reduction/dimension reduction on a double[,] matrix. The matrix columns are features (words and bigrams) that have been extracted from a set of emails. In the beginning we had around 156 emails, which resulted in approximately 23000 terms, and everything worked as it was supposed to using the following code:
    public static double[,] GetPCAComponents(double[,] sourceMatrix, int dimensions = 20, AnalysisMethod method = AnalysisMethod.Center)
    {
        // Create the principal component analysis of the given source
        PrincipalComponentAnalysis pca = new PrincipalComponentAnalysis(sourceMatrix, method);

        // Compute the principal component analysis
        pca.Compute();

        // Create a projection of the information
        double[,] pcaComponents = pca.Transform(sourceMatrix, dimensions);

        // Return the PCA components
        return pcaComponents;
    }
The components we received were classified later on using the Linear Discriminant Analysis Classify method from the Accord.NET framework. Everything was working as it should.
Now that we have increased the size of our dataset (1519 emails and 68375 terms), we were at first getting OutOfMemory exceptions. We were able to solve this by adjusting some parts of our code until we could reach the part where we calculate the PCA components. Right now that step takes about 45 minutes, which is way too long. After checking the Accord.NET website on PCA, we decided to try the last example, which uses a covariance matrix, since it says: "Some users would like to analyze huge amounts of data. In this case, computing the SVD directly on the data could result in memory exceptions or excessive computing times". Therefore we changed our code to the following:
    public static double[,] GetPCAComponents(double[,] sourceMatrix, int dimensions = 20, AnalysisMethod method = AnalysisMethod.Center)
    {
        // Compute the mean vector
        double[] mean = Accord.Statistics.Tools.Mean(sourceMatrix);

        // Compute the covariance matrix
        double[,] covariance = Accord.Statistics.Tools.Covariance(sourceMatrix, mean);

        // Create the analysis using the covariance matrix
        var pca = PrincipalComponentAnalysis.FromCovarianceMatrix(mean, covariance);

        // Compute the principal component analysis
        pca.Compute();

        // Create a projection of the information
        double[,] pcaComponents = pca.Transform(sourceMatrix, dimensions);

        // Return the PCA components
        return pcaComponents;
    }
This, however, raises a System.OutOfMemoryException. Does anyone know how to solve this problem?
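For a sense of scale, here is a rough back-of-the-envelope estimate of the memory involved. It assumes both matrices are stored densely as double[,] at 8 bytes per entry, and that the covariance matrix is terms by terms (one row and one column per feature); neither assumption has been verified against what Accord.NET allocates internally.

```latex
\begin{align*}
\text{source matrix:} \quad & 1519 \times 68375 = 103{,}861{,}625 \ \text{entries}
  \;\Rightarrow\; 103{,}861{,}625 \times 8\ \text{bytes} \approx 0.83\ \text{GB} \\
\text{covariance matrix:} \quad & 68375 \times 68375 = 4{,}675{,}140{,}625 \ \text{entries}
  \;\Rightarrow\; 4{,}675{,}140{,}625 \times 8\ \text{bytes} \approx 37.4\ \text{GB}
\end{align*}
```

If those assumptions hold, the covariance matrix alone would be roughly 45 times larger than the source matrix, which may be why the exception occurs at that point.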