C# - How to solve an OutOfMemoryException thrown when using principal component analysis


I'm working on a project in C# that uses principal component analysis to apply feature reduction/dimension reduction on a double[,] matrix. The matrix columns are features (words and bigrams) that have been extracted from a set of emails. In the beginning we had around 156 emails, which resulted in approximately 23000 terms, and everything worked as it was supposed to using the following code:

    using Accord.Statistics.Analysis;

    public static double[,] GetPcaComponents(double[,] sourceMatrix, int dimensions = 20, AnalysisMethod method = AnalysisMethod.Center)
    {
        // Create the principal component analysis of the given source
        PrincipalComponentAnalysis pca = new PrincipalComponentAnalysis(sourceMatrix, method);

        // Compute the principal component analysis
        pca.Compute();

        // Create a projection of the information
        double[,] pcaComponents = pca.Transform(sourceMatrix, dimensions);

        // Return the PCA components
        return pcaComponents;
    }

The components we receive are later classified using the Classify method of Linear Discriminant Analysis from the Accord.NET framework. This works as it should.

Now that we have increased the size of our dataset (1519 emails and 68375 terms), we were at first getting OutOfMemory exceptions. We were able to solve those by adjusting parts of our code, until we reached the part where we calculate the PCA components. Right now that step takes 45 minutes, which is way too long. After checking the Accord.NET website on PCA, we decided to try the last example, which uses a covariance matrix, since it says: "Some users analyze huge amounts of data. In this case, computing the SVD directly on the data could result in memory exceptions or excessive computing times". Therefore we changed our code to the following:

    using Accord.Statistics.Analysis;

    public static double[,] GetPcaComponents(double[,] sourceMatrix, int dimensions = 20, AnalysisMethod method = AnalysisMethod.Center)
    {
        // Compute the mean vector
        double[] mean = Accord.Statistics.Tools.Mean(sourceMatrix);

        // Compute the covariance matrix
        double[,] covariance = Accord.Statistics.Tools.Covariance(sourceMatrix, mean);

        // Create the analysis using the covariance matrix
        var pca = PrincipalComponentAnalysis.FromCovarianceMatrix(mean, covariance);

        // Compute the principal component analysis
        pca.Compute();

        // Create a projection of the information
        double[,] pcaComponents = pca.Transform(sourceMatrix, dimensions);

        // Return the PCA components
        return pcaComponents;
    }

This raises a System.OutOfMemoryException. Does anyone know how to solve this problem?
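
A rough size estimate helps explain the exception. The covariance matrix has one row and one column per term, so it grows with the square of the 68375 features rather than with the number of emails. The sketch below assumes dense double storage, as in the question's double[,], and ignores any temporary copies Accord.NET may make internally. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper: memory footprint of dense double matrices.
public static class PcaMemoryEstimate
{
    // Bytes for a dense rows x cols matrix of doubles (8 bytes per element),
    // ignoring the small fixed array header.
    public static long DenseDoubleBytes(long rows, long cols) =>
        checked(rows * cols * sizeof(double));

    // Prints the footprint of the source matrix, the covariance matrix
    // and the (much smaller) samples x samples Gram matrix X * X^T.
    public static void Print(long samples, long features)
    {
        Console.WriteLine($"source     ({samples} x {features}): {DenseDoubleBytes(samples, features) / 1e9:F2} GB");
        Console.WriteLine($"covariance ({features} x {features}): {DenseDoubleBytes(features, features) / 1e9:F2} GB");
        Console.WriteLine($"Gram       ({samples} x {samples}): {DenseDoubleBytes(samples, samples) / 1e9:F3} GB");
    }
}
```

For `PcaMemoryEstimate.Print(1519, 68375)` the source matrix comes to about 0.83 GB, but the 68375 × 68375 covariance matrix needs about 37.4 GB. That is far beyond the default 2 GB limit on a single .NET Framework object. The 1519 × 1519 Gram matrix needs only about 18.5 MB.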

I think parallelizing the solver is your best bet.
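
One way to parallelize is to spread the dominant matrix product across cores with `Parallel.For`. The following is a minimal sketch, not code from Accord.NET or FEAST. It mean-centers the columns and then builds the n × n Gram matrix X·Xᵀ instead of the p × p covariance. That swap is my assumption: it is only cheap because there are far fewer emails (rows) than terms (columns). The class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: parallel building blocks for PCA on a wide matrix
// (few rows/emails, many columns/terms). Not part of Accord.NET or FEAST.
public static class ParallelGram
{
    // Subtracts each column's mean in place; columns are processed in parallel.
    public static void CenterColumns(double[,] x)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        Parallel.For(0, p, j =>
        {
            double mean = 0;
            for (int i = 0; i < n; i++) mean += x[i, j];
            mean /= n;
            for (int i = 0; i < n; i++) x[i, j] -= mean;
        });
    }

    // Returns G = X * X^T (n x n). Parallel task i computes row i from the
    // diagonal onward and mirrors it, so every cell is written by exactly one task.
    public static double[,] Gram(double[,] x)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        var g = new double[n, n];
        Parallel.For(0, n, i =>
        {
            for (int k = i; k < n; k++)
            {
                double sum = 0;
                for (int j = 0; j < p; j++) sum += x[i, j] * x[k, j];
                g[i, k] = sum;
                g[k, i] = sum; // G is symmetric
            }
        });
        return g;
    }
}
```

`CenterColumns` modifies its argument, so clone the source matrix first if you still need the uncentered data. For 1519 emails, `Gram` returns a 1519 × 1519 matrix whose eigenvectors carry the same leading principal components as the covariance matrix.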

Perhaps FEAST can help:

http://www.ecs.umass.edu/~polizzi/feast/

Parallel linear algebra for a multicore system.
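
FEAST is an eigenvalue solver, and its API is not shown here. As a much simpler stand-in, this sketch computes the top k eigenpairs of a small symmetric matrix, such as the Gram matrix X·Xᵀ of the centered data, by power iteration with deflation. It then turns them into PCA scores using score = u·√λ. That formula follows from the SVD X = U S Vᵀ: X V = U S and λ = s². The names are hypothetical. Power iteration converges slowly when eigenvalues are close, so treat this as illustrative only.

```csharp
using System;

// Hypothetical helper, not FEAST: top-k eigenpairs of a small symmetric
// positive semi-definite matrix by power iteration with deflation.
public static class GramPca
{
    public static (double[] Values, double[][] Vectors) TopEigen(double[,] a, int k, int iterations = 500)
    {
        int n = a.GetLength(0);
        var m = (double[,])a.Clone();   // working copy that gets deflated
        var values = new double[k];
        var vectors = new double[k][];
        var rng = new Random(42);       // fixed seed for reproducible start vectors

        for (int c = 0; c < k; c++)
        {
            var v = new double[n];
            for (int i = 0; i < n; i++) v[i] = rng.NextDouble() + 0.1;
            Normalize(v);

            double lambda = 0;
            for (int it = 0; it < iterations; it++)
            {
                double[] w = Multiply(m, v);
                lambda = Dot(v, w);             // Rayleigh quotient estimate
                if (Normalize(w) == 0) break;   // nothing left in this direction
                v = w;
            }

            values[c] = lambda;
            vectors[c] = v;

            // Deflate: remove the found component so the next pass finds the next one.
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    m[i, j] -= lambda * v[i] * v[j];
        }
        return (values, vectors);
    }

    // PCA scores of the centered rows from the eigenpairs of G = X * X^T:
    // with X = U S V^T we have X V = U S and lambda = s^2, so score = u * sqrt(lambda).
    public static double[,] Scores(double[] values, double[][] vectors)
    {
        int n = vectors[0].Length, k = values.Length;
        var s = new double[n, k];
        for (int c = 0; c < k; c++)
        {
            double scale = Math.Sqrt(Math.Max(values[c], 0.0));
            for (int i = 0; i < n; i++) s[i, c] = vectors[c][i] * scale;
        }
        return s;
    }

    static double[] Multiply(double[,] m, double[] v)
    {
        int n = v.Length;
        var r = new double[n];
        for (int i = 0; i < n; i++)
        {
            double sum = 0;
            for (int j = 0; j < n; j++) sum += m[i, j] * v[j];
            r[i] = sum;
        }
        return r;
    }

    static double Dot(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    // Scales v to unit length in place and returns its original norm.
    static double Normalize(double[] v)
    {
        double norm = Math.Sqrt(Dot(v, v));
        if (norm == 0) return 0;
        for (int i = 0; i < v.Length; i++) v[i] /= norm;
        return norm;
    }
}
```

Call `GramPca.TopEigen(gram, 20)` on the Gram matrix of the centered data, then call `GramPca.Scores`. This gives a 1519 × 20 score matrix. If Accord.NET's `Transform` projects the centered data onto the leading eigenvectors, the result should agree with it up to the sign of each column. A proper symmetric eigensolver would be much faster than this loop for a matrix of that size.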

