How do I compute FactorAnalysis scores with Python (scikit-learn)?
Published: 2020-12-20 13:51:34 | Category: Python | Source: compiled from the web
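I need to run an exploratory factor analysis and compute a score for each observation in Python, assuming there is only one latent factor. sklearn.decomposition.FactorAnalysis() seems to be the way to go, but unfortunately the documentation and the example (I could not find any other examples) do not make it clear enough to me how to get the job done.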
I have the following test file with 41 observations of 29 variables (test.csv):

    49.6,34917,24325.4,305,101350,98678,254.8,276.9,47.5,1,3,5.6,3.59,11.9,97.5,97.6,8,10,100,96.93,610.1,1718.22,6.7,28,5
    275.8,14667,11114.4,775,75002,74677,30,109,9.1,6.5,3.01,8.2,1558,2063.17,5.5,64,5
    2.3,9372.5,8035.4,4.6,8111,8200,8.01,130,1.2,5,3.33,6.09,97.9,67.3,342.3,99.96,18.3,53,1457.27,4.8,4
    7.10,13198.0,13266.4,1.1,708,695,6.1,80,0.4,4,3.1,97.8,45,82.7,99.68,4.5,13.8,3
    1.97,2466.7,2900.6,19.7,5358,5335,10.1,23,0.5,2,3.14,97.3,97.2,9,74.5,98.2,99.64,79.8,54,1367.89,6.4,12,4
    2.40,2999.4,2218.2,0.80,2045,2100,8.9,1.5,2.82,8.6,97.4,47.2,323.8,99.996,13.6,24,1249.67,2.7,3
    0.59,4120.8,5314.5,0.54,14680,13688,14.9,117,2.94,3.4,97.7,11.8,872.6,9.3,52,1251.67,14,2
    0.72,2067.7,2364,367,298,7.2,60,2.5,2.97,10.5,74.7,186.8,99.13,57,1800.45,2
    1.14,2751.9,3066.8,3.5,1429,1498,7.7,1.6,2.86,76.7,240.1,99.93,1259.97,15,3
    1.29,4802.6,5026.1,7859,7789,1.9,98,34,297.5,99.95,1306.44,8.5,4
    0.40,639.0,660.3,1.3,25,0.1,94.2,4.3,50,1565.44,19.2,4
    0.26,430.7,608.1,33,7,6,76.5,98.31,1490.08,4
    4.99,2141.2,2357.6,3.60,339,320,8.1,0.2,5.9,58.1,206.3,99.58,13.2,95,1122.92,14.2,2
    0.36,1453.7,1362.2,3.50,796,785,3.7,98.1,91.4,214.6,99.74,7.5,1751.98,11.5,1657.5,2421.1,2.8,722,690,11,37.4,404.2,99.98,10.9,35,1772.33,10.2,3
    1.14,5635.2,5649.6,2681,2530,5.4,20,0.3,50.1,384.7,99.02,11.6,27,1306.08,16,2
    0.6,1055.9,1487.9,69,65,63,137.9,5.1,48,1595.06,4
    0.08,795.3,1174.7,1.40,85,76,2.2,39.3,149.3,98.27,1903.9,2
    0.90,2514.0,2644.4,2.6,1173,1104,43,0.8,58.7,170.5,80.29,1292.72,2
    0.27,870.4,949.7,1.8,252,240,31,64.5,6.6,29,1483.18,3
    0.41,1295.1,2052.3,2.60,2248,2135,6.0,71.1,261.3,91.86,21,1221.71,9.4,4
    1.10,3544.2,4268.9,2.1,735,730,1.7,317.2,99.62,9.8,46,1271.63,3
    0.22,899.3,888.2,1.80,220,218,3.6,22.5,70.79,10.6,32,1508.02,4
    0.24,1712.8,1735.5,1.30,41,3.28,16.6,720.2,1324.46,2
    0.2,558.4,631.9,60.7,99.38,1535.08,2
    0.21,599.9,1029,70,85.7,48.6,221.2,40,1381.44,25.6,2
    0.10,131.3,190.6,2.9,58.9,189.4,6.9,42,1525.58,17.4,3
    0.44,3881.4,5067.3,0.9,2732,2500,11.2,2.67,14.5,1326.2,99.06,1120.54,10.3,2
    0.18,1024.8,1651.3,1.01,358,345,15.9,790.2,1531.04,3
    0.46,682.9,784.2,103,166.3,44,1373.6,13.5,2
    0.12,370.4,420.0,1.10,2.57,51.6,120,99.85,1297.94,3
    0.03,552.4,555.1,49,33.6,594.5,3.2,1184.34,3
    0.21,1256.5,2434.8,1265,1138,6.3,20.1,881,99.1,3.9,1265.93,7.8,3
    0.09,320.6,745.7,37,49.2,376.4,39,1285.11,3
    0.08,452.7,570.9,18,4.7,0.6,2.45,97.1,19.9,1103.8,22,1562.61,21.9,3
    0.13,967.9,947.2,74,4.0,1.4,30.1,503.1,99.999,55,1269.33,2
    0.07,495.0,570.3,3.62,13,29.8,430.5,99.7,4.9,1461.79,14.6,2
    0.17,681.9,537.4,113,98.3,74.3,1290.16,3
    0.05,639.7,898.2,0.40,3.0,1221.1,1372,4
    0.65,2067.8,2084.2,2.50,414,398,7.3,0.7,2.16,60.1,146.3,10.4,1059.68,7.4,804.4,1416.4,3.30,579,602,4.2,2492.3,95.4,1345.76,2

I ran the following code, which I wrote based on the official example and this post:

    from sklearn import decomposition, preprocessing
    # cross_val_score lived in sklearn.cross_validation in older releases
    from sklearn.model_selection import cross_val_score
    import csv
    import numpy as np

    data = np.genfromtxt('test.csv', delimiter=',')

    def compute_scores(X):
        n_components = np.arange(0, len(X), 1)
        X = preprocessing.scale(X)  # data normalisation attempt
        pca = decomposition.PCA()
        fa = decomposition.FactorAnalysis(n_components=1)
        pca_scores, fa_scores = [], []
        for n in n_components:
            pca.n_components = n
            fa.n_components = n
            # pca_scores.append(np.mean(cross_val_score(pca, X)))  # if I attempt to compute pca_scores I get the error.
            fa_scores.append(np.mean(cross_val_score(fa, X)))
        print(pca_scores, fa_scores)

    compute_scores(data)

Code output:

    [],[-947738125363.77405,-947738145459.86035,-947738159924.70471,-947738174662.89746,-947738206142.62854,-947738179314.44739,-947738220921.50684,-947738223447.3678,-947738277298.33545,-947738383772.58606,-947738415104.84912,-947738406361.44482,-947738394379.30359,-947738456528.69275,-947738501001.14319,-947738991338.98291,-947739381280.06506,-947739389033.33557,-947739434992.48047,-947739549511.2655,-947739355699.70959,-947739879828.51514,-947739898216.39099,-947739905804.71033,-947739902618.47791,-947738564594.54639,-948816122907.87366,-947744046601.55029,-947738624937.61292,-947738625325.73486,-947738626111.14441,-947738624973.92188,-947738625200.06946,-947738625568.65027,-947738625528.69666,-947738625359.41992,-947738624906.67529,-947738625652.12439,-947739509002.01868,-947738625426.81946,-947738625380.45837]

This is far from the result I expected. Here is R code for the same task on the same data. Its output looks right (the results are close to the output of an IBM program that can perform FA):

    data <- read.csv("test.csv", header=F)
    col_names <- names(data)
    drops <- c()
    for (name in col_names){
        st_dev <- sd(data[,name], na.rm = T)
        if (st_dev == 0){
            drops <- c(drops, name)
        }
    }
    da_nal <- data[,!(names(data) %in% drops)]
    factanal(na.omit(da_nal), factors = 1, scores = 'regression')$scores

The output of this code is:

           Factor1
    1   4.89102190
    2   3.65004187
    3   0.14628700
    4  -0.20255897
    5  -0.01565570
    6  -0.16438863
    7   0.40835986
    8  -0.25823984
    9  -0.20813064
    10  0.09390067
    11 -0.28891296
    12 -0.28882753
    13 -0.26624358
    14 -0.25202275
    15 -0.25181326
    16 -0.15653679
    17 -0.28702281
    18 -0.28865654
    19 -0.23251509
    20 -0.28066125
    21 -0.18714387
    22 -0.24969113
    23 -0.28302552
    24 -0.28712610
    25 -0.29196529
    26 -0.28659988
    27 -0.29502523
    28 -0.15802910
    29 -0.27440118
    30 -0.29083667
    31 -0.29548220
    32 -0.29461059
    33 -0.23594859
    34 -0.29654336
    35 -0.29759659
    36 -0.29085001
    37 -0.29539071
    38 -0.29234303
    39 -0.29702103
    40 -0.27595130
    41 -0.27184361

So I would like to get similar results in Python (I know I will not get exactly the same numbers), but I do not know how.

Solution
It seems I have figured out how to get the scores.

    from sklearn import decomposition, preprocessing
    import numpy as np

    data = np.genfromtxt('rangir_test.csv', delimiter=',')
    # drop every row that contains a missing value
    data = data[~np.isnan(data).any(axis=1)]
    data_normal = preprocessing.scale(data)
    fa = decomposition.FactorAnalysis(n_components = 1)
    fa.fit(data_normal)
    # score_samples returns the log-likelihood of each sample
    for score in fa.score_samples(data_normal):
        print(score)

Unfortunately, the output (see below) is very different from the output of factanal(). Any advice on decomposition.FactorAnalysis() would be appreciated.

Scikit-learn score output:

    -69.8587183816
    -116.353511148
    -24.1529840248
    -36.5366398005
    -7.87165586175
    -24.9012815104
    -23.9148486368
    -10.047780535
    -4.03376369723
    -7.07428842783
    -7.44222705099
    -6.25705487929
    -13.2313513762
    -13.3253819521
    -9.23993173528
    -7.141616656
    -5.57915693405
    -6.82400483045
    -15.0906961724
    -3.37447211233
    -5.41032267015
    -5.75224753811
    -19.7230390792
    -6.75268922909
    -4.04911793705
    -10.6062761691
    -3.17417070498
    -9.95916350005
    -3.25893428094
    -3.88566777358
    -3.30908856716
    -3.58141292341
    -3.90778368669
    -4.01462493538
    -11.6683969455
    -5.30068548445
    -24.3400870389
    -7.66035331181
    -13.8321672858
    -8.93461397086
    -17.4068326999

(Editor: 李大同)
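A possible explanation for the mismatch with factanal(): in scikit-learn, FactorAnalysis.score_samples() returns the log-likelihood of each sample under the fitted model, and FactorAnalysis.score() (which cross_val_score calls) returns the average log-likelihood. Neither is a factor score. The quantity closer in spirit to R's regression scores is what transform() returns, namely the posterior mean of the latent factor. The sketch below is a rough illustration on synthetic data, not on test.csv: the data, the loadings and the clean() helper are assumptions made up for this example, with clean() loosely mirroring the R preprocessing (drop zero-variance columns, then drop rows with missing values).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import scale


def clean(X):
    """Hypothetical helper: drop zero-variance columns, then rows with NaN."""
    keep_cols = np.nanstd(X, axis=0) > 0
    X = X[:, keep_cols]
    keep_rows = ~np.isnan(X).any(axis=1)
    return X[keep_rows], keep_rows


# Synthetic stand-in for test.csv (an assumption for illustration only):
# 200 observations of 6 variables driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
loadings = np.array([[0.9, 0.8, 0.7, 0.6, 0.5, 0.4]])
X = latent @ loadings + 0.3 * rng.normal(size=(200, 6))
X = np.hstack([X, np.full((200, 1), 5.0)])  # a constant column
X[3, 1] = np.nan                            # one missing value

X_clean, kept = clean(X)
X_std = scale(X_clean)

fa = FactorAnalysis(n_components=1, random_state=0).fit(X_std)

# Factor scores: posterior mean of the latent factor for each observation.
factor_scores = fa.transform(X_std)[:, 0]

# Per-sample log-likelihoods: a goodness-of-fit measure, not a score.
log_likelihoods = fa.score_samples(X_std)

# The same scores written out as the regression (Thomson) estimator,
# (x - mean) @ inv(W'W + Psi) @ W', using the model-implied covariance.
W = fa.components_                                # shape (1, n_features)
sigma = W.T @ W + np.diag(fa.noise_variance_)
manual_scores = ((X_std - fa.mean_) @ np.linalg.solve(sigma, W.T))[:, 0]

# Correlation with the factor that generated the synthetic data.
corr = np.corrcoef(factor_scores, latent[kept, 0])[0, 1]

print("cleaned shape:", X_clean.shape)
print("first factor scores:", factor_scores[:5])
print("first log-likelihoods:", log_likelihoods[:5])
print("abs. correlation with true factor:", abs(corr))
```

Even with transform(), exact agreement with R should not be expected. The sign of a factor is arbitrary, and factanal() fits by maximum likelihood on the correlation matrix and computes its regression scores from the sample correlation matrix, so the numbers may differ in sign and somewhat in scale.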