Common NumPy utility functions: np.bincount / np.average

Published: 2020-12-14 04:38:44 | Category: Big Data | Source: compiled from the web


numpy common APIs (1): http://blog.csdn.net/lanchunhui/article/details/50072453
numpy common APIs (2): http://blog.csdn.net/lanchunhui/article/details/50429205

When a function exposes a random_state keyword parameter, the purpose is reproducibility (also called repeatability) of its results: the same seed yields the same output.
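A minimal sketch of this idea, assuming a hypothetical helper `sample_indices` (not part of any library) that forwards its `random_state` argument to `np.random.RandomState`:

```python
import numpy as np

def sample_indices(n, size, random_state=None):
    """Hypothetical helper: draw `size` distinct indices from range(n)."""
    # The same integer seed produces the same stream of random numbers
    rng = np.random.RandomState(random_state)
    return rng.choice(n, size=size, replace=False)

a = sample_indices(100, 5, random_state=0)
b = sample_indices(100, 5, random_state=0)
print(np.array_equal(a, b))  # True: identical seed, identical draw
```

Leaving `random_state=None` seeds the generator from the operating system, so repeated calls will generally differ.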

1. np.bincount(): counting occurrences

The signature is:

numpy.bincount(x, weights=None, minlength=0)

It is especially handy for computing the distribution of a dataset's label column (y_train), i.e. the class distribution:

>>> np.bincount(y_train.astype(np.int32))

>>> np.bincount(np.array([0,1,3,2,7]))
array([1, 1, 1, 1, 0, 0, 0, 1])
			# number of occurrences of each value from 0 to 7
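As a small sketch of the class-distribution use case, with made-up labels, the counts can be turned into proportions, and `minlength` can reserve a bin for a class that happens to be absent:

```python
import numpy as np

y_train = np.array([0, 2, 2, 1, 0, 2])      # made-up labels for illustration
counts = np.bincount(y_train, minlength=4)  # force 4 bins although class 3 never occurs
proportions = counts / counts.sum()

print(counts)       # [2 1 3 0]
print(proportions)  # fraction of samples per class
```

Without `minlength`, the result would have only `y_train.max() + 1` entries.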

If weights is specified, the input array is weighted by it, i.e. if a value n is found at position i, out[n] += weights[i] instead of out[n] += 1.

>>> w = np.array([0.3,0.5,0.2,0.7,1.,-0.6]) # weights
>>> x = np.array([0,1,1,3,2,2])
>>> np.bincount(x,w)
array([0.3, 0.7, 0.4, 0.7])
			# 0: 0.3
			# 1: 0.5+0.2
			# 2: 1+(-0.6)
			# 3: 0.7
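The accumulation rule `out[n] += weights[i]` can be spelled out as an explicit loop. This is only an illustrative sketch with its own small arrays, not how NumPy implements it internally:

```python
import numpy as np

x = np.array([0, 1, 1, 3, 2, 2])
w = np.array([0.3, 0.5, 0.2, 0.7, 1.0, -0.6])

# One bin per value from 0 to x.max(); each position i adds w[i] to bin x[i]
out = np.zeros(x.max() + 1)
for i, n in enumerate(x):
    out[n] += w[i]

print(np.allclose(out, np.bincount(x, weights=w)))  # True
```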

np.bincount() starts counting from zero (negative values are not allowed in the input):

>>> np.bincount([3,4,5])
array([0, 0, 0, 1, 1, 1])
							# the entries are the number of occurrences of 0,
							# of 1,
							# of 2,
							# ...
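To illustrate the restriction on negative values, the sketch below uses made-up labels coded as -1/+1. `np.bincount` rejects them, and `np.unique` with `return_counts=True` is one possible alternative for counting:

```python
import numpy as np

labels = np.array([-1, 1, 1, -1, 1])  # made-up labels coded as -1 / +1

try:
    np.bincount(labels)
    raised = False
except ValueError:
    raised = True
print(raised)  # True: negative entries are rejected

# np.unique returns the distinct values together with their counts
values, counts = np.unique(labels, return_counts=True)
print(values, counts)  # values -1 and 1 with counts 2 and 3
```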

2. np.average()

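np.average(X,axis=0,weights=w) == w.dot(X)

The left-hand side is a weighted average: np.average divides the weighted sum by sum(w), so the equality holds only when sum(w) == 1. In that case the left-hand side carries the extra meaning of a weighted average, while the inner product on the right is the operation that implements it.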

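import numpy as np
X = np.array([[.9,.1],[.8,.2],[.4,.6]])
w = np.array([.2,.2,.6])
print(w.dot(X))                          # approximately [0.58, 0.42]
# axis=0 is required here because w is 1-D while X is 2-D
print(np.average(X,axis=0,weights=w))    # same values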
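The role of sum(w) can be checked with weights that do not sum to 1. The weights below are made up for illustration:

```python
import numpy as np

X = np.array([[.9, .1], [.8, .2], [.4, .6]])
w = np.array([1.0, 1.0, 3.0])  # sums to 5, not 1

avg = np.average(X, axis=0, weights=w)
print(np.allclose(avg, w.dot(X)))            # False: dot does not normalize
print(np.allclose(avg, w.dot(X) / w.sum()))  # True: average divides by sum(w)
```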

In some cases np.average() is the tool to use, because a simple matrix multiplication does not apply. For example:

P = np.asarray([c.predict_proba(X) for c in clfs])
							# P is now a 3-D array with shape
							# (# of clfs) * (# of samples) * (# of classes)
np.average(P,axis=0,weights=w)
							# the result has shape ((# of samples) * (# of classes))
							# and each row still sums to 1
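A self-contained sketch of the 3-D case follows. The array `P` is random made-up data standing in for stacked `predict_proba` outputs, and `np.tensordot` is shown only as one possible equivalent when the weights sum to 1:

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in data: shape (2 classifiers, 4 samples, 3 classes), each row sums to 1
P = rng.dirichlet(np.ones(3), size=(2, 4))
w = np.array([0.25, 0.75])

avg = np.average(P, axis=0, weights=w)
print(avg.shape)                          # (4, 3)
print(np.allclose(avg.sum(axis=1), 1.0))  # True: rows are still distributions

# w.dot(P) pairs w with the samples axis of P, so the shapes do not line up
try:
    w.dot(P)
    dot_failed = False
except ValueError:
    dot_failed = True
print(dot_failed)  # True

# tensordot contracts w with the first axis of P instead
print(np.allclose(avg, np.tensordot(w, P, axes=1)))  # True, because w sums to 1
```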

There are also cases where np.average works but dot (matrix multiplication) cannot be used at all:

def predict_proba(self,X):
	probas = np.asarray([clf.predict_proba(X) for clf in self.classifiers_])
	# return self.weights.dot(probas)
				# risky: self.weights may never have been assigned,
				# and None certainly has no dot method
	# np.average falls back to the ordinary (unweighted) mean
	# when the weights argument is None
	return np.average(probas,axis=0,weights=self.weights)
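The fallback behaviour for `weights=None` can be seen on a tiny made-up array of stacked probabilities:

```python
import numpy as np

# 2 classifiers, 1 sample, 2 classes (made-up probabilities)
probas = np.array([[[0.9, 0.1]], [[0.5, 0.5]]])

plain = np.average(probas, axis=0, weights=None)             # ordinary mean
weighted = np.average(probas, axis=0, weights=[0.75, 0.25])  # weighted mean

print(plain)     # close to [[0.7, 0.3]]
print(weighted)  # close to [[0.8, 0.2]]
```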


(Editor: 李大同)
