在Python中实现Poisson的E测试

发布时间：2020-12-20 12:09:38 所属栏目：Python 来源：网络整理

导读：是否有针对Poissons的E-Tes??t的 Python实现？对于二项式,scipy将 Fisher’s Exact test作为stats.fisher_exact,对于高斯,scipy.stats将 Welch’s T-test作为ttest_ind.我似乎找不到任何Python实现来比较两个Poissons. For context look here For the algori

是否有针对Poissons的E-Tes??t的 Python实现？对于二项式,scipy将 Fisher’s Exact test作为stats.fisher_exact,对于高斯,scipy.stats将 Welch’s T-test作为ttest_ind.我似乎找不到任何Python实现来比较两个Poissons.

For context look here

For the algorithm look here

For R implementation look here

解决方法

这是一个开始.这实现了Gu et的三个测试. al 2008基于渐近正态分布,现在还有两个基于精确分布的条件检验.

如果计数不是太小(例如大于10或20),并且曝光时间不是非常不相等,则得分测试工作得相当好.对于较小的计数,结果可能有点保守或自由,而其他方法会更好.版本’sqrt’在他们的模拟中表现非常好,但是当它的作用时,可能比得分测试的功率要小一些.

'''Test for ratio of Poisson intensities in two independent samples

Author: Josef Perktold
License: BSD-3

destination statsmodels

'''


from __future__ import division
import numpy as np
from scipy import stats


# copied from statsmodels.stats.weightstats
def _zstat_generic2(value,std_diff,alternative):
    '''generic (normal) z-test to save typing

    can be used as ztest based on summary statistics
    '''
    zstat = value / std_diff
    if alternative in ['two-sided','2-sided','2s']:
        pvalue = stats.norm.sf(np.abs(zstat))*2
    elif alternative in ['larger','l']:
        pvalue = stats.norm.sf(zstat)
    elif alternative in ['smaller','s']:
        pvalue = stats.norm.cdf(zstat)
    else:
        raise ValueError('invalid alternative')
    return zstat,pvalue


def poisson_twosample(count1,exposure1,count2,exposure2,ratio_null=1,method='score',alternative='2-sided'):
    '''test for ratio of two sample Poisson intensities

    If the two Poisson rates are g1 and g2,then the Null hypothesis is

    H0: g1 / g2 = ratio_null

    against one of the following alternatives

    H1_2-sided: g1 / g2 != ratio_null
    H1_larger: g1 / g2 > ratio_null
    H1_smaller: g1 / g2 < ratio_null

    Parameters
    ----------
    count1: int
        Number of events in first sample
    exposure1: float
        Total exposure (time * subjects) in first sample
    count2: int
        Number of events in first sample
    exposure2: float
        Total exposure (time * subjects) in first sample
    ratio: float
        ratio of the two Poisson rates under the Null hypothesis. Default is 1.
    method: string
        Method for the test statistic and the p-value. Defaults to `'score'`.
        Current Methods are based on Gu et. al 2008
        Implemented are 'wald','score' and 'sqrt' based asymptotic normal
        distribution,and the exact conditional test 'exact-cond',and its mid-point
        version 'cond-midp',see Notes
    alternative : string
        The alternative hypothesis,H1,has to be one of the following

           'two-sided': H1: ratio of rates is not equal to ratio_null (default)
           'larger' :   H1: ratio of rates is larger than ratio_null
           'smaller' :  H1: ratio of rates is smaller than ratio_null

    Returns
    -------
    stat,pvalue two-sided

    not yet
    #results : Results instance
    #    The resulting test statistics and p-values are available as attributes.


    Notes
    -----
    'wald': method W1A,wald test,variance based on separate estimates
    'score': method W2A,score test,variance based on estimate under Null
    'wald-log': W3A
    'score-log' W4A
    'sqrt': W5A,based on variance stabilizing square root transformation
    'exact-cond': exact conditional test based on binomial distribution
    'cond-midp': midpoint-pvalue of exact conditional test

    The latter two are only verified for one-sided example.

    References
    ----------
    Gu,Ng,Tang,Schucany 2008: Testing the Ratio of Two Poisson Rates,Biometrical Journal 50 (2008) 2,2008

    '''

    # shortcut names
    y1,n1,y2,n2 = count1,exposure2
    d = n2 / n1
    r = ratio_null
    r_d = r / d

    if method in ['score']:
        stat = (y1 - y2 * r_d) / np.sqrt((y1 + y2) * r_d)
        dist = 'normal'
    elif method in ['wald']:
        stat = (y1 - y2 * r_d) / np.sqrt(y1 + y2 * r_d**2)
        dist = 'normal'
    elif method in ['sqrt']:
        stat = 2 * (np.sqrt(y1 + 3 / 8.) - np.sqrt((y2 + 3 / 8.) * r_d))
        stat /= np.sqrt(1 + r_d)
        dist = 'normal'
    elif method in ['exact-cond','cond-midp']:
        from statsmodels.stats import proportion
        bp = r_d / (1 + r_d)
        y_total = y1 + y2
        stat = None
        pvalue = proportion.binom_test(y1,y_total,prop=bp,alternative=alternative)
        if method in ['cond-midp']:
            # not inplace in case we still want binom pvalue
            pvalue = pvalue - 0.5 * stats.binom.pmf(y1,bp)

        dist = 'binomial'

    if dist == 'normal':
        return _zstat_generic2(stat,1,alternative)
    else:
        return stat,pvalue


from numpy.testing import assert_allclose

# testing against two examples in Gu et al

print('ntwo-sided')
# example 1
count1,n2 = 60,51477.5,30,54308.7

s1,pv1 = poisson_twosample(count1,n2,method='wald')
pv1r = 0.000356
assert_allclose(pv1,pv1r*2,rtol=0,atol=5e-6)
print('wald',s1,pv1 / 2)   # one sided in the "right" direction

s2,pv2 = poisson_twosample(count1,method='score')
pv2r = 0.000316
assert_allclose(pv2,pv2r*2,atol=5e-6)
print('score',s2,pv2 / 2)   # one sided in the "right" direction

s2,method='sqrt')
pv2r = 0.000285
assert_allclose(pv2,atol=5e-6)
print('sqrt',pv2 / 2)   # one sided in the "right" direction

print('ntwo-sided')
# example2
# I don't know why it's only 2.5 decimal agreement,rounding?
count1,n2 = 41,28010,15,19017
s1,method='wald',ratio_null=1.5)
pv1r = 0.2309
assert_allclose(pv1,atol=5e-3)
print('wald',ratio_null=1.5)
pv2r = 0.2398
assert_allclose(pv2,atol=5e-3)
print('score',method='sqrt',ratio_null=1.5)
pv2r = 0.2499
assert_allclose(pv2,atol=5e-3)
print('sqrt',pv2 / 2)   # one sided in the "right" direction

print('none-sided')
# example 1 onesided
count1,alternative='larger')
pv1r = 0.000356
assert_allclose(pv1,pv1r,pv1)   # one sided in the "right" direction

s2,alternative='larger')
pv2r = 0.000316
assert_allclose(pv2,pv2r,pv2)   # one sided in the "right" direction

s2,alternative='larger')
pv2r = 0.000285
assert_allclose(pv2,pv2)   # one sided in the "right" direction

# 'exact-cond','cond-midp'

s2,method='exact-cond',alternative='larger')
pv2r = 0.000428  # typo in Gu et al,switched pvalues between C and M
assert_allclose(pv2,atol=5e-4)
print('exact-cond',method='cond-midp',alternative='larger')
pv2r = 0.000310
assert_allclose(pv2,atol=5e-4)
print('cond-midp',pv2)   # one sided in the "right" direction


print('none-sided')
# example2 onesided
# I don't know why it's only 2.5 decimal agreement,ratio_null=1.5,alternative='larger')
pv1r = 0.2309
assert_allclose(pv1,atol=5e-4)
print('wald',alternative='larger')
pv2r = 0.2398
assert_allclose(pv2,atol=5e-4)
print('score',alternative='larger')
pv2r = 0.2499
assert_allclose(pv2,alternative='larger')
pv2r = 0.2913
assert_allclose(pv2,alternative='larger')
pv2r = 0.2450
assert_allclose(pv2,pv2)   # one sided in the "right" direction

这打印

two-sided /2
wald 3.38491255626 0.000356004664253
score 3.417401839 0.000316109441024
sqrt 3.44548501956 0.00028501778109

two-sided /2
wald 0.73544663636 0.231033764105
score 0.706630933035 0.239897930348
sqrt 0.674401392575 0.250028078819

one-sided
wald 3.38491255626 0.000356004664253
score 3.417401839 0.000316109441024
sqrt 3.44548501956 0.00028501778109

one-sided
wald 0.73544663636 0.231033764105
score 0.706630933035 0.239897930348
score 0.674401392575 0.250028078819

确切的条件测试相对容易实现,但非常保守且功耗低.大致精确的测试需要更多的努力(目前我没有时间).

(经常：实际的计算是几行.决定接口,添加文档和单元测试是更多的工作.)

编辑

上面的脚本现在还包括精确的条件测试和它的中点p值版本,用Gu等人的单侧替代的两个例子进行检查.

例1：单面的

exact-cond None 0.00042805269405
cond-midp None 0.000310132441983

例2：单面的

exact-cond None 0.291453753765
cond-midp None 0.245173718501

目前没有返回的条件测试的测试统计信息

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!