lua - Using floats or doubles instead of integers

Published: 2020-12-14 21:49:51 | Category: Big Data | Source: compiled from the web

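I know that Lua's default implementation uses only floating-point numbers, which sidesteps the problem of dynamically determining a number's subtype before choosing which variant of a math function to use.

My question is: if I try to emulate integers as doubles (or floats) in standard C99, is there a reliable (and simple) way to tell what the largest exactly representable value is?

I mean, if I use 64-bit floating-point numbers to represent integers, I certainly cannot represent every 64-bit integer (the pigeonhole principle applies here). How can I tell what the largest representable integer is?

(Trying to enumerate every value is not a solution; for example, if I use doubles on a 64-bit architecture, I would have to list 2^64 numbers.)

Thanks!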

Solution

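For a 64-bit double, the largest integer up to which every integer is exactly representable is 2^53 (9007199254740992); for a 32-bit float, it is 2^24 (16777216). See the Wikipedia page for IEEE floating point numbers for the underlying figures.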

Verifying this in Lua is straightforward:

local maxdouble = 2^53

-- one less than the maximum can be represented precisely
print (string.format("%.0f",maxdouble-1)) --> 9007199254740991
-- the maximum itself can be represented precisely
print (string.format("%.0f",maxdouble))   --> 9007199254740992
-- one more than the maximum gets rounded down
print (string.format("%.0f",maxdouble+1)) --> 9007199254740992 again
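For the C99 side of the question, the standard header <float.h> already publishes the significand width, so the limit can be computed rather than looked up. The following is a minimal sketch, not part of the original answer: the helper name max_contiguous_int is made up for illustration, and the result is returned as a double, which is assumed to be wide enough to hold the value exactly.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper (name invented for this sketch).
 * For a floating type with `mant_digits` significand digits in base
 * FLT_RADIX, every integer in [0, FLT_RADIX^mant_digits] is exactly
 * representable, and FLT_RADIX^mant_digits + 1 is the first one that
 * is not. Pass FLT_MANT_DIG or DBL_MANT_DIG from <float.h>. */
static double max_contiguous_int(int mant_digits)
{
    double n = 1.0;

    assert(mant_digits > 0);
    for (int i = 0; i < mant_digits; ++i)
        n *= FLT_RADIX;   /* exact: every intermediate is a power of the radix */
    return n;
}
```

On a platform with IEEE 754 types (FLT_RADIX 2, DBL_MANT_DIG 53, FLT_MANT_DIG 24), max_contiguous_int(DBL_MANT_DIG) evaluates to 9007199254740992 and max_contiguous_int(FLT_MANT_DIG) to 16777216, matching the figures above.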

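If we did not have the IEEE-defined field sizes handy, then, given what we know about how floating-point numbers are designed, we could determine these values with a simple loop over the candidate values: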

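#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#define min(a,b) ((a) < (b) ? (a) : (b))
#define bits(type) (sizeof(type) * 8)
/* For each power of two 2^pow, convert it to test_t, add 1, and convert
 * back to an integer. The first power for which the round trip no longer
 * yields 2^pow + 1 marks the end of the exactly representable range. */
#define testimax(test_t) { \
  uintmax_t in = 1,out = 2; \
  size_t pow = 0,limit = min(bits(test_t),bits(uintmax_t)); \
  while (pow < limit && out == in + 1) { \
    in = in << 1; \
    out = (test_t) in + 1; \
    ++pow; \
  } \
  if (pow == limit) \
    puts(#test_t " is as precise as longest integer type"); \
  else printf(#test_t " conversion imprecise for 2^%zu+1:\n" \
    "   in: %ju\n  out: %ju\n\n",pow,in + 1,out); \
}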

int main(void)
{
    testimax(float);
    testimax(double);
    return 0;
}

The output of the above code:

float conversion imprecise for 2^24+1:
   in: 16777217
  out: 16777216

double conversion imprecise for 2^53+1:
   in: 9007199254740993
  out: 9007199254740992

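Of course, because of the way floating-point precision works, a 64-bit double can represent numbers far larger than 2^64, since the exponent keeps growing past the width of the significand; it just cannot represent every integer in that range. The Wikipedia page on double-precision floating-point explains: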

Between 2⁵²=4,503,599,627,370,496 and 2⁵³=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 2⁵³ to 2⁵⁴, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 2⁵¹ to 2⁵², the spacing is 0.5, etc.
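The spacing rule in that quote can be reproduced with nothing but <float.h>. This is a rough sketch under stated assumptions: spacing_at is a hypothetical helper, it assumes FLT_RADIX is 2, and it only handles finite values of at least 1. Within each power-of-two range [2^e, 2^(e+1)) the gap between neighbouring doubles is 2^e times DBL_EPSILON.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper (name invented for this sketch).
 * Returns the gap between consecutive doubles in the power-of-two
 * range that contains x. Assumes a binary format (FLT_RADIX == 2)
 * and a finite x >= 1. */
static double spacing_at(double x)
{
    double p = 1.0;

    assert(x >= 1.0 && x <= DBL_MAX);   /* also rejects NaN and infinity */
    while (p * 2.0 <= x)
        p *= 2.0;                       /* largest power of two <= x */
    return p * DBL_EPSILON;             /* exact: product of two powers of two */
}
```

With IEEE 754 doubles, spacing_at(4503599627370496.0) (2^52) gives 1, spacing_at(9007199254740992.0) (2^53) gives 2, and spacing_at(2251799813685248.0) (2^51) gives 0.5, which is the pattern the quote describes.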

The absolute largest value a double can hold is listed further down that page: 0x7fefffffffffffff, which works out to (1 + (1 - 2^-52)) * 2^1023, or roughly 1.7976931348623157e308.
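Both forms of that maximum can be cross-checked against DBL_MAX from <float.h>. The sketch below is illustrative only: the two helper names are invented here, and it assumes that double is IEEE 754 binary64 with the same byte order as uint64_t, which the C standard does not guarantee.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double.
 * Assumes IEEE 754 binary64 and matching integer/float byte order. */
static double double_from_bits(uint64_t bits)
{
    double d;

    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);   /* well-defined way to copy the bytes */
    return d;
}

/* Hypothetical helper: evaluate (1 + (1 - 2^-52)) * 2^1023 step by step.
 * Every step is exact for binary64: DBL_EPSILON is 2^-52 there, and
 * doubling never rounds while the result stays finite. */
static double max_double_from_formula(void)
{
    double v = 1.0 + (1.0 - DBL_EPSILON);

    for (int i = 0; i < 1023; ++i)
        v *= 2.0;
    return v;
}
```

On an IEEE 754 platform, double_from_bits(UINT64_C(0x7fefffffffffffff)) and max_double_from_formula() both compare equal to DBL_MAX.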

(Editor: 李大同)
