Limits of predictability in human mobility

From 集智百科 (Swarma Wiki)

Introduction

Research question: To what degree is human behavior predictable? Applications: city planning and resource management in mobile communications.

Study design

Method: explore the limits of predictability in human dynamics by studying the mobility patterns of anonymized mobile phone users.

Data: a 3-month-long record, collected for billing purposes and anonymized by the data source, capturing the mobility patterns of 50,000 individuals chosen from ~10 million anonymous mobile phone users with the criteria that they visit more than two locations (tower vicinity) during the observational period and that their average call frequency f is ≥ 0.5 hour⁻¹.

Result: we find a 93% potential predictability in user mobility across the whole user base.

Result

1A

The first user moves in the vicinity of N = 22 towers in a 30-km region, whereas the second visits as many as N = 76 towers spanning approximately a 90-km neighborhood.

1B

1B builds on 1A: each user is assigned a mobility network in which nodes are the locations visited by the user, with each tower weighted by the share of time the user spends in its vicinity. Individuals tend to spend most of their time in a few selected locations (small areas of about 3 km² on average).
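A minimal Python sketch of such a mobility network, assuming the user's record is a time-ordered list of tower IDs (the IDs and the helper names `mobility_network` and `time_fractions` are hypothetical). Node weights are visit counts and directed edges count transitions between consecutive, different towers; the share of time at each node is approximated here by the share of records, which is an assumption rather than the paper's exact weighting.

```python
from collections import Counter

def mobility_network(towers):
    """Return (node_weights, edge_weights) for a time-ordered list of tower IDs.

    node_weights counts how often each tower appears; edge_weights counts
    directed transitions between consecutive, different towers.
    """
    node_weights = Counter(towers)
    edge_weights = Counter(
        (a, b) for a, b in zip(towers, towers[1:]) if a != b
    )
    return node_weights, edge_weights

def time_fractions(node_weights):
    """(tower, share of records) pairs, most visited tower first."""
    total = sum(node_weights.values())
    shares = [(tower, count / total) for tower, count in node_weights.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)

# Hypothetical tower sequence, for illustration only
seq = ["A", "A", "B", "A", "C", "A", "A", "B"]
nodes, edges = mobility_network(seq)
print(time_fractions(nodes))
```

The heavy node "A" plays the role of the few dominant locations seen in 1B.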

1C

Each mobility network has an associated dynamical pattern, the temporal sequence of towers visited by the user. 1C shows a week-long call pattern that captures the time-dependent location of the user with N = 22. Each vertical line corresponds to a call, and its color matches the tower from where the call was placed. This sequence of locations serves as the basis of our mobility prediction.

This sets up 1D: there are long periods without call activity, during which we have no information about the user's location.

Entropy measures

1. Random entropy

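$$S_i^{\text{rand}} \equiv \log_2 N_i$$

where Ni is the number of distinct locations visited by user i. Srand captures the degree of predictability of the user's whereabouts if each location is visited with equal probability.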

2. Temporal-uncorrelated entropy

$$S_i^{\text{unc}} \equiv -\sum_{j=1}^{N_i} p_i(j) \log_2 p_i(j)$$

where pi(j) is the historical probability that location j was visited by user i, characterizing the heterogeneity of visitation patterns.
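Both baseline entropies can be computed directly from a location sequence. A minimal sketch, assuming one tower ID per observation; the helper names and the sample sequence are hypothetical:

```python
import math
from collections import Counter

def random_entropy(towers):
    """S_rand = log2(N): entropy if all N distinct locations were equally likely."""
    return math.log2(len(set(towers)))

def uncorrelated_entropy(towers):
    """S_unc = -sum_j p(j) log2 p(j), with p(j) the empirical visit frequency of location j."""
    counts = Counter(towers)
    total = len(towers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

seq = ["A", "A", "B", "A", "C", "A", "A", "B"]
print(random_entropy(seq))
print(uncorrelated_entropy(seq))
```

Because the visits are heterogeneous (tower "A" dominates), Sunc comes out below Srand; the two coincide only when every location is visited equally often.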

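3. Actual entropy Si, which depends not only on the frequency of visitation but also on the order in which the nodes were visited and the time spent at each location, and therefore captures the full spatiotemporal order present in a person's mobility pattern.

Let Ti be the sequence of towers at which user i was observed in each consecutive hour-long interval. Then

$$S_i \equiv -\sum_{T_i' \subset T_i} P(T_i') \log_2 P(T_i')$$

where P(Ti') is the probability of finding a particular time-ordered subsequence Ti' in the trajectory Ti.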

  • In general, $S_i \le S_i^{\text{unc}} \le S_i^{\text{rand}}$.
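The sum over all subsequences cannot be evaluated directly on real data. The original paper estimates Si with a Lempel-Ziv-based estimator; the sketch below uses a common form of that estimator, S ≈ n log2 n / Σ Λi, where Λi is the length of the shortest substring starting at position i that does not appear earlier in the sequence. The log base, the end-of-sequence convention and the helper names are assumptions, and the naive substring search is only meant for short sequences.

```python
import math

def _occurs_in(haystack, needle):
    """True if `needle` appears as a contiguous run inside `haystack` (both lists)."""
    m = len(needle)
    return any(haystack[j:j + m] == needle for j in range(len(haystack) - m + 1))

def lz_entropy(seq):
    """Lempel-Ziv estimate of the entropy rate of `seq`, in bits per symbol."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two symbols")
    total = 0
    for i in range(n):
        # Grow the substring starting at i until it no longer occurs in seq[:i].
        # If it runs off the end, k stops at n - i + 1 (a common convention).
        k = 1
        while i + k <= n and _occurs_in(seq[:i], seq[i:i + k]):
            k += 1
        total += k
    return n * math.log2(n) / total

commuter = ["home", "work"] * 50  # strictly alternating, highly regular
print(lz_entropy(commuter))
```

The alternating sequence has Sunc = 1 bit, yet its estimated actual entropy is far lower, because its temporal order makes the next location almost certain.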


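Computing the actual entropy raises a problem: it requires a continuous time series of locations, but we observe the user's tower only when he or she places a call.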


1D

Users tend to place most of their calls in short bursts.

The distribution of the time intervals τ between consecutive calls across the whole user population, documenting the bursty nature of the call pattern.


1E

This incompleteness of the collected data is captured by the parameter q, representing the fraction of hour-long intervals when the user's location is unknown to us.

P(q) across our user base peaks around q = 0.7, which indicates that, for a typical user, we have no location update for about 70% of the hourly intervals; this masks the user's real entropy Si.

We therefore studied the dependence of the entropy S(q) on the incompleteness q, which allowed us to extrapolate the entropy to q = 0. We tested the method's accuracy on the trajectories of 100 users whose whereabouts were recorded every hour and found that it performed well for q < 0.8, which represented 92% of the users in our data set. We therefore removed the 5000 users with the highest q from our data set, which ensured that all remaining 45,000 users satisfied q < 0.8.
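A minimal sketch of how q and the q < 0.8 cut could be computed, assuming each user's record has already been binned into hour-long intervals, with None marking hours without any call. This representation, the helper names and the toy users are hypothetical, and the extrapolation of S(q) to q = 0 is not shown.

```python
def incompleteness(hourly_towers):
    """q: fraction of hour-long intervals in which the user's tower is unknown (None)."""
    missing = sum(1 for tower in hourly_towers if tower is None)
    return missing / len(hourly_towers)

def keep_users(users, q_max=0.8):
    """Keep only users whose incompleteness q is strictly below q_max."""
    return {uid: seq for uid, seq in users.items() if incompleteness(seq) < q_max}

# Hypothetical users, ten hourly intervals each
users = {
    "u1": ["A", None, None, "A", "B", None, None, None, None, None],  # 7 of 10 hours unknown
    "u2": [None] * 9 + ["C"],                                          # 9 of 10 hours unknown
}
print({uid: incompleteness(seq) for uid, seq in users.items()})
print(sorted(keep_users(users)))
```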


Corresponding figures (some images missing)

2A

The most striking result is the prominent shift of P(S) compared with P(Srand). Indeed, P(Srand) peaks at Srand ≈ 6, which indicates that, on average, each update of the user's location represents six bits per hour of new information; that is, a user who chooses randomly his or her next location could be found on average in any of 2^Srand ≈ 64 locations. In contrast, P(S) peaks at S ≈ 0.8, which indicates that the user could be found not in 64 but in 2^0.8 ≈ 1.74 locations, that is, fewer than two.
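Reading 2^S as the effective number of equally likely locations makes the two peaks easy to compare. A quick check of the arithmetic (the helper name is hypothetical):

```python
def effective_locations(entropy_bits):
    """Number of equally likely locations that carry `entropy_bits` bits of uncertainty."""
    return 2 ** entropy_bits

print(effective_locations(6))    # 64, the Srand peak
print(effective_locations(0.8))  # ≈ 1.74, the S peak
```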

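rg: each user's radius of gyration, which follows a fat-tailed distribution. Observation: while most individuals' daily activities are confined to a limited area of 1 to 10 km, a few users regularly cover hundreds of kilometers. Expectation: people who visit few locations should be easy to predict (low entropy), whereas those with a large radius of gyration should be less predictable (high entropy).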
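The radius of gyration is commonly defined as the root-mean-square distance of a user's recorded positions from their center of mass, rg = sqrt((1/n) Σ |ri − rcm|²). A minimal sketch, assuming positions are already projected to planar (x, y) coordinates in km with one entry per observation; raw latitude/longitude would first need a projection or great-circle distances. The helper name and sample points are hypothetical.

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of (x, y) points (km) from their center of mass.

    Each observation counts once, so frequently visited places weigh more.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

home, work = (0.0, 0.0), (10.0, 0.0)  # hypothetical positions 10 km apart
print(radius_of_gyration([home, work]))
```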


Π: the probability that an appropriate predictive algorithm can correctly predict the user's future whereabouts.

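This quantity is subject to Fano's inequality: if a user with entropy S moves between N locations, his or her predictability satisfies Π ≤ Πmax(S, N), where Πmax is given by

$$S = H(\Pi^{\max}) + (1 - \Pi^{\max}) \log_2 (N - 1)$$

with the binary entropy

$$H(\Pi^{\max}) = -\Pi^{\max} \log_2 \Pi^{\max} - (1 - \Pi^{\max}) \log_2 (1 - \Pi^{\max})$$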

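For a user with Πmax = 0.2, this means that at least 80% of the time the individual chooses his or her location in a manner that appears to be random, and only in the remaining 20% of the time can we hope to predict his or her whereabouts. In other words, no matter how good our predictive algorithm is, we cannot predict the future whereabouts of a user with Πmax = 0.2 with better than 20% accuracy. Πmax therefore represents the limit of each individual's predictability.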
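Πmax has no closed form, but it can be found numerically from Fano's relation S = H(Πmax) + (1 − Πmax) log2(N − 1). A minimal sketch using bisection instead of a symbolic solver; it relies on the right-hand side decreasing monotonically from log2 N to 0 as Πmax goes from 1/N to 1. The helper names and the example inputs are hypothetical.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), taking 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0)

def fano_rhs(pi, n_locations):
    """Right-hand side of Fano's relation: H(Pi) + (1 - Pi) log2(N - 1)."""
    return binary_entropy(pi) + (1.0 - pi) * math.log2(n_locations - 1)

def max_predictability(entropy, n_locations, tol=1e-10):
    """Solve S = H(Pi_max) + (1 - Pi_max) log2(N - 1) for Pi_max in [1/N, 1]."""
    if n_locations < 2:
        raise ValueError("need at least two locations")
    if not 0.0 <= entropy <= math.log2(n_locations) + 1e-12:
        raise ValueError("entropy must lie in [0, log2(N)]")
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_rhs(mid, n_locations) > entropy:
            lo = mid  # RHS too large: the root lies at a larger Pi
        else:
            hi = mid
    return (lo + hi) / 2

print(max_predictability(0.8, 64))
```

Applying the same function to Sunc and Srand gives Πunc and Πrand; at S = log2 N it returns 1/N, the success rate of random guessing.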

https://stackoverflow.com/questions/46905044/runtimeerror-in-solving-equation-using-sympy

2B

P(Πmax) peaks at Πmax ≈ 0.93.

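This narrowly peaked distribution indicates that, despite the significant randomness of individual trajectories, the historical record of users' daily mobility hides a potentially high predictability. We also determined the maximal predictabilities Πunc and Πrand extracted from Sunc and Srand, respectively.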

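The results differ markedly: P(Πunc) peaks at Πunc ≈ 0.3, indicating that if we rely only on the heterogeneous spatial distribution of visits, the predictive power across the population is slim, since this ignores the order in which locations are visited.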

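Similarly, P(Πrand) peaks at Πrand ≈ 0, indicating not only that Πrand and Πunc are inadequate as predictive tools, but also that a significant share of predictability is encoded in the temporal order of the visitation pattern.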

2C

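Dependence of the predictability Πmax on the radius of gyration rg, which captures the distance each user travels on a regular basis. For rg > 10 km, Πmax is largely independent of rg and saturates at Πmax ≈ 0.93.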

2D

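Π̃: the fraction of time a user spends in his or her n most visited locations. The resulting measure Π̃ represents an upper bound on the predictability Πmax. Thus, for n = 1 a prediction is correct whenever the user is at his or her most likely location ("home"), while for n = 2 it is correct whenever the user is at either of the two most frequently visited locations ("home" or "office").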

Π̃ grows logarithmically with n.
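A minimal sketch of Π̃(n), assuming one tower ID per observation and measuring time by the share of observations (the helper name and the toy record are hypothetical):

```python
from collections import Counter

def time_in_top_locations(towers, n):
    """Pi-tilde(n): share of observations at the user's n most visited locations."""
    top = Counter(towers).most_common(n)
    return sum(count for _, count in top) / len(towers)

seq = ["home"] * 12 + ["office"] * 8 + ["gym"] * 2 + ["cafe"] * 2
for n in (1, 2, 3):
    print(n, time_in_top_locations(seq, n))
```

Each additional location adds less than the previous one, which is the diminishing growth with n described above.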

3A

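R: the regularity of each user, defined as the probability of finding the user at his or her most visited location during a given hour of the week (7 × 24 = 168 hourly slots). R represents a lower bound on the predictability Π, because it ignores temporal correlations in the user's mobility.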

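R(t): the regularity in each hour of the week, measuring the fraction of times the user is found at his or her most visited location during that hourly slot.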

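  • For example, if between 8 and 9 a.m. on Mondays the user was found at tower 1 ten times, at tower 2 twice, and at tower 3 once, we take tower 1 as the most likely location for that hour. Across the whole user base we find R ≈ 0.7, meaning that, on average, the most visited location for a given hour coincides with the user's actual location 70% of the time.
  • At night, when most people are reliably at home, R peaks at ≈ 0.9, whereas between 12 noon and 1 p.m. and between 6 and 7 p.m. R shows clear minima, corresponding to transition periods (on the way home or to lunch).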
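A minimal sketch of R(t), assuming the record is a list of (hour-of-week, tower) pairs pooled over all observed weeks, with slots numbered 0 to 167 from Monday 00:00 (this slot numbering and the helper name are hypothetical). For the Monday 8 to 9 a.m. example, it gives R = 10/13 ≈ 0.77 for that hour.

```python
from collections import Counter, defaultdict

def hourly_regularity(observations):
    """R(t) per hour-of-week slot: share of observations made at the slot's most common tower."""
    by_slot = defaultdict(Counter)
    for slot, tower in observations:
        by_slot[slot][tower] += 1
    return {
        slot: counts.most_common(1)[0][1] / sum(counts.values())
        for slot, counts in by_slot.items()
    }

# Slot 8 is Monday 8-9 a.m. under the assumed numbering
monday_8am = [(8, "tower1")] * 10 + [(8, "tower2")] * 2 + [(8, "tower3")]
r = hourly_regularity(monday_8am)
print(r)
```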

3B

The average number of distinct locations N(t) visited in each hourly slot over the week, showing that R(t) is related to N(t).

We find that moments of low regularity R correspond to a marked increase in N(t), a signature of high mobility, while N(t) drops when R peaks.
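A minimal sketch of N(t), assuming the same (hour-of-week, tower) records as for R(t). Averaging only over the users observed in a given slot is an assumption, as are the helper names and the toy data.

```python
from collections import defaultdict

def distinct_locations_per_slot(observations):
    """For one user: number of distinct towers seen in each hour-of-week slot."""
    towers = defaultdict(set)
    for slot, tower in observations:
        towers[slot].add(tower)
    return {slot: len(seen) for slot, seen in towers.items()}

def mean_locations_per_slot(users):
    """N(t): mean distinct-tower count per slot over users observed in that slot."""
    totals, counts = defaultdict(int), defaultdict(int)
    for observations in users:
        for slot, n in distinct_locations_per_slot(observations).items():
            totals[slot] += n
            counts[slot] += 1
    return {slot: totals[slot] / counts[slot] for slot in totals}

# Hypothetical users: slot 3 is a night hour, slot 12 a midday hour
users = [
    [(3, "home"), (3, "home"), (12, "home"), (12, "office"), (12, "cafe")],
    [(3, "home"), (12, "office"), (12, "canteen")],
]
n_t = mean_locations_per_slot(users)
print(n_t)
```

The night slot shows a single location per user while the midday slot shows several, mirroring the pattern of high R with low N(t) and vice versa.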

3C

⟨R/Rrand⟩ as a function of the radius of gyration rg, showing that users with large rg have a higher relative regularity.

Rrand = 1/N, which is an order of magnitude smaller than the observed R ≈ 0.7.

This gap once again indicates that the high regularity characterizing each user's mobility represents a significant departure from the expectation that they will be random.

Conclusion

In summary, the combination of the empirically determined user entropy and Fano’s inequality indicates that there is a potential 93% average predictability in user mobility, an exceptionally high value rooted in the inherent regularity of human behavior. Yet it is not the 93% predictability that we find the most surprising. Rather, it is the lack of variability in predictability across the population. Indeed, given the fat-tailed distribution of the distances over which users travel on a regular basis, most individuals are well localized in a finite neighborhood, but a few travel widely. Furthermore, a number of demographic and external parameters, from age to population density and the number of towers visited, vary widely from user to user. It is not unreasonable to expect, therefore, that predictability should also vary widely: For people who travel little, it should be easier to foresee their location, whereas those who regularly cover hundreds of kilometers should have a low predictability. Despite this inherent population heterogeneity, the maximal predictability varies very little—indeed P(Πmax) is narrowly peaked at 93%, and we see no users whose predictability would be under 80%.

Although making explicit predictions on user whereabouts is beyond our goals here, appropriate data-mining algorithms could turn the predictability identified in our study into actual mobility predictions. Most important, our results indicate that when it comes to processes driven by human mobility, from epidemic modeling to urban planning and traffic engineering, the development of accurate predictive models is a scientifically grounded possibility, with potential impact on our well-being and public health. At a more fundamental level, they also indicate that, despite our deep-rooted desire for change and spontaneity, our daily mobility is, in fact, characterized by a deep-rooted regularity.

Related pages

ReadingList
