Mobile Data Mining

From 集智百科

Radius of gyration

The radius of gyration (or gyradius) of a body about an axis of rotation is the radial distance from that axis at which the whole mass of the body could be concentrated without changing its moment of inertia about the axis. It is denoted by R_g.

Mathematically, the radius of gyration is the root-mean-square distance of the object's parts from either its center of mass or a given axis, depending on the application. For a single point mass, it is simply the perpendicular distance from the mass to the axis of rotation.

https://en.wikipedia.org/wiki/Radius_of_gyration
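
The root-mean-square definition can be sketched in a few lines of Python. This is a minimal illustration, not code from this page: the function name `radius_of_gyration` and the toy points are assumptions, and distances are measured from the mass-weighted mean position (the center of mass).

```python
import math

def radius_of_gyration(points, masses=None):
    """RMS distance of the points from their mass-weighted mean position."""
    if masses is None:
        masses = [1.0] * len(points)  # assumption: equal masses by default
    total = sum(masses)
    dim = len(points[0])
    # Center of mass: mass-weighted mean of the coordinates
    center = [sum(m * p[d] for m, p in zip(masses, points)) / total
              for d in range(dim)]
    # Mass-weighted mean squared distance from the center
    msd = sum(m * sum((p[d] - center[d]) ** 2 for d in range(dim))
              for m, p in zip(masses, points)) / total
    return math.sqrt(msd)

# Four equal masses on the corners of a square centered at the origin
corners = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
rg = radius_of_gyration(corners)
print(rg)  # every corner is sqrt(2) away from the center
```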

Center of Mass

The center of mass is the unique point at the center of a distribution of mass in space that has the property that the weighted position vectors relative to this point sum to zero. In analogy to statistics, the center of mass is the mean location of a distribution of mass in space.

https://en.wikipedia.org/wiki/Center_of_mass

Doc2Vec

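Doc2Vec can also produce a vector for a whole document. The physical meaning of this vector is unclear; our guess is that it is analogous to the center of mass. On that assumption, the radii of gyration computed with the two methods are strongly positively correlated.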

A system of particles

In the case of a system of particles P_i, i = 1, …, n, each with mass m_i and located in space at coordinates \mathbf{r}_i, the coordinates \mathbf{R} of the center of mass satisfy the condition

 \sum_{i=1}^n m_i(\mathbf{r}_i - \mathbf{R}) = 0.

Solving this equation for R yields the formula

\mathbf{R} = \frac 1M \sum_{i=1}^n m_i \mathbf{r}_i,

where M is the sum of the masses of all of the particles.
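
As a quick numerical check of the formula, the sketch below computes R for three particles and confirms that the weighted offsets sum to zero. The function name `center_of_mass` and the sample positions and masses are illustrative assumptions.

```python
def center_of_mass(positions, masses):
    """R = (1/M) * sum_i m_i * r_i, computed coordinate by coordinate."""
    total = sum(masses)  # M, the total mass
    dim = len(positions[0])
    return [sum(m * r[d] for m, r in zip(masses, positions)) / total
            for d in range(dim)]

# Toy system: three particles in the plane
positions = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
masses = [1.0, 1.0, 2.0]

R = center_of_mass(positions, masses)

# Defining condition: sum_i m_i * (r_i - R) = 0 in every coordinate
residual = [sum(m * (r[d] - R[d]) for m, r in zip(masses, positions))
            for d in range(2)]
print(R, residual)
```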

Center of Mass & Docvec

[Figures: Center of mass docvec.jpg; Rg mass rg docvec.jpg]

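In our data, the result of applying the center-of-mass formula to word vectors is only weakly correlated with the doc vector: the mean correlation coefficient is 0.145. The radii of gyration R_g computed with the two methods, by contrast, are strongly correlated, with a correlation coefficient of about 0.64.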
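
One way such a comparison could be set up is sketched below. The page does not say exactly how the correlation was computed, so this is an assumption: for each document, take the Pearson correlation between the components of the word-vector centroid and those of the doc vector. The helper names and the toy vectors are hypothetical and are not the data behind the figures above.

```python
import math

def centroid(word_vectors, weights=None):
    """Center-of-mass formula applied to word vectors (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(word_vectors)
    total = sum(weights)
    dim = len(word_vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, word_vectors)) / total
            for d in range(dim)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy example: the doc vector is exactly twice the centroid,
# so the two are perfectly correlated.
word_vectors = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
doc_vector = [4.0, 2.0, 2.0]

c = centroid(word_vectors)
r = pearson(c, doc_vector)
print(c, r)
```

Averaging `r` over all documents would give a single summary number comparable in kind to the mean coefficient reported above.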

Knowledge Space

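from urllib.parse import urlparse  # Python 3; in Python 2 the module was `urlparse`

# clean url: reduce a URL to its domain, or return 'none' if that is not possible
def urlclean(url):
    try:
        host = urlparse(url).hostname
        # No hostname (e.g. URL without a scheme) or a bare IPv4 address
        if not host or host.replace('.', '').isdigit():
            return 'none'
        parts = host.split('.')
        if len(parts) < 2:
            return 'none'
        # Keep three labels for *.com.cn domains, two labels otherwise
        if host[-6:] == 'com.cn':
            return '.'.join(parts[-3:])
        return '.'.join(parts[-2:])
    except Exception:
        return 'none'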

Communication Network

Strongly Connected Components

Ego Network

Node Centrality Analysis

Degree, PageRank, Triangle Count

Predict Users' CONSUME_AMT

Mobility Network

Jump Size

Radius of Gyration

Preferential Return

Predictability/Entropy

https://en.wikipedia.org/wiki/Approximate_entropy

https://stackoverflow.com/questions/46296891/entropy-estimator-based-on-the-lempel-ziv-algorithm-using-python

https://cn.mathworks.com/matlabcentral/fileexchange/51042-entropy-estimator-based-on-the-lempel-ziv-algorithm?s_tid=prof_contriblnk

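function E=lzentropy(rd)
% Lempel-Ziv entropy estimate of the sequence rd.
% L(i) is the length of the first substring rd(i:i+k) that is not found
% starting at an earlier position; it is 0 if the sequence ends first.

n=length(rd);
L=zeros(1,n);
L(1)=1;

for i=2:n

    sub=rd(i);
    match=rd(1:i-1)==sub;

    if all(match==0)==1     % rd(i) has not been seen before
        L(i)=1;
    else
        k=1;

        while k<i           % the extension is capped at k<i

            if i+k>n        % substring would run past the end
                L(i)=0;
                break
            end

            sub=rd(i:i+k);  % candidate substring of length k+1

            for j=1:i-1     % look for sub starting at an earlier position

                match=rd(j:j+length(sub)-1)==sub;

                if all(match==1)==1
                    break;
                end
            end

            L(i)=length(sub);
            if all(match==1)==0   % no earlier match: stop extending
                k=i;
            end
            k=k+1;
        end
    end
end

E=1/(1/n * sum(L))*log(n);  % n*log(n)/sum(L), natural logarithm

end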


Python Script

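import math
import time

def contains(small, big):
    # True if the list `small` occurs as a contiguous run inside the list `big`.
    # The +1 is needed so that a match at the very end of `big` is found.
    for i in range(len(big) - len(small) + 1):
        if big[i:i+len(small)] == small:
            return True
    return False

def contains_sublist(lst, sublst):
    # Equivalent one-line check (note the reversed argument order).
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in range(len(lst)-n+1))


def actual_entropy(l):
    n = len(l)
    sequence = [l[0]]   # history seen so far, i.e. l[0:i]
    sum_gamma = 0       # the i = 0 term is not counted here (the MATLAB version sets L(1)=1)

    starttime = time.time()
    for i in range(1, n):
        if i % 1000 == 0:
            # Progress report: index and seconds spent on the last 1000 steps
            print(i)
            endtime = time.time()
            print(endtime - starttime)
            starttime = time.time()

        # gamma_i: length of the shortest substring starting at i
        # that does not occur in the history
        for j in range(i+1, n+1):
            s = list(l[i:j])
            if not contains(s, sequence):
                sum_gamma += len(s)
                break
        # Always extend the history, even if no unseen substring was found
        sequence.append(l[i])

    ae = 1 / (sum_gamma / n) * math.log(n)
    return ae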
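
This page gives no formula for turning an entropy estimate into a predictability value. One common choice in the human-mobility literature, assumed here, is the upper bound implied by Fano's inequality: for entropy S (in bits) over N distinct states, solve S = H(p) + (1 - p) log2(N - 1) for p, where H is the binary entropy. The sketch below does this by bisection; the function names are hypothetical. Note that `actual_entropy` above uses the natural logarithm, so its result would need dividing by `math.log(2)` to be in bits.

```python
import math

def fano_entropy(p, n_states):
    """Right-hand side of the Fano bound, in bits."""
    if p >= 1.0:
        return 0.0
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy H(p)
    return h + (1 - p) * math.log2(n_states - 1)

def max_predictability(entropy_bits, n_states, tol=1e-10):
    """Solve fano_entropy(p) = entropy_bits for p in [1/N, 1] by bisection.

    On this interval fano_entropy decreases from log2(N) to 0,
    so the root is unique when 0 <= entropy_bits <= log2(N) and N >= 2.
    """
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_entropy(mid, n_states) > entropy_bits:
            lo = mid  # bound still above the target: p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Sanity checks at the two extremes for N = 10 states
p_certain = max_predictability(0.0, 10)            # zero entropy
p_random = max_predictability(math.log2(10), 10)   # maximal entropy
print(p_certain, p_random)
```

At zero entropy the bound approaches 1, and at the maximal entropy log2(N) it falls to 1/N, the accuracy of a uniform random guess.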

Researchers

Author: Zhicongchen
