Data Processing with the NumPy Package

1. Importing NumPy

>>> import numpy as np

2. Creating Matrices

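  • zeros: matrix of zeros
  • ones: matrix of ones
  • identity: identity matrix (always square)
  • eye: matrix with ones on the diagonal (may be non-square)
  • random: random matrix
  • empty: uninitialized matrix (holds whatever was already in memory)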
>>> a = np.array([1, 2, 3])
>>> print(a)
[1 2 3]
>>> np.zeros(5)
array([ 0.,  0.,  0.,  0.,  0.])
>>> np.ones(shape=(3,4),dtype=np.int32)
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int32)
>>> print(np.identity(3))
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
>>> print(np.eye(3,5))
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]]
>>> print(np.random.rand(3,4))
[[ 0.40974346  0.09968249  0.32264537  0.78815716]
 [ 0.52959935  0.74650259  0.36604339  0.80450681]
 [ 0.07308854  0.83747867  0.18952243  0.03182399]]
>>> x = np.empty(shape=(3,3),dtype=np.int16)
>>> print(x)  # uninitialized values; they differ from run to run
[[  6088   5994  32645]
 [     0 -16048    630]
 [     0      0     32]]
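
The differences between these constructors can be sketched as follows. This is a minimal illustration, not code from the original post: the variable names are made up for the example, and the seeded generator from np.random.default_rng assumes NumPy 1.17 or later.

```python
import numpy as np

# identity(n) is always square; eye(n, m, k) may be rectangular,
# and k shifts the diagonal of ones (k=1 is the first superdiagonal).
square = np.identity(3)
shifted = np.eye(3, 5, k=1)

# empty() only allocates memory, so fill the array before reading it.
buf = np.empty(shape=(3, 3), dtype=np.int16)
buf.fill(7)

# A seeded generator makes random matrices reproducible.
rng = np.random.default_rng(0)
rand = rng.random((3, 4))  # uniform samples in [0, 1)

print(square.shape, shifted.shape, rand.shape)
print(shifted)
```

Filling the array right after np.empty avoids relying on leftover memory contents such as the arbitrary integers printed above.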

3. Matrix Properties

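  • X.flags: memory layout information
  • X.shape: (number of rows, number of columns)
  • X.ndim: number of dimensions
  • X.size: number of elements in the array (rows × columns for a matrix)
  • X.itemsize: size in bytes of each element
  • X.dtype: data type of the elements
  • X.T: the transpose of X, when X is a matrix
  • X.trace(): the trace of X
  • np.linalg.det: determinant
  • np.linalg.eig: eigenvalues and eigenvectors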
>>> b = np.array([[0,2,4],[1,3,5]])
>>> b
array([[0, 2, 4],
       [1, 3, 5]])
>>> b.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
>>> b.shape
(2, 3)
>>> b.ndim
2
>>> b.size
6
>>> b.itemsize
8
>>> b.dtype
dtype('int64')
>>> b.T
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> b.trace()
3
>>> s = np.random.rand(3,3)  # a random 3x3 matrix; values differ from run to run
>>> s
array([[ 0.76028676,  0.26199634,  0.8378446 ],
       [ 0.07305935,  0.9685931 ,  0.25006689],
       [ 0.39322084,  0.90543664,  0.11268395]])
>>> np.linalg.det(s)
-0.32924369300392159
>>> np.linalg.eig(s)
(array([-0.31667895,  0.72581365,  1.4324291 ]), array([[ 0.59088385, -0.93187254, -0.77379983],
       [ 0.1215846 ,  0.35540744, -0.39068828],
       [-0.79754213, -0.07279501, -0.49859452]]))
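
As a hedged sketch of how the eig result relates to det and trace, the example below uses a small triangular matrix chosen for illustration (it is not from the original post). np.linalg.eig returns the eigenvalues and a matrix whose columns are the matching eigenvectors.

```python
import numpy as np

# A lower-triangular matrix, so its eigenvalues are the diagonal entries.
s = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])

eigenvalues, eigenvectors = np.linalg.eig(s)

# Each column v of eigenvectors satisfies s @ v = lambda * v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(s @ v, lam * v)

# The determinant is the product of the eigenvalues,
# and the trace is their sum.
print(np.isclose(np.linalg.det(s), np.prod(eigenvalues)))
print(np.isclose(s.trace(), np.sum(eigenvalues)))
```

Iterating over eigenvectors.T is what walks the columns; iterating over eigenvectors directly would walk the rows and pair each eigenvalue with the wrong vector.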

4. Matrix Operations

Note: matrix multiplication uses dot; the * operator computes the element-wise product.

>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a<35
array([ True,  True, False, False], dtype=bool)
>>> A = np.array( [[1,1],
...                [0,1]] )
>>> B = np.array( [[2,0],
...                [3,4]] )
>>> A*B            # element-wise product
array([[2, 0],
       [0, 4]])
>>> np.dot(A,B)    # matrix product
array([[5, 4],
       [3, 4]])
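
The distinction between the two products can also be written with the @ operator, which for 2-D arrays gives the same result as np.dot. The sketch below reuses A and B from the session above; the other variable names are illustrative, and @ assumes Python 3.5 or later.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B      # multiplies matching entries
matrix_product = A @ B   # same as np.dot(A, B) for 2-D arrays

print(elementwise)
print(matrix_product)
print(np.array_equal(matrix_product, np.dot(A, B)))

# Matrix multiplication is not commutative in general.
print(np.array_equal(A @ B, B @ A))
```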

5. Common Operations

>>> a = np.random.rand(1,3)
>>> b = np.random.rand(2,3)
>>> a
array([[ 0.86752326,  0.83817954,  0.70940074]])
>>> b
array([[ 0.59104081,  0.71738646,  0.48867363],
       [ 0.92321008,  0.57634452,  0.36543018]])
>>> b[0]     # row 0
array([ 0.59104081,  0.71738646,  0.48867363])
>>> b[:,0]   # column 0
array([ 0.59104081,  0.92321008])
>>> x = np.random.rand(10,1)
>>> x
array([[ 0.61574666],
       [ 0.35264373],
       [ 0.03036087],
       [ 0.50497023],
       [ 0.9401023 ],
       [ 0.84207028],
       [ 0.28167327],
       [ 0.23913086],
       [ 0.72833365],
       [ 0.38438902]])
>>> x[2:5]   # rows 2 through 4 (the stop index 5 is excluded)
array([[ 0.03036087],
       [ 0.50497023],
       [ 0.9401023 ]])
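>>> y = np.empty(shape=(5,5),dtype=np.int32)  # uninitialized 5x5 array; contents vary
>>> y
array([[392828856,     32645, 392828856,     32645,         0],
       [        0,        24,         0,         0,        27],
       [        0,         0,       160,         0,       160],
       [        0,         0,         0,         0,         0],
       [        0,         0,         0,         0,         0]], dtype=int32)
>>> y[1:4,1:4]   # rows 1 through 3, columns 1 through 3
array([[ 24,   0,   0],
       [  0, 160,   0],
       [  0,   0,   0]], dtype=int32)
>>> n = np.random.rand(3,4)  # a random 3x4 matrix; values differ from run to run
>>> n.sum()
5.6976490969576563
>>> n.mean()
0.47480409141313801
>>> n.sum(axis=0)   # one sum per column
array([ 2.05174571,  1.64697796,  1.47652237,  0.52240306])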
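
Because the random values above change on every run, a deterministic array makes the slicing and axis rules easier to check by hand. The following is a minimal sketch; the array m and the other variable names are assumptions made for the example.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

first_row = m[0]       # row 0
first_col = m[:, 0]    # column 0
block = m[1:3, 1:3]    # rows 1-2 and columns 1-2; the stop index is excluded

col_sums = m.sum(axis=0)  # collapses the rows: one sum per column
row_sums = m.sum(axis=1)  # collapses the columns: one sum per row

print(first_row, first_col)
print(block)
print(col_sums, row_sums, m.mean())
```

The axis argument names the dimension that disappears, so axis=0 on a 3x4 array yields 4 values and axis=1 yields 3.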

6. Reshaping and Stacking Matrices

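  • vstack: stack arrays vertically (adds rows)
  • hstack: stack arrays horizontally (adds columns)
  • reshape: change the shape without changing the data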
>>> a = np.random.rand(1,3)
>>> b = np.random.rand(2,3)
>>> a
array([[ 0.93525054,  0.69674357,  0.58304333]])
>>> b
array([[ 0.69230576,  0.98160659,  0.82468763],
       [ 0.71006004,  0.21784347,  0.8019069 ]])
>>> np.concatenate([a,a])
array([[ 0.93525054,  0.69674357,  0.58304333],
       [ 0.93525054,  0.69674357,  0.58304333]])
>>> np.vstack([a,b])
array([[ 0.93525054,  0.69674357,  0.58304333],
       [ 0.69230576,  0.98160659,  0.82468763],
       [ 0.71006004,  0.21784347,  0.8019069 ]])
>>> np.hstack([b,b])
array([[ 0.69230576,  0.98160659,  0.82468763,  0.69230576,  0.98160659,
         0.82468763],
       [ 0.71006004,  0.21784347,  0.8019069 ,  0.71006004,  0.21784347,
         0.8019069 ]])
>>> a.shape
(1, 3)
>>> a.reshape(3,1)
array([[ 0.93525054],
       [ 0.69674357],
       [ 0.58304333]])
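
The shape rules behind these calls can be sketched with small integer arrays. This is an illustrative example with made-up variable names: for 2-D inputs, vstack and hstack give the same result as concatenate along axis 0 and axis 1, and reshape accepts -1 to infer one dimension from the total size.

```python
import numpy as np

a = np.arange(3).reshape(1, 3)      # shape (1, 3)
b = np.arange(3, 9).reshape(2, 3)   # shape (2, 3)

v = np.vstack([a, b])               # shape (3, 3): rows appended
h = np.hstack([b, b])               # shape (2, 6): columns appended

# For 2-D arrays these match concatenate along axis 0 and axis 1.
same_v = np.array_equal(v, np.concatenate([a, b], axis=0))
same_h = np.array_equal(h, np.concatenate([b, b], axis=1))

# -1 lets reshape infer one dimension from the total size.
column = a.reshape(-1, 1)           # shape (3, 1)

print(v.shape, h.shape, column.shape, same_v, same_h)
```

vstack requires the inputs to have the same number of columns and hstack the same number of rows; a mismatch raises a ValueError.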

