pandas has two core data structures: Series and DataFrame.
>>> import numpy as np
>>> import pandas as pd
>>> a=pd.Series([1,2],index=['a','b'])
>>> a
a    1
b    2
dtype: int64
>>> # b is not defined in this excerpt; the output below is consistent with b=pd.Series(['b','a'])
>>> b.index
RangeIndex(start=0, stop=2, step=1)
>>> b.values
array(['b', 'a'], dtype=object)
>>> a/2
a    0.5
b    1.0
dtype: float64
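To make the transcript above reproducible as a script, here is a minimal sketch. The definition of `b` is an assumption (it does not appear in the notes), chosen because it matches the printed `index` and `values`.

```python
import pandas as pd

# Explicit labels: the index is ['a', 'b'].
a = pd.Series([1, 2], index=['a', 'b'])

# Assumed definition of b: no index given, so pandas assigns 0, 1, ...
b = pd.Series(['b', 'a'])

labels = list(a.index)           # labels supplied by the caller
default_labels = list(b.index)   # positions generated by RangeIndex
halved = a / 2                   # element-wise division, labels are kept

print(labels, default_labels)
print(halved)
```

Arithmetic on a Series is applied element by element, and the integer values become floats because of the division.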
Selecting elements by slicing, as with a list
>>> s[0:3:2]
a    2
c    6
dtype: int64
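The slice above works by position, with the usual `start:stop:step` meaning. The sketch below rebuilds `s` from the values printed later in these notes (an assumption about how it was created) and uses `.iloc` and `.loc` to make the two kinds of slicing explicit.

```python
import pandas as pd

# Assumed construction of s, matching the values shown in the transcript.
s = pd.Series([2, 5, 6, 3], index=['a', 'b', 'c', 'd'])

# Positional slice: positions 0 and 2 (stop position 3 is excluded).
every_other = s.iloc[0:3:2]

# Label slice: unlike positional slicing, the end label is included.
by_label = s.loc['a':'c']

print(every_other)
print(by_label)
```

The difference in end-point handling between `.iloc` and `.loc` is easy to miss, so spelling out which one is meant can avoid off-by-one surprises.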
>>> s3=pd.Series(arr)   # another way to create a Series: from an existing array (arr is not defined in this excerpt)
>>> s3
0    1
1    2
2    3
3    4
dtype: int32
>>> s3=pd.Series(s)
>>> s3
a    2
b    5
c    6
d    3
dtype: int64
>>> s[s>8]
Series([], dtype: int64)
>>> s
a    2
b    5
c    6
d    3
dtype: int64
>>> s[s>3]   # select the elements greater than 3
b    5
c    6
dtype: int64
>>> np.log(s)   # apply a NumPy function directly to the Series
a    0.693147
b    1.609438
c    1.791759
d    1.098612
dtype: float64
>>> s.isin([5,6])   # test whether each element is in the given list; returns booleans
a    False
b     True
c     True
d    False
dtype: bool
>>> s[s.isin([5,6])]
b    5
c    6
dtype: int64
>>> s2=pd.Series([5,2,np.nan,7,np.nan])
>>> s2
0    5.0
1    2.0
2    NaN
3    7.0
4    NaN
dtype: float64
>>> s2.isnull()
0    False
1    False
2     True
3    False
4     True
dtype: bool
>>> s2.notnull()
0     True
1     True
2    False
3     True
4    False
dtype: bool
>>> s2[s2.isnull()]
2   NaN
4   NaN
dtype: float64
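The filtering calls above all follow one pattern: build a boolean Series, then use it as a mask. A small sketch under the same assumed data follows; combining masks with `&` and calling `dropna()` go slightly beyond the transcript and are shown only as related conveniences.

```python
import numpy as np
import pandas as pd

# Assumed data, matching the values printed in the transcript.
s = pd.Series([2, 5, 6, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([5, 2, np.nan, 7, np.nan])

# Two masks combined element-wise; the parentheses are required.
mask = (s > 3) & s.isin([5, 6])
selected = s[mask]

# Index labels of the missing entries, and a copy with them removed.
missing_positions = list(s2[s2.isnull()].index)
cleaned = s2.dropna()

print(selected)
print(missing_positions)
print(cleaned)
```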
Working with DataFrame
>>> frame2=pd.DataFrame(fram,columns=['name','age'])
>>> frame2
        name  age
red        1    2
yellow     5    6
blue       9   10
black     13   14
>>> frame2.values
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]])
>>> frame2.index
Index([u'red', u'yellow', u'blue', u'black'], dtype='object')
>>> frame2.columns
Index([u'name', u'age'], dtype='object')
>>> frame2['name']
red        1
yellow     5
blue       9
black     13
Name: name, dtype: int32
>>> frame2.name
red        1
yellow     5
blue       9
black     13
Name: name, dtype: int32
>>> frame2.age
red        2
yellow     6
blue      10
black     14
Name: age, dtype: int32
>>> frame2.loc[['red']]   # select a row by label; the original attempt, frame2[index=['red']], is not valid Python syntax
>>> frame2[0:2]
        name  age
red        1    2
yellow     5    6
>>> frame2['name'][2]
9
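The notes never show how `fram` was built. The sketch below assumes it came from `np.arange(16).reshape(4, 4)` with the row and column labels seen in the output, then repeats the selections using `.loc` (by label) and `.iloc` (by position), which state the intent more explicitly than plain brackets.

```python
import numpy as np
import pandas as pd

# Assumed construction of fram; only its printed form appears in the notes.
fram = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=['red', 'yellow', 'blue', 'black'],
                    columns=['id', 'name', 'age', 'home'])

# Passing an existing frame plus a column list keeps just those columns.
frame2 = pd.DataFrame(fram, columns=['name', 'age'])

first_two = frame2.iloc[0:2]           # rows at positions 0 and 1
red_row = frame2.loc['red']            # one row, selected by label
third_name = frame2['name'].iloc[2]    # third value of the 'name' column

print(first_two)
print(red_row)
print(third_name)
```

Recent pandas versions warn about integer lookups such as `frame2['name'][2]` on a string-labelled index, so `.iloc[2]` is the safer spelling for positional access.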
>>> s.idxmin()
'a'
>>> s.idxmax()
'c'
>>> s.index.is_unique
True
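`idxmin()` and `idxmax()` return the index label of the smallest and largest value, not the value itself. A short sketch with the same assumed `s`, plus a hypothetical Series `dup` with a repeated label to show when `is_unique` is false:

```python
import pandas as pd

# Assumed data, matching the transcript.
s = pd.Series([2, 5, 6, 3], index=['a', 'b', 'c', 'd'])

lowest_label = s.idxmin()    # label of the minimum value (2)
highest_label = s.idxmax()   # label of the maximum value (6)

# Hypothetical example: the label 'a' appears twice.
dup = pd.Series([1, 2], index=['a', 'a'])

print(lowest_label, highest_label)
print(s.index.is_unique, dup.index.is_unique)
```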
>>> fram
        id  name  age  home
red      0     1    2     3
yellow   4     5    6     7
blue     8     9   10    11
black   12    13   14    15
>>> frame4=fram.drop(['name','age'],axis=1)   # drop columns
>>> frame4
        id  home
red      0     3
yellow   4     7
blue     8    11
black   12    15
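`drop` returns a new frame and leaves the original untouched; `axis=1` targets columns and `axis=0` targets rows. A sketch, again assuming `fram` was built from `np.arange(16)`:

```python
import numpy as np
import pandas as pd

# Assumed construction of fram, matching the printed frame.
fram = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=['red', 'yellow', 'blue', 'black'],
                    columns=['id', 'name', 'age', 'home'])

without_cols = fram.drop(['name', 'age'], axis=1)   # remove two columns
without_rows = fram.drop(['red'], axis=0)           # remove one row by label

print(without_cols)
print(without_rows)
print(list(fram.columns))   # the original frame still has all four columns
```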
>>> f=lambda x:x.max()-x.min()   # apply a custom function to the frame
>>> fram.apply(f)
id      12
name    12
age     12
home    12
dtype: int64
>>> fram.apply(f,axis=1)
red       3
yellow    3
blue      3
black     3
dtype: int64
>>> fram.apply(f,axis=0)
id      12
name    12
age     12
home    12
dtype: int64
>>> def f(x):
...     return pd.Series([x.min(),x.max()],index=['min','max'])
...
>>> fram.apply(f)
     id  name  age  home
min   0     1    2     3
max  12    13   14    15
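The same `apply` calls as a standalone sketch. The function names `value_range` and `min_max` are illustrative replacements for the two versions of `f`, and the construction of `fram` is assumed as before.

```python
import numpy as np
import pandas as pd

# Assumed construction of fram, matching the printed frame.
fram = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=['red', 'yellow', 'blue', 'black'],
                    columns=['id', 'name', 'age', 'home'])

def value_range(x):
    """Spread between the largest and smallest value of a Series."""
    return x.max() - x.min()

def min_max(x):
    """Return two results per Series, labelled 'min' and 'max'."""
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

per_column = fram.apply(value_range)        # default axis=0: one result per column
per_row = fram.apply(value_range, axis=1)   # axis=1: one result per row
summary = fram.apply(min_max)               # Series results stack into a DataFrame

print(per_column)
print(per_row)
print(summary)
```

When the applied function returns a scalar, the result is a Series; when it returns a Series, the results are assembled into a DataFrame whose rows are that Series' labels.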
Summary statistics of a DataFrame
>>> fram.describe()
              id       name        age       home
count   4.000000   4.000000   4.000000   4.000000
mean    6.000000   7.000000   8.000000   9.000000
std     5.163978   5.163978   5.163978   5.163978
min     0.000000   1.000000   2.000000   3.000000
25%     3.000000   4.000000   5.000000   6.000000
50%     6.000000   7.000000   8.000000   9.000000
75%     9.000000  10.000000  11.000000  12.000000
max    12.000000  13.000000  14.000000  15.000000
>>> fram.sum()
id      24
name    28
age     32
home    36
dtype: int64
>>> fram.mean()
id      6.0
name    7.0
age     8.0
home    9.0
dtype: float64
>>> fram.min()
id      0
name    1
age     2
home    3
dtype: int32
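These reductions work column by column by default. The sketch below, with `fram` assumed as before, reproduces them and also shows `axis=1` for per-row results, which the transcript does not cover.

```python
import numpy as np
import pandas as pd

# Assumed construction of fram, matching the printed frame.
fram = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=['red', 'yellow', 'blue', 'black'],
                    columns=['id', 'name', 'age', 'home'])

stats = fram.describe()          # count, mean, std, min, quartiles, max
col_sums = fram.sum()            # one total per column
col_means = fram.mean()          # one mean per column
row_means = fram.mean(axis=1)    # one mean per row

# std is the sample standard deviation (divides by n - 1):
# for id = 0, 4, 8, 12 that is sqrt(80 / 3), about 5.163978.
print(stats)
print(col_sums)
print(row_means)
```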