leiem.cn

2024-09-04


Series

A Series is a one-dimensional labeled array that can hold any data type.

Creating a Series

import pandas as pd

s = pd.Series(['a', 'b', 'c'])
print(s)

# Output
0    a
1    b
2    c
dtype: object
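As a small illustration of the claim that a Series can hold any data type, the sketch below builds three Series and inspects their dtype. The variable names are chosen only for this example, and the exact integer and float widths can depend on the platform and pandas version.

```python
import pandas as pd

# The dtype is inferred from the values passed in.
ints = pd.Series([1, 2, 3])        # an integer dtype (typically int64)
floats = pd.Series([1.5, 2.5])     # a floating-point dtype (typically float64)
mixed = pd.Series([1, 'a', 3.0])   # mixed Python objects fall back to object

print(ints.dtype)
print(floats.dtype)
print(mixed.dtype)
```

Mixing unrelated types in one Series works, but it gives up the faster numeric dtypes, so it is usually worth keeping each Series homogeneous.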

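Series index

The index holds the labels that identify each element. By default, pandas creates an integer index starting at 0; pass the index parameter to specify custom labels.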
s = pd.Series(['a', 'b', 'c'], index=['V1', 'V2', 'V3'])
print(s)

# Output
V1    a
V2    b
V3    c
dtype: object
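To make the difference between the default and a custom index visible, the following sketch prints the index attribute of both. The names default and custom are illustrative only.

```python
import pandas as pd

default = pd.Series(['a', 'b', 'c'])
custom = pd.Series(['a', 'b', 'c'], index=['V1', 'V2', 'V3'])

# Without the index parameter, pandas generates integer labels 0, 1, 2.
print(list(default.index))   # [0, 1, 2]

# With the index parameter, the supplied labels are used instead.
print(list(custom.index))    # ['V1', 'V2', 'V3']
```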

Common operations

Selecting a value by index label

s = pd.Series(['a', 'b', 'c'], index=['V1', 'V2', 'V3'])
print(s['V2'])
# Output
b
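Besides plain square brackets, a Series also offers the loc and iloc accessors. The sketch below, which reuses the same Series, shows label-based and position-based lookups side by side; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=['V1', 'V2', 'V3'])

by_label = s.loc['V2']           # look up by index label
by_position = s.iloc[1]          # look up by 0-based position
several = s.loc[['V1', 'V3']]    # a list of labels returns a Series

print(by_label)       # b
print(by_position)    # b
print(list(several))  # ['a', 'c']
```

Using loc or iloc explicitly avoids ambiguity about whether a key is meant as a label or as a position.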

DataFrame

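A DataFrame is a two-dimensional tabular data structure. It can be viewed as a collection of Series, and it has a row index and column names.

Creating a DataFrame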

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
print(df)
# Output
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
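The row index and column names mentioned above can be inspected directly. This is a minimal sketch on the same data, using the shape, index, and columns attributes.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

print(df.shape)           # (3, 3): 3 rows, 3 columns
print(list(df.index))     # [0, 1, 2]: the default integer row index
print(list(df.columns))   # ['Name', 'Age', 'City']
```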

Viewing the first N rows

print(df.head(2))
# Output
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles

Viewing the last N rows

print(df.tail(2))
# Output
      Name  Age         City
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
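When N is omitted, head and tail both default to 5 rows. The sketch below uses a hypothetical ten-row DataFrame (not part of the example data above) so that the default is visible.

```python
import pandas as pd

# Hypothetical ten-row frame, used only to demonstrate the defaults.
numbers = pd.DataFrame({'n': range(10)})

first_rows = numbers.head()    # no argument: first 5 rows
last_rows = numbers.tail(3)    # explicit N: last 3 rows

print(list(first_rows['n']))   # [0, 1, 2, 3, 4]
print(list(last_rows['n']))    # [7, 8, 9]
```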

Selecting a single column (returns a Series)

print(df['Name'])
# Output
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
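A related case is selecting several columns at once. As a sketch on the same data, passing a list of column names returns a DataFrame rather than a Series.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

single = df['Name']            # one column name: a Series
subset = df[['Name', 'Age']]   # a list of column names: a DataFrame

print(type(single).__name__)   # Series
print(type(subset).__name__)   # DataFrame
print(subset.shape)            # (3, 2)
```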

Filtering data
- df.iloc[1]
  
  iloc selects by position; this returns the second row. Output:
  Name            Bob
  Age              30
  City    Los Angeles
  Name: 1, dtype: object
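- df.iloc[1, 0]
  
  iloc selects by position; this returns the value in the second row, first column. Row and column positions both count from 0, regardless of the index labels and column names. Output:
  Bob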
- df.loc[1]
  
  loc selects by label; this returns the row whose index label is 1. Output:
  Name            Bob
  Age              30
  City    Los Angeles
  Name: 1, dtype: object
- df.loc[1, 'Name']
  
  loc selects by label; this returns the Name value from the row whose index label is 1 (the default index is integers starting at 0). Output:
  Bob
- df.loc[df['Age'] > 25]
  
  Keeps only the rows whose Age value is greater than 25. Output:
        Name  Age         City
  1      Bob   30  Los Angeles
  2  Charlie   35      Chicago
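With the default index, iloc[1] and loc[1] happen to return the same row, which can hide the difference between them. The sketch below assumes a custom row index of 10, 20, 30 (chosen only for illustration) to separate position from label, and also shows how two conditions can be combined in a boolean filter.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}, index=[10, 20, 30])  # illustrative labels that differ from positions

by_position = df.iloc[1]   # second row by position
by_label = df.loc[20]      # row whose index label is 20

print(by_position['Name'])   # Bob
print(by_label['Name'])      # Bob

# Combine conditions with & (and) or | (or); each one needs parentheses.
mask = (df['Age'] > 25) & (df['City'] != 'Chicago')
print(list(df.loc[mask, 'Name']))   # ['Bob']
```

Here df.loc[1] would raise a KeyError, because no row carries the label 1, while df.iloc[1] still works because it only counts positions.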