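Processing CSV data with pandas: extracting the same column from many files and merging it.
Suppose you have many separate .csv files (say 100), and every file has the same format: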
host | ip | title | domain | region | link | product_category | lastupdatetime |
47.96.115.191 | 二氢青蒿酸_香叶木素原料_磷酸替米考星现货供应_南京秋石医药科技有限公司 | Zhejiang | 服务,脚本语言,中间件,脚本语言 | ###### | |||
38.55.38.128 | 天阿萨大大撒旦 | California | 其他企业应用,服务,中间件,脚本语言 | ###### | |||
183.61.241.31 | 中寰交通网校 | Guangdong | 服务,中间件,脚本语言,开发框架,脚本语言 | ###### | |||
67.201.3.195 | emc全站网页版 - emc全站网页下载 | Arizona | 服务,脚本语言,中间件 | ###### | |||
23.81.4.196 | bat365在线平台登录网址 - bat365在线平台网站 | Washington | 服务,中间件,脚本语言 | ###### | |||
38.63.251.248 | 香港老凤祥黄金首饰价格走势图 | California | 服务,中间件,脚本语言 | ###### |
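Now you want to gather one kind of data in one place.
For example, you want to extract the link column from each of the 100 files and merge the results.
link |
The code is as follows: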
import os

import pandas as pd

# Read every CSV file in the directory and write its links to links.txt
directory = '/root/桌面/8W/'  # replace with your actual directory path
with open('links.txt', 'w', encoding='utf-8') as f:
    for filename in os.listdir(directory):
        if filename.endswith(".csv"):
            filepath = os.path.join(directory, filename)
            print(f"Reading file: {filepath}")
            df = pd.read_csv(filepath, sep=',', encoding='utf-8')  # comma as the separator
            if 'link' in df.columns:  # only process files that have a link column
                for link in df['link'].dropna():  # skip empty cells, which pandas reads as NaN
                    f.write(str(link) + '\n')  # write one link per line
                print(f"Links extracted from {filename} and written to links.txt")
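The loop above writes links one row at a time. As an alternative, the same extraction can be sketched by loading only the wanted column from each file and joining the pieces with `pd.concat`. This is a minimal sketch rather than the method used in this post: the function name `collect_column` is hypothetical, and it assumes every file is UTF-8 encoded and comma-separated, as in the script above.

```python
from pathlib import Path

import pandas as pd


def collect_column(directory, column="link"):
    """Gather one column from every CSV file in `directory` into a single Series.

    `collect_column` is a hypothetical helper name used for illustration.
    """
    parts = []
    for path in sorted(Path(directory).glob("*.csv")):
        # Read only the header row first to check whether the column exists.
        header = pd.read_csv(path, nrows=0, encoding="utf-8")
        if column not in header.columns:
            continue
        # usecols makes pandas load just the one column we need.
        df = pd.read_csv(path, usecols=[column], encoding="utf-8", dtype=str)
        parts.append(df[column])
    if not parts:
        # No file had the column: return an empty Series.
        return pd.Series(dtype=str, name=column)
    # Join all pieces and drop empty cells (NaN).
    return pd.concat(parts, ignore_index=True).dropna()
```

To produce the same kind of output file, the result could be written with `collect_column('/root/桌面/8W/').to_csv('links.txt', index=False, header=False)`.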
Run it:
└─# python3 1.py
Reading file: /root/桌面/8W/c3e5614998_202405132137资产数据.csv
Links extracted from c3e5614998_202405132137资产数据.csv and written to links.txt
Reading file: /root/桌面/8W/948089af51_202405132226资产数据.csv
Links extracted from 948089af51_202405132226资产数据.csv and written to links.txt
Reading file: /root/桌面/8W/7339e72143_202405132140资产数据.csv
Links extracted from 7339e72143_202405132140资产数据.csv and written to links.txt
Reading file: /root/桌面/8W/2bbab7ac2a_202405132131资产数据.csv
Links extracted from 2bbab7ac2a_202405132131资产数据.csv and written to links.txt
Reading file: /root/桌面/8W/94f3b4aab7_202405132138资产数据.csv
......Once processing finishes, just open links.txt to see the result.
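For a quick check of the output file, something like the following could count how many links were written and how many of them are distinct. It is only a sketch: the helper name `summarize_links` is hypothetical, and it assumes links.txt holds one link per line in UTF-8.

```python
from pathlib import Path


def summarize_links(path="links.txt"):
    """Return (total, unique) counts of the non-empty lines in the output file.

    `summarize_links` is a hypothetical helper name used for illustration.
    """
    lines = [
        line.strip()
        for line in Path(path).read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    return len(lines), len(set(lines))
```

A large gap between the two counts would suggest that the same link appears in several of the source files.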