
Extracting complex content with Python's parse library

https://github.com/r1chardj0n3s/parse

import parse

# A single nginx access-log line
log = '162.158.167.131 - - [11/Aug/2020:06:47:30 +0800] "GET /tags/Tenacity HTTP/1.1" 301 194 "-" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://aspiegel.com/petalbot)"'
pattern = '{ip} - - [{dt:th}] "{method} {path} HTTP/1.1" {code:d} {length:d} "-" "{ua}"'
result = parse.search(pattern, log)
print(result['ip'])   # client IP address
print(result['ua'])   # User-Agent string
print(result.named)   # all named fields as a dict

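As long as our own systems write their logs in a consistent format, the fields are just as easy to extract. Take the log line below, where the tag [多次失败] means "failed repeatedly":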

2020-08-11 13:21:41 [scrapy.extensions.logstats] INFO: [多次失败] https://xxx.com/aa/bb\n

pattern = '[多次失败] {url}\n'