A Small Thing I Did with Python Today

Early this morning my little brother came to me for help:

Go to this address, search for XXX, and once it comes up, give it 10 likes: http://my-h5news.app.xinhuanet.com/h5activity/huihuangbainian/vote-pc.html Is it easy to write an auto-liker for this in Python?

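I was tempted to reply: what do you think? Of course it's easy. But the real question was whether the page used any encryption and how complicated its protocol was, so I told him I'd analyze it first.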

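I opened my trusty Google Chrome, brought up DevTools, and got to work by casting a test vote. The vote turned out to be a plain AJAX request, which made things easy. I used "Replay XHR" on it to see whether the request could simply be sent again, and it could. Promising!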

Next I copied the request as a curl command and ran that; it still worked. Nice.

curl 'http://my-api.app.xinhuanet.com//vote/vote' \
  -H 'Proxy-Connection: keep-alive' \
  -H 'Accept: application/json, text/javascript, */*; q=0.01' \
  -H 'device-token: 1627981566669_2684478571015833_h5' \
  -H 'platform: h5' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36' \
  -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' \
  -H 'Origin: http://my-h5news.app.xinhuanet.com' \
  -H 'Referer: http://my-h5news.app.xinhuanet.com/' \
  -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8' \
  --data-raw 'voteUuid=8d3a88ba9120401d9bcec085c30fd3f9&optionId=2' \
  --compressed \
  --insecure
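The --data-raw body in that curl command is just an application/x-www-form-urlencoded string. In Python, urllib.parse.urlencode builds the same string from a dict, which is essentially what requests does when you pass a dict as data=. A minimal sketch, with the field values copied from the captured request:

```python
from urllib.parse import urlencode

# Form fields copied from the captured request above
form = {"voteUuid": "8d3a88ba9120401d9bcec085c30fd3f9", "optionId": 2}

# urlencode turns the dict into the same key=value&key=value body that curl sent
body = urlencode(form)
print(body)
```

Since Python dicts keep insertion order, the fields come out in the same order as in the curl command.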

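But to turn this into something lots of people could use, the device-token header, which looks fairly random, seemed like a problem. On closer inspection it wasn't that bad: underscores split it into three parts. The first looked like a timestamp, and a quick check confirmed it (a Unix timestamp in milliseconds). The second looked like a random string of digits, and generating one blindly might get us blocked, so how is it actually made? I browsed the page's JS and looked at the protocol, and to my surprise the generation code was right there. It turned out to be simple: all you need is a 16-digit random number string. The third part is just the platform, h5 for an ordinary browser.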
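To check the three-part structure, here is a small sketch that splits a token on underscores and decodes the first part as a millisecond timestamp. The helper names parse_device_token and ms_to_utc are my own, not anything from the site:

```python
from datetime import datetime, timedelta, timezone


def parse_device_token(token):
    """Split a device-token into (timestamp_ms, random_part, platform)."""
    ts, rnd, platform = token.split("_")
    return int(ts), rnd, platform


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    # timedelta arithmetic keeps the milliseconds exact (no float rounding)
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)


# The token captured in the curl command above
ts, rnd, platform = parse_device_token("1627981566669_2684478571015833_h5")
print(ms_to_utc(ts), len(rnd), platform)
```

For the captured token this gives 2021-08-03 09:06:06.669 UTC, a 16-character random part and the platform h5, which matches the reading above.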

function create_deviceToken() {
    var time = new Date().getTime();
    var random = ('0000000000000000' + Math.floor(Math.random() * 9999999999999999)).slice(-16);
    var platform = 'h5';
    if (tools.is_wxBrowser()) {
        platform = 'weixin';
    } else if (tools.is_QQBrowser()) {
        platform = 'qq';
    } else if (tools.is_wbBrowser()) {
        platform = 'weibo';
    }
    var deviceToken = time + '_' + random + '_' + platform;
    localStorage.setItem('deviceToken', deviceToken);
    return deviceToken;
}
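For the same logic in Python, here is a minimal sketch that mirrors create_deviceToken() above for the plain-browser case only; the WeChat/QQ/Weibo detection and the localStorage caching are left out. The name create_device_token is my own, and random.randrange(10**16) is a close stand-in for the JS Math.floor(Math.random() * 9999999999999999), not a bit-exact port:

```python
import random
import time


def create_device_token(platform="h5"):
    """Sketch of the page's create_deviceToken() for an ordinary browser."""
    # Part 1: current time in milliseconds, like JS new Date().getTime()
    ms = time.time_ns() // 1_000_000
    # Part 2: random integer zero-padded to 16 digits, like the JS slice(-16)
    rnd = str(random.randrange(10**16)).zfill(16)
    # Part 3: the platform string, joined with underscores
    return f"{ms}_{rnd}_{platform}"


token = create_device_token()
print(token)
```

zfill(16) plays the same role as the JS trick of prepending sixteen zeros and keeping the last 16 characters.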

The rest was just a matter of writing code. Life is short, I use Python; it took me under 10 minutes:

import logging
import random
import time
from datetime import datetime

import requests  # third-party: pip install requests

logger = logging.getLogger(__name__)

# Reuse one HTTP session (connection pooling) for all requests
session = requests.Session()


def vote(uuid, oid, times=10, delay=None):
    """Vote `times` times for option `oid` in poll `uuid`, as one new device."""
    if delay is None:
        delay = (0, 5)  # seconds to wait between votes: (min, max)
    # Part 1 of the device-token: a millisecond timestamp, jittered by up to 1 s
    ts = int(datetime.now().timestamp() * 1000) + random.randint(0, 1000)
    # Part 2: 16 random digits (randint is inclusive, so 0-9, not 0-10)
    rnds = "".join(str(random.randint(0, 9)) for _ in range(16))
    # Part 3: the platform; "h5" is what an ordinary browser sends
    platform = "h5"
    token = f"{ts}_{rnds}_{platform}"
    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "platform": platform,
        "device-token": token,
    }
    for i in range(times):
        logger.info(f"{token} casting vote #{i + 1}")
        response = session.post(
            "https://my-api.app.xinhuanet.com//vote/vote",
            headers=headers,
            data={
                "voteUuid": uuid,
                "optionId": oid,
            },
            timeout=10,
        )
        res = response.json()
        # result == 5 is taken to mean this device has used up its votes
        if res["data"]["result"] == 5:
            logger.info("full")
        else:
            logger.info("ok")
        # Pause for a random number of seconds within `delay` between votes
        time.sleep(random.uniform(*delay))


if __name__ == "__main__":
    logging.basicConfig(
        filename="vote.log",
        level=logging.INFO,
        format="[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s",
        datefmt="%H:%M:%S",
    )
    # Act as 10 different people, each with a fresh device-token
    for i in range(10):
        logger.info(f"as person #{i + 1}")
        vote("8d3a88ba9120401d9bcec085c30fd3f9", "2888", times=10, delay=(0, 5))

If you want to try the script yourself, don't forget to install the famous requests package first (pip install requests).

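Of course, this time I got lucky with an easy target. Usually the communication protocol is encrypted and the requests are signed, which makes things a lot harder.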
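For contrast, here is roughly what a "signed" request looks like in general: the client hashes the sorted parameters with a secret key (HMAC-SHA256 in this sketch) and sends the digest along, and the server recomputes it, so any request whose fields were tampered with fails the check. This site does nothing of the sort; the secret, the ts and sign field names, and the canonical-string format below are all hypothetical:

```python
import hashlib
import hmac
from urllib.parse import urlencode


def sign_params(params, secret):
    """Return a copy of params with a hypothetical HMAC-SHA256 'sign' field."""
    # Canonical form: keys sorted, then form-encoded, so both sides hash the same bytes
    canonical = urlencode(sorted(params.items()))
    digest = hmac.new(secret.encode(), canonical.encode(), hashlib.sha256).hexdigest()
    return {**params, "sign": digest}


def verify(signed, secret):
    """Server side: recompute the signature and compare in constant time."""
    params = {k: v for k, v in signed.items() if k != "sign"}
    expected = sign_params(params, secret)["sign"]
    return hmac.compare_digest(expected, signed.get("sign", ""))


signed = sign_params(
    {"voteUuid": "8d3a88ba9120401d9bcec085c30fd3f9", "optionId": "2", "ts": "1627981566669"},
    "hypothetical-secret",
)
print(verify(signed, "hypothetical-secret"))
```

Without the secret you cannot produce a valid sign for new parameters, which is why such protocols take much more effort.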

Alright, that's all for today. If you're interested in Python and web scraping, don't forget to follow me.

More from 叨叨刀道