Alexeisie

Huh? Email: alexeisie@brs.red
Viewing Telethon from Scratch: Building a Telegram Album/File Crawler (Telegram Album and Replies Crawler on Telethon)

1. What is Telethon

Telethon is a Python 3 MTProto library for interacting with the Telegram API. It can be used either as a user or through a bot account (as an alternative to the Bot API).

2. Disadvantages of Telethon

Downloads are painfully slow, because the official API is rate-limited.

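3. Telethon Official Documentation

Note: Some of what this article covers can also be found in the official developer documentation. If you have questions, check that documentation first.
https://docs.telethon.dev/en/stable/
pip package name: telethon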

4. Learning Telethon Client

Boilerplate code block

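from telethon.sync import TelegramClient  # sync wrapper, so no await is needed

# Placeholder credentials: obtain your own api_id and api_hash from my.telegram.org
api_id = 123456789
api_hash = 'a1b2c3d4'
phone = '+11145141919'

# Telegram client connection; the first argument is the session name
client = TelegramClient(phone, api_id, api_hash)
client.connect()
# On first login, request a code and sign in with it
if not client.is_user_authorized():
    client.send_code_request(phone)
    client.sign_in(phone, input('Enter the code: '))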
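
Telethon is built on asyncio, so the same login flow can also be written with async/await. The following is only a minimal sketch, not the author's code: `whoami` and its `client_cls` parameter are hypothetical names introduced here (the latter exists only so the function can be tried with a stand-in class), and it assumes that `client.start(phone=...)` is acceptable in place of the manual `send_code_request` / `sign_in` steps.

```python
import asyncio


async def whoami(session, api_id, api_hash, phone, client_cls=None):
    """Log in and return the username of the logged-in account."""
    if client_cls is None:
        # Imported lazily so the function can be tried with a stand-in class.
        from telethon import TelegramClient as client_cls

    client = client_cls(session, api_id, api_hash)
    # start() connects and, on first use, asks interactively for the login code.
    await client.start(phone=phone)
    try:
        me = await client.get_me()
        return me.username
    finally:
        # Always close the connection, even if get_me() fails.
        await client.disconnect()


# Usage (placeholder credentials, replace with your own):
# print(asyncio.run(whoami('my_session', 123456789, 'a1b2c3d4', '+11145141919')))
```

The try/finally keeps the session from being left open if a request raises, which matters for long-running crawlers.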

5. Learning Telethon Channel

In Telethon, each Channel is a chat entity. Channels come in two main types, broadcast and megagroup, which correspond to "Channels" and "Discussion Groups" (supergroups) respectively.
We can use client(GetDialogsRequest(offset_date=last_date,offset_id=0,offset_peer=InputPeerEmpty(),limit=chunk_size,hash=0)).chats to get a collection of chat entities (one request returns at most chunk_size dialogs).
Each chat object has several important members:

  • id is the unique identifier of the chat
  • title is the name of the Channel
  • When broadcast is True, the chat is a "Channel"
  • When megagroup is True, the chat is a "Discussion Group"

The following example code filters out all "Channels" and lets the user select the target chat:
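from telethon.tl.functions.messages import GetDialogsRequest
from telethon.tl.types import InputPeerEmpty

chats = []
last_date = None
chunk_size = 200
channels = []

# Fetch the first chunk of dialogs (up to chunk_size)
result = client(GetDialogsRequest(
    offset_date=last_date,
    offset_id=0,
    offset_peer=InputPeerEmpty(),
    limit=chunk_size,
    hash=0
))
chats.extend(result.chats)

# Keep only broadcast channels; basic group chats have no broadcast attribute
for chat in chats:
    if getattr(chat, 'broadcast', False):
        channels.append(chat)

# Let the user pick the target channel by index
print('Choose a channel to crawl:')
for i, c in enumerate(channels):
    print(str(i) + '- ' + c.title)
c_index = input("Enter a Number: ")
target_channel = channels[int(c_index)]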
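
Telethon also offers a higher-level `client.iter_dialogs()` helper that pages through the dialogs for you. As a hedged alternative to the raw request (a sketch, not the author's method), the hypothetical function `list_broadcast_channels` below collects the entities whose `broadcast` flag is set. It assumes `client` is an already connected `TelegramClient` used in async mode.

```python
async def list_broadcast_channels(client):
    """Return the entities of all dialogs that are broadcast channels."""
    channels = []
    async for dialog in client.iter_dialogs():
        entity = dialog.entity
        # Users and basic group chats have no `broadcast` attribute,
        # so fall back to False instead of raising AttributeError.
        if getattr(entity, 'broadcast', False):
            channels.append(entity)
    return channels


# Usage inside an async function with a connected client:
# for index, channel in enumerate(await list_broadcast_channels(client)):
#     print(f'{index} - {channel.title}')
```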

After obtaining the chat object, we can use client.iter_messages(target_channel, ***) to iterate over the messages in the chat. Here are the parameters that are useful for this article:

  • entity: the chat entity to read messages from
  • limit (int | None, optional): the maximum number of historical messages to retrieve. When the value is None, the entire history is retrieved
  • offset_id (int): the message ID to start from. Only messages with IDs lower than this ID are retrieved (the larger the message ID, the newer the message)
  • max_id (int): messages with an ID greater than or equal to this value are excluded
  • min_id (int): messages with an ID less than or equal to this value are excluded
  • search (str): the search string
  • filter (MessagesFilter | type): the message type filter; the available values are listed at https://tl.telethon.dev/types/messages_filter.html
  • reverse (bool, optional): by default, messages are retrieved from large to small IDs (newest to oldest). When the value is True, they are retrieved from small to large IDs (oldest to newest)
  • reply_to (int, optional): when set, the replies (comments) to the message with this ID are retrieved. The filter and search parameters have no effect while this is set.
    Note: This only works for Channels and their Discussion Groups. It does not work for ordinary chats or private conversations.
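
The parameters above can be combined in small helpers. The sketch below is illustrative only: `fetch_new_messages` and `fetch_comments` are hypothetical names, and `client` is assumed to be a connected `TelegramClient` used in async mode. The first helper uses `min_id` with `reverse=True` to read only messages newer than a known ID, oldest first; the second uses `reply_to` to read the comments under one channel post.

```python
async def fetch_new_messages(client, entity, last_seen_id=0, limit=None):
    """Return messages with an ID above last_seen_id, oldest first."""
    messages = []
    async for message in client.iter_messages(
        entity, min_id=last_seen_id, reverse=True, limit=limit
    ):
        messages.append(message)
    return messages


async def fetch_comments(client, channel, post_id, limit=None):
    """Return the replies (comments) to a single channel post."""
    comments = []
    async for message in client.iter_messages(
        channel, reply_to=post_id, limit=limit
    ):
        comments.append(message)
    return comments


# Usage inside an async function with a connected client:
# new_posts = await fetch_new_messages(client, target_channel, last_seen_id=100)
# comments = await fetch_comments(client, target_channel, post_id=new_posts[0].id)
```

Storing the highest message ID seen so far and passing it as `last_seen_id` on the next run is one way to make a crawler incremental, so that it does not re-read the whole history through the rate-limited API.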

Here, I will also introduce an older way of retrieving messages, which is convenient when fetching messages by ID: client.get_messages(entity, ***, ids). When ids is set to a single message ID, it directly returns the corresponding Message object instead of a collection.
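
As a rough illustration of fetching by ID, the hypothetical helpers below wrap `client.get_messages` with the `ids` argument. They assume a connected `TelegramClient` in async mode and rely on the documented behavior that a single integer yields one message (or None when it does not exist), while a list yields a list with None in place of missing messages.

```python
async def fetch_by_id(client, entity, message_id):
    """Fetch one message by ID; returns None if it does not exist."""
    # A single int for `ids` yields a single Message (or None), not a list.
    return await client.get_messages(entity, ids=message_id)


async def fetch_many_by_id(client, entity, message_ids):
    """Fetch several messages by ID, dropping the ones that were not found."""
    # A list for `ids` yields a list with None in place of missing messages.
    found = await client.get_messages(entity, ids=list(message_ids))
    return [message for message in found if message is not None]


# Usage inside an async function with a connected client:
# post = await fetch_by_id(client, target_channel, 42)
# posts = await fetch_many_by_id(client, target_channel, [40, 41, 42])
```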

6. Learning Message in Telethon

The Message type is very important, mostly because we are working with a library for a messaging platform, so messages are used everywhere: in events, when fetching history, in replies, and so on.
It inherits from ChatGetter and SenderGetter.

In Telethon, each message is a Message object.
There are many members in Message, and here I will pick a few member variables and functions that are helpful for this article.
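
The article does not yet list which members it has in mind, so the following is only a guess at what is useful for an album crawler. It assumes the `id` and `grouped_id` attributes of `Message` are the relevant ones: messages that belong to the same album share one `grouped_id`, while standalone messages have `grouped_id` set to None. The helper name `group_albums` is hypothetical.

```python
from collections import defaultdict


def group_albums(messages):
    """Group messages that share a grouped_id into albums, keyed by that ID."""
    albums = defaultdict(list)
    for message in messages:
        grouped_id = getattr(message, 'grouped_id', None)
        # Messages without a grouped_id are not part of any album.
        if grouped_id is not None:
            albums[grouped_id].append(message)
    # Sort each album by message ID so items appear in posting order.
    return {
        gid: sorted(items, key=lambda m: m.id)
        for gid, items in albums.items()
    }


# Usage with messages collected from client.iter_messages(...):
# for gid, items in group_albums(messages).items():
#     print(gid, [m.id for m in items])
```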

Hey, I haven't finished writing yet.
