Medium has a large amount of content, a large number of users, and an almost overwhelming number of posts. When you try to find interesting users to interact with, you’re flooded with visual noise.
I define an interesting user as someone who is from your network, who is active, and who writes responses that are generally appreciated by the Medium community.
I was looking through the latest posts from users I follow to see who had responded to those users. I figured that if they responded to someone I’m following, they must have similar interests to mine.
The process was tedious. And that’s when I remembered the most valuable lesson I learned during my last internship:
Any tedious task can and should be automated.
I wanted my automation to do the following things:

- Go through the latest posts of the users I follow
- Collect the responses to those posts
- Keep only the responses that are recent and well-recommended by the community
- Give me back the authors of those responses
I initially looked at Medium's API, but found it limiting. It didn't give me much to work with: I could only get information about my own account, not about other users.
On top of that, the last change to Medium’s API was over a year ago. There was no sign of recent development.
I realized that I would have to rely on HTTP requests to get my data, so I started to poke around using my Chrome DevTools.
The first goal was to get my list of Followings.
I opened up my DevTools and went to the Network tab. I filtered out everything but XHR to see where Medium gets my list of Followings from. I hit the reload button on my profile page and got nothing interesting.
What if I clicked the Followings button on my profile? Bingo.
Inside that request's response, I found a very large JSON payload. It was well-formatted JSON, except for a string of characters at the beginning: ])}while(1);</x>
I wrote a function to clean that up and turn the JSON into a Python dictionary.
import json

def clean_json_response(response):
    return json.loads(response.text.split('])}while(1);</x>')[1])
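To see what it does, here's a quick sanity check on a canned response. FakeResponse is a hypothetical stand-in for a requests response object, which exposes the same text attribute:

class FakeResponse:
    # Hypothetical stand-in for requests.Response, for illustration only
    text = '])}while(1);</x>{"success": true}'

print(clean_json_response(FakeResponse()))  # {'success': True}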
I had found an entry point. Let the coding begin.
To query that endpoint, I needed my user ID (I already knew mine, but this is for educational purposes).
While looking for a way to get a user's ID, I found out that you can add ?format=json to most Medium URLs to get a JSON response from that page. I tried that out on my profile page.
Oh look, there’s the user ID.
])}while(1);</x>{"success":true,"payload":{"user":{"userId":"d540942266d0","name":"Radu Raicea","username":"Radu_Raicea",
...
I wrote a function to pull the user ID from a given username. Again, I had to use clean_json_response to remove the unwanted characters at the beginning of the response. I also made a constant called MEDIUM that contains the base for all the Medium URLs.
import requests

MEDIUM = 'https://medium.com'

def get_user_id(username):
    print('Retrieving user ID...')

    url = MEDIUM + '/@' + username + '?format=json'
    response = requests.get(url)
    response_dict = clean_json_response(response)

    return response_dict['payload']['user']['userId']
With the user ID, I queried the /_/api/users/<user_id>/following endpoint and got the list of usernames from my Followings list.
When I did it in DevTools, I noticed that the JSON response only had eight usernames. Weird.
After I clicked on “Show more people,” I saw what was missing. Medium uses pagination for the list of Followings.
Pagination works by specifying a limit (elements per page) and to (first element of the next page). I had to find a way to get the ID of that next element.
At the end of the JSON response from /_/api/users/<user_id>/following, I saw an interesting key.
...
"paging":{"path":"/_/api/users/d540942266d0/followers","next":{"limit":8,"to":"49260b62a26c"}}},"v":3,"b":"31039-15ed0e5"}
From here, writing a loop to get all the usernames from my Followings list was easy.
def get_list_of_followings(user_id):
    print('Retrieving users from Followings...')

    next_id = False
    followings = []

    while True:
        if next_id:
            # If this is not the first page of the followings list
            url = MEDIUM + '/_/api/users/' + user_id + '/following?limit=8&to=' + next_id
        else:
            # If this is the first page of the followings list
            url = MEDIUM + '/_/api/users/' + user_id + '/following'

        response = requests.get(url)
        response_dict = clean_json_response(response)
        payload = response_dict['payload']

        for user in payload['value']:
            followings.append(user['username'])

        try:
            # If the "to" key is missing, we've reached the end
            # of the list and a KeyError is raised
            next_id = payload['paging']['next']['to']
        except KeyError:
            break

    return followings
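As a quick check of the pagination handling, the two functions so far can be chained (using my own username from earlier):

user_id = get_user_id('Radu_Raicea')
followings = get_list_of_followings(user_id)
print(len(followings))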
Once I had the list of users I follow, I wanted to get their latest posts. I could do that with a request to https://medium.com/@<username>/latest?format=json.
I wrote a function that takes a list of usernames and returns a list of post IDs for the latest posts from all the usernames on the input list.
def get_list_of_latest_posts_ids(usernames):
    print('Retrieving the latest posts...')

    post_ids = []

    for username in usernames:
        url = MEDIUM + '/@' + username + '/latest?format=json'
        response = requests.get(url)
        response_dict = clean_json_response(response)

        try:
            posts = response_dict['payload']['references']['Post']
        except KeyError:
            # The user hasn't published any posts
            posts = []

        if posts:
            for key in posts.keys():
                post_ids.append(posts[key]['id'])

    return post_ids
With the list of posts, I extracted all the responses using https://medium.com/_/api/posts/<post_id>/responses.
This function takes a list of post IDs and returns a list of responses.
def get_post_responses(posts):
    print('Retrieving the post responses...')

    responses = []

    for post in posts:
        url = MEDIUM + '/_/api/posts/' + post + '/responses'
        response = requests.get(url)
        response_dict = clean_json_response(response)
        responses += response_dict['payload']['value']

    return responses
At first, I wanted responses that had gotten a minimum number of claps. But I realized that this might not be a good representation of the community’s appreciation of the response: a user can give more than one clap for the same article.
Instead, I filtered by the number of recommendations. It measures the same thing as claps, but it doesn’t take duplicates into account.
I wanted the minimum to be dynamic, so I passed a variable named recommend_min around.

The following function takes a response and the recommend_min variable, and checks whether the response meets that minimum.
def check_if_high_recommends(response, recommend_min):
    # A response counts only if it has at least recommend_min recommends
    return response['virtuals']['recommends'] >= recommend_min
I also wanted recent responses. I filtered out responses that were older than 30 days using this function.
from datetime import datetime, timedelta

def check_if_recent(response):
    # Medium's createdAt is an epoch timestamp in milliseconds
    limit_date = datetime.now() - timedelta(days=30)
    creation_epoch_time = response['createdAt'] / 1000
    creation_date = datetime.fromtimestamp(creation_epoch_time)
    return creation_date >= limit_date
Once I had all the filtered responses, I grabbed all the authors’ user IDs using the following function.
def get_user_ids_from_responses(responses, recommend_min):
    print('Retrieving user IDs from the responses...')

    user_ids = []

    for response in responses:
        recent = check_if_recent(response)
        high = check_if_high_recommends(response, recommend_min)

        if recent and high:
            user_ids.append(response['creatorId'])

    return user_ids
After I finished all the functions, I created a pipeline to get my list of recommended users.
def get_interesting_users(username, recommend_min):
    print('Looking for interesting users for %s...' % username)

    user_id = get_user_id(username)
    usernames = get_list_of_followings(user_id)
    posts = get_list_of_latest_posts_ids(usernames)
    responses = get_post_responses(posts)
    users = get_user_ids_from_responses(responses, recommend_min)

    return get_usernames(users)
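The pipeline ends with a call to get_usernames, which maps the collected user IDs back to usernames, but that helper isn't defined in this section. Here is a minimal sketch of what it could look like, reusing clean_json_response. Note that both the /_/api/users/<user_id> endpoint and the payload['value']['username'] path are assumptions I haven't verified, so check them in DevTools before relying on them.

def get_usernames(user_ids):
    print('Retrieving usernames of interesting users...')

    usernames = []

    for user_id in user_ids:
        # Assumed endpoint and payload path; verify both in DevTools,
        # the same way the other endpoints were discovered
        url = MEDIUM + '/_/api/users/' + user_id
        response = requests.get(url)
        response_dict = clean_json_response(response)
        usernames.append(response_dict['payload']['value']['username'])

    return usernames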
The script was finally ready! To run it, you have to call the pipeline.
interesting_users = get_interesting_users('Radu_Raicea', 10)
print(interesting_users)
Finally, I added an option to append the results to a CSV with a timestamp.
import csv

def list_to_csv(interesting_users_list):
    with open('recommended_users.csv', 'a') as file:
        writer = csv.writer(file)

        now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        interesting_users_list.insert(0, now)

        writer.writerow(interesting_users_list)

interesting_users = get_interesting_users('Radu_Raicea', 10)
list_to_csv(interesting_users)
The project’s source code is on GitHub.
If you don’t know Python, go read TK’s Learning Python: From Zero to Hero.
If you have suggestions on other criteria that make users interesting, please write them below!