Supercharge Your Articles: Crafting Powerful Titles for Data Science Articles on Medium
Use a DistilBERT transformer pre-trained on 140,000 recent article titles

As a follow-up to my previous article,
How to use Machine Learning to write engaging titles for Medium articles? (substack.com)
a natural question arises: how can you use this model efficiently to refine your article title and make it more engaging?
For this exercise, I have prepared a short example code publicly available on Kaggle.
I start by importing the relevant libraries and adjusting the pandas display options:
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# widen the pandas output so that full titles remain visible
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 100)
Then, I download the latest version of the pre-trained DistilBERT model and its associated tokenizer, which predict the engagement score of a Data Science article on Medium from its title:
tokenizer = AutoTokenizer.from_pretrained("dima806/medium-article-titles-engagement")
model = AutoModelForSequenceClassification.from_pretrained("dima806/medium-article-titles-engagement")
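Before using the model, it is worth checking its label mapping, since the inference function below reads the last column of the class probabilities as the "engaged" score. A minimal check, assuming the standard Hugging Face config attributes:
# inspect which class index corresponds to which engagement label
print(model.config.id2label)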
After that, I take 10 sample article titles:
# take 10 sample generated titles
article_titles = [
    "Exploring Machine Learning Algorithms: A Comprehensive Guide",
    "Understanding Deep Neural Networks: From Theory to Practice",
    "Data Visualization Techniques for Effective Data Exploration",
    "Introduction to Natural Language Processing with Python",
    "Optimization Methods in Machine Learning",
    "Introduction to Clustering Algorithms",
    "Understanding Cross-Validation Techniques",
    "Time Series Forecasting with Recurrent Neural Networks",
    "Machine Learning for Healthcare: Challenges and Opportunities",
    "Exploring Reinforcement Learning Algorithms",
]
Next, I write an inference function that takes a list of titles and returns their predicted engagement probability scores:
def get_sorted_results(article_titles):
    # Tokenize the input texts
    encoded_inputs = tokenizer(article_titles, padding=True, truncation=True, return_tensors="pt")
    # Perform inference on the tokenized inputs
    with torch.no_grad():
        logits = model(**encoded_inputs).logits
    # Get the predicted class probabilities; keep the batch
    # dimension so the function also works for a single title
    probs = torch.softmax(logits, dim=1)
    # The last class corresponds to the "engaged" label
    last_class_probs = probs[:, -1].tolist()
    # Create a DataFrame to store the results
    results_df = pd.DataFrame({
        "Title": article_titles,
        "Engaged probability": last_class_probs,
    })
    # Sort the resulting DataFrame based on predicted probability
    return results_df.sort_values("Engaged probability", ascending=False)
and output the sorted results:
get_sorted_results(article_titles)
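For quick inspection, the top-ranked candidate can also be pulled out of the returned DataFrame directly; a minimal sketch using the function defined above:
# take the first row of the sorted DataFrame, i.e. the highest-scoring title
best = get_sorted_results(article_titles).iloc[0]
print(best["Title"], best["Engaged probability"])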
It appears that the most engaging title among the selection is
Time Series Forecasting with Recurrent Neural Networks
with a predicted engagement score of about 99.2%.
After repeating this procedure with a fresh batch of candidates, the best title of the next iteration,
Uncover Hidden Trends: Time Series Forecasting using Recurrent Neural Networks
gets a predicted engagement score of about 99.8%.
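The refinement loop itself is easy to script: score a batch of rewrites of the current best title and keep the winner for the next round. A minimal sketch, where the third variant is a hypothetical rewrite of my own rather than anything produced by the model:
# score hand-written rewrites of the current best title
candidate_rewrites = [
    "Time Series Forecasting with Recurrent Neural Networks",
    "Uncover Hidden Trends: Time Series Forecasting using Recurrent Neural Networks",
    "A Practical Guide to Time Series Forecasting with RNNs",  # hypothetical variant
]
# keep the highest-scoring rewrite for the next iteration
best_rewrite = get_sorted_results(candidate_rewrites).iloc[0]
print(best_rewrite["Title"], best_rewrite["Engaged probability"])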
I hope these results are useful for you. If you have any questions or comments, do not hesitate to write them below or reach me directly on LinkedIn or Twitter.