Supercharge Your Articles: Crafting Powerful Titles for Data Science Articles on Medium
Use a DistilBERT transformer pre-trained on 140,000 recent article titles

As a follow-up to my previous article,
How to use Machine Learning to write engaging titles for Medium articles? (substack.com)
a natural question arises: how can you use this model efficiently to refine your article title and make it more engaging?
For this exercise, I have prepared a short example code publicly available on Kaggle.
I start by importing the relevant libraries and adjusting the pandas display options:
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# widen the pandas output so that full titles remain visible
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 100)
Then, I download the latest version of the pre-trained DistilBERT model and its associated tokenizer, which predict the engagement score of a Data Science article on Medium from its title:
tokenizer = AutoTokenizer.from_pretrained("dima806/medium-article-titles-engagement")
model = AutoModelForSequenceClassification.from_pretrained("dima806/medium-article-titles-engagement")
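Before using the model, it is worth checking its label mapping, since the inference function below reads the last column of the class probabilities as the "engaged" score. A minimal check, assuming the standard Hugging Face config attributes:
# inspect which class index corresponds to which engagement label
print(model.config.id2label)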
After that, I take 10 sample article titles:
# take 10 sample generated titles
article_titles = [
    "Exploring Machine Learning Algorithms: A Comprehensive Guide",
    "Understanding Deep Neural Networks: From Theory to Practice",
    "Data Visualization Techniques for Effective Data Exploration",
    "Introduction to Natural Language Processing with Python",
    "Optimization Methods in Machine Learning",
    "Introduction to Clustering Algorithms",
    "Understanding Cross-Validation Techniques",
    "Time Series Forecasting with Recurrent Neural Networks",
    "Machine Learning for Healthcare: Challenges and Opportunities",
    "Exploring Reinforcement Learning Algorithms",
]
Next, I write an inference function that takes a list of titles and returns their predicted engagement probability scores:
def get_sorted_results(article_titles):
    # Tokenize the input texts
    encoded_inputs = tokenizer(article_titles, padding=True, truncation=True, return_tensors="pt")
    # Perform inference on the tokenized inputs
    with torch.no_grad():
        logits = model(**encoded_inputs).logits
    # Get the predicted class probabilities; keep the batch
    # dimension so the function also works for a single title
    probs = torch.softmax(logits, dim=1)
    # The last class corresponds to the "engaged" label
    last_class_probs = probs[:, -1].tolist()
    # Create a DataFrame to store the results
    results_df = pd.DataFrame({
        "Title": article_titles,
        "Engaged probability": last_class_probs,
    })
    # Sort the resulting DataFrame based on predicted probability
    return results_df.sort_values("Engaged probability", ascending=False)
and output the sorted results:
get_sorted_results(article_titles)
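For quick inspection, the top-ranked candidate can also be pulled out of the returned DataFrame directly; a minimal sketch using the function defined above:
# take the first row of the sorted DataFrame, i.e. the highest-scoring title
best = get_sorted_results(article_titles).iloc[0]
print(best["Title"], best["Engaged probability"])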
It appears that the most engaging title among the selection is
Time Series Forecasting with Recurrent Neural Networks
with a predicted engagement score of about 99.2%.
After repeating this procedure with a fresh batch of candidates, the best title of the next iteration,
Uncover Hidden Trends: Time Series Forecasting using Recurrent Neural Networks
gets a predicted engagement score of about 99.8%.
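The refinement loop itself is easy to script: score a batch of rewrites of the current best title and keep the winner for the next round. A minimal sketch, where the third variant is a hypothetical rewrite of my own rather than anything produced by the model:
# score hand-written rewrites of the current best title
candidate_rewrites = [
    "Time Series Forecasting with Recurrent Neural Networks",
    "Uncover Hidden Trends: Time Series Forecasting using Recurrent Neural Networks",
    "A Practical Guide to Time Series Forecasting with RNNs",  # hypothetical variant
]
# keep the highest-scoring rewrite for the next iteration
best_rewrite = get_sorted_results(candidate_rewrites).iloc[0]
print(best_rewrite["Title"], best_rewrite["Engaged probability"])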
I hope these results are useful for you. If you have any questions or comments, do not hesitate to write them below or reach me directly on LinkedIn or Twitter.