When OpenAI unveiled the third generation of its machine learning (ML) model for text generation in July 2020, it was clear something significant had changed. This model resonated with a wider audience than any prior version. I began to hear friends and colleagues, who typically show little interest in technological advancements, discussing it. Even mainstream outlets took notice, such as a notable article in the Guardian that was written by the model itself, then edited and published by the paper's editors. There was no denying it—GPT-3 had transformed the landscape.
Following its release, various potential applications emerged rapidly. Within weeks, numerous impressive demonstrations surfaced, accessible on the GPT-3 website. One application that particularly intrigued me was text summarization—the ability for a computer to read and condense a given text’s content. This task is among the most challenging for computers, as it melds reading comprehension with text generation. This is why I found the GPT-3 text summarization demos so remarkable.
You can explore these demos on the Hugging Face Spaces website. Currently, my favorite allows users to generate summaries of news articles simply by entering the URL of the article.
In this two-part series, I aim to provide a practical guide for organizations to evaluate the effectiveness of text summarization models relevant to their fields.
Overview of the Tutorial
Many organizations I collaborate with—charities, businesses, NGOs—possess vast amounts of text that require summarization, including financial reports, news articles, scientific papers, patent applications, legal contracts, and more. These entities are keen on automating these processes through NLP technology. To showcase what’s possible, I often demonstrate text summarization, which rarely fails to impress.
But what’s next?
The challenge organizations face is evaluating text summarization models across numerous documents—not just one at a time. They don’t want to hire interns to manually paste thousands of documents into an application one by one, click the Summarize button, wait for the results, and assess the quality of each summary by hand.
I crafted this tutorial with my previous self in mind—it’s the guide I wished I had when I embarked on this journey. Thus, the intended audience is individuals familiar with AI/ML who have previously used Transformer models but are just beginning their exploration of text summarization. Written by a “beginner” for beginners, this tutorial is designed as a practical guide—not the definitive one. Keep in mind George E.P. Box’s famous quote: “All models are wrong, but some are useful.”
In terms of technical knowledge, this tutorial involves some Python coding, primarily to call APIs; thus, deep programming expertise isn’t essential. Familiarity with ML concepts—such as training and deploying models, along with training, validation, and test datasets—will be beneficial. Prior experience with the Transformers library will also aid understanding, as we will utilize it extensively throughout this guide. For further reading on these concepts, I’ve included useful links.
Since this tutorial is geared towards novices, I don’t anticipate NLP experts or advanced deep learning professionals will find much of it technically challenging. However, you might still find it enjoyable, so please bear with my simplifications—I strive to keep everything as straightforward as possible without oversimplifying.
Structure of the Tutorial
This series is divided into four sections over two posts, each addressing different phases of a text summarization project. In the first post (section 1), we introduce a metric for evaluating summarization tasks—a performance measure that helps determine the quality of a summary. We’ll also present the dataset we intend to summarize and establish a baseline using a non-ML model—employing a simple heuristic to generate a summary from given text. Establishing this baseline is crucial in any ML project, as it quantifies the progress made by leveraging AI technology. It poses the question: “Is it truly worthwhile to invest in AI?”
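The metric isn’t named at this point in the series, but ROUGE is the de facto standard for evaluating summaries, so as an illustrative assumption, here is a minimal sketch of scoring a candidate summary against a reference using the rouge-score package:

```python
# A minimal sketch of scoring a candidate summary against a reference,
# assuming the ROUGE metric (the de facto standard for summarization).
# Requires: pip install rouge-score
from rouge_score import rouge_scorer

reference = "The cat sat on the mat and watched the birds outside."
candidate = "A cat sat on a mat watching birds."

# ROUGE-1 measures unigram overlap; ROUGE-L measures the longest
# common subsequence between reference and candidate.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
```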
In the second post, we will utilize a pre-trained model to generate summaries (section 2). This approach, known as transfer learning, allows us to employ an off-the-shelf model and test it against our dataset, creating another baseline to compare against when we train a model on our data. This method is referred to as zero-shot summarization, as the model has not previously encountered our dataset.
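To give a taste of how little code zero-shot summarization takes, here is a minimal sketch using the Transformers pipeline API; the checkpoint is a popular public summarization model picked purely for illustration, not necessarily the one used later in the series:

```python
# A minimal zero-shot summarization sketch with Hugging Face Transformers.
# The checkpoint is an illustrative choice; any summarization model works.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = (
    "Hugging Face Transformers provides thousands of pretrained models for "
    "tasks such as text classification, translation, and summarization. "
    "The pipeline API wraps tokenization, inference, and decoding in a "
    "single call, which makes quick experiments straightforward."
)
result = summarizer(text, max_length=60, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```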
Next, we will fine-tune a pre-trained model on our own dataset (section 3), enabling it to learn from the specific patterns and nuances of our data. Once we have trained the model, we will evaluate its performance in generating summaries (section 4).
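For orientation, the sketch below condenses what fine-tuning can look like with the Transformers Seq2SeqTrainer; the checkpoint, dataset, and hyperparameters here are placeholders chosen for illustration, not the ones the series settles on:

```python
# A condensed sketch of fine-tuning a seq2seq model for summarization with
# the Transformers Trainer API. The checkpoint ("t5-small"), the dataset
# ("cnn_dailymail"), and all hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

dataset = load_dataset("cnn_dailymail", "3.0.0")

def preprocess(batch):
    # T5-style models expect a task prefix; the articles are the inputs,
    # the tokenized reference summaries become the labels.
    inputs = ["summarize: " + doc for doc in batch["article"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="summarizer",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```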
To summarize:
- Part 1:
  - Section 1: Establish a baseline using a no-ML model
- Part 2:
  - Section 2: Generate summaries with a zero-shot model
  - Section 3: Train a summarization model
  - Section 4: Evaluate the trained model
The complete code for this tutorial can be found in the accompanying GitHub repository.
What Will We Achieve by the End of This Tutorial?
By the end of this tutorial, we won’t have a production-ready text summarization model—nor will we have a particularly effective one! Instead, we will have laid the groundwork for the next phase of the project: experimentation. This stage embodies the “science” in data science, focusing on testing various models and configurations to ascertain whether a sufficiently effective summarization model can be developed with the available training data.
To be candid, there’s a significant chance the conclusion may reveal that the technology is not yet mature enough for implementation, a reality that business stakeholders must be prepared for. However, that topic is reserved for a future discussion.
Section 1: Use a no-ML Model to Establish a Baseline
In this first section of our tutorial, we will set a baseline using a straightforward model without employing ML techniques. This step is vital in any ML project, as it helps us understand how much value ML adds throughout the project and whether it’s a wise investment.
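The specific heuristic isn’t spelled out at this point; a common choice for news-style text is the “lead” baseline, which simply takes the first few sentences of a document as its summary. A minimal sketch under that assumption:

```python
# A minimal sketch of a no-ML "lead" baseline: use the first n sentences
# of the text as its summary. The choice of heuristic and n=3 are
# assumptions for illustration; the tutorial may use a different rule.
import re

def lead_n_summary(text: str, n: int = 3) -> str:
    # Naive sentence split on ., !, or ? followed by whitespace; good
    # enough for a baseline, not for production use.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])

article = (
    "The quarterly report showed strong growth. Revenue rose 12 percent. "
    "Costs were flat. The board approved a dividend. Analysts were pleased."
)
print(lead_n_summary(article))  # prints the first three sentences
```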
Data, Data, Data
Every ML initiative begins with data! Ideally, we should use data that aligns with our goals for the text summarization project. For instance, if our aim is to summarize patent applications, we should utilize patent applications for model training. A key consideration in ML projects is that training data typically needs to be labeled. In the context of text summarization, this means pairing each document with a human-written reference summary; these reference summaries are the labels the model learns from.
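To make “labeled” concrete, here is a sketch of loading such document–summary pairs with the Hugging Face datasets library; the file name and column names are hypothetical placeholders:

```python
# A sketch of what labeled summarization data looks like: each record pairs
# a document with a human-written reference summary (the label).
# "data.csv" and the column names are hypothetical placeholders.
from datasets import load_dataset

dataset = load_dataset("csv", data_files="data.csv")["train"]
example = dataset[0]
print(example["text"])     # the document to be summarized
print(example["summary"])  # the reference summary, i.e., the label
```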