Using get_text()

Extracting only the text from a web page is a common task, and Beautiful Soup provides the get_text() method for this purpose.

We can call get_text() on a BeautifulSoup object or on any Tag object to get just the text it contains, without the markup. For example:

from bs4 import BeautifulSoup

html_markup = """<p class="ecopyramid">
<ul id="producers">
  <li class="producerlist">
    <div class="name">plants</div>
    <div class="number">100000</div>
  </li>
  <li class="producerlist">
    <div class="name">algae</div>
    <div class="number">100000</div>
  </li>
</ul>"""
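# Parse the markup with the lxml parser (requires the lxml package)
soup = BeautifulSoup(html_markup, "lxml")
# Print all the text in the document with the tags removed
print(soup.get_text())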

#output
plants
100000

algae
100000

The get_text() method returns all the text inside the BeautifulSoup or Tag object, including the text of its descendants, as a single Unicode string. But get_text() has issues when dealing with ...
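As a minimal sketch of the behavior described above, the following example calls get_text() on a single Tag and checks that the result is one string. It assumes the built-in "html.parser" instead of lxml so that it runs without extra packages, and it drops the outer p tag from the markup for brevity. The variable names first_item, raw_text, and clean_text are ours, chosen for illustration. The separator and strip arguments are optional parameters of get_text().

```python
from bs4 import BeautifulSoup

html_markup = """<ul id="producers">
  <li class="producerlist">
    <div class="name">plants</div>
    <div class="number">100000</div>
  </li>
  <li class="producerlist">
    <div class="name">algae</div>
    <div class="number">100000</div>
  </li>
</ul>"""

# Assumption: "html.parser" ships with Python; the example above uses "lxml"
soup = BeautifulSoup(html_markup, "html.parser")

# get_text() works on a single Tag as well as on the whole soup
first_item = soup.find("li", class_="producerlist")
raw_text = first_item.get_text()
print(type(raw_text).__name__)  # str
print(repr(raw_text))           # whitespace between the tags is preserved

# strip=True trims each text fragment and skips the empty ones;
# separator is the string placed between the remaining fragments
clean_text = first_item.get_text(separator=" ", strip=True)
print(clean_text)               # plants 100000
```

The raw string keeps the newlines and indentation that sit between the tags in the markup, which is why the printed output of get_text() often contains blank lines. Passing strip=True with a separator is one way to get a compact, single-line result.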

Getting Started with Beautiful Soup by Vineeth G. Nair
