Make sure the required Python libraries are installed on the host (for example via a requirements.txt file):
pip install requests
pip install beautifulsoup4
First, import the two packages, BeautifulSoup and requests:
from bs4 import BeautifulSoup
import requests
Second, ask the user for a URL to scrape:
url = input('Enter a website to extract the links from: ')
Third, request the page from the server with an HTTP GET request:
re_data = requests.get(url)
Addition – create an if/else statement to make sure the URL starts with an "http://" or "https://" scheme. (Note that a check like ('https' or 'http') in url would be a bug: 'https' or 'http' evaluates to just 'https', and the in operator matches anywhere in the string, so use str.startswith instead.)
if url.startswith(('http://', 'https://')):
    re_data = requests.get(url)
else:
    re_data = requests.get('https://' + url)
Fourth, use Python's built-in "html.parser" to pull data out of the HTML document:
soup_parser = BeautifulSoup(re_data.text, 'html.parser')
Fifth, create an empty list to store the links in:
links = []
Sixth, get all (find_all) the " <a> " tags, read each one's " href " attribute, and add (append) it to the links list created above. The loop variable must be named link, not links, or it would shadow the list:
for link in soup_parser.find_all('a'):
    links.append(link.get('href'))
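To see what this loop produces without hitting the network, you can parse a small inline HTML snippet. Two assumptions in this sketch beyond the original script: tags without an href (where link.get('href') returns None) are skipped, and relative hrefs are resolved against a made-up base URL with urllib.parse.urljoin:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

html = '<a href="/about">About</a> <a href="https://example.org">Ext</a> <a>no href</a>'
soup = BeautifulSoup(html, 'html.parser')

links = []
for link in soup.find_all('a'):
    href = link.get('href')  # None when the tag has no href attribute
    if href is not None:
        links.append(urljoin('https://example.com', href))

# links == ['https://example.com/about', 'https://example.org']
```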
Seventh, print and save the links to a file, passing sep='\n' to the print function so each link lands on its own line in the output file:
with open('Kalistamp.txt', 'a') as done:
    print(*links, sep='\n', file=done)
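The steps above can be assembled into one sketch. The extract_links helper is a hypothetical refactoring added here so the parsing logic can be exercised without a live request; everything else follows the original script:

```python
from bs4 import BeautifulSoup
import requests

def extract_links(html):
    """Return the href of every <a> tag in the HTML, skipping tags without one."""
    soup_parser = BeautifulSoup(html, 'html.parser')
    return [link.get('href') for link in soup_parser.find_all('a')
            if link.get('href') is not None]

if __name__ == '__main__':
    url = input('Enter a website to extract the links from: ')
    # Prepend a scheme when the user left it out
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url
    re_data = requests.get(url)
    links = extract_links(re_data.text)
    with open('Kalistamp.txt', 'a') as done:
        print(*links, sep='\n', file=done)
```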