Web Scraper using Python for Coronavirus headlines from NYT

Purpose: Receive Coronavirus headlines from the New York Times as push notifications via Pushover

Prerequisites: a Pushover account (user key and application API token) and the beautifulsoup4 package

from bs4 import BeautifulSoup
from urllib.request import urlopen
from datetime import datetime
import http.client
import urllib.parse

# Pick the notification title from the local hour (0-23).
hour = datetime.now().hour
if 3 < hour < 12:
    title = "Morning Briefing"
elif 12 <= hour < 18:
    title = "Afternoon Briefing"
else:
    # 18:00 through 03:59
    title = "Evening Briefing"
page = urlopen("https://www.nytimes.com/news-event/coronavirus")
bs = BeautifulSoup(page.read(), "html.parser")
# The css-* class names look auto-generated by the site and may change;
# update them if nothing is found.
stampTag = bs.find("span", class_="css-1stvlmo")
timeStamp = stampTag.get_text() if stampTag is not None else ""
newsItems = bs.find_all("li", class_="css-1g3a8xd")
msg = ""
try:
    for news in newsItems:
        link = news.find("a", href=True)
        if link is None:
            continue  # skip list items that carry no link
        html_text = '<a href="{}">{}</a>'.format(link["href"], news.get_text())
        msg = msg + html_text + " "
    lastUpdated = " [Updated on {}]".format(timeStamp)
    msg = msg + lastUpdated
except Exception:
    msg = None
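if msg is not None:
    try:
        conn = http.client.HTTPSConnection("api.pushover.net:443")
        conn.request(
            "POST",
            "/1/messages.json",
            urllib.parse.urlencode({
                "token": "YOUR OWN TOKEN",  # Pushover application API token
                "user": "YOUR OWN USER",    # Pushover user key
                "html": 1,                  # let Pushover render the <a> tags
                "title": title,
                "message": msg,
            }),
            {"Content-type": "application/x-www-form-urlencoded"},
        )
        response = conn.getresponse()
        if response.status != 200:
            print("Push failed with HTTP status", response.status)
    except Exception:
        print("Push failed!")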
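
The title selection and the message building can also be pulled out into small functions, which makes them easy to check without touching the network. The following is only a sketch under a few assumptions: the names briefing_title, build_message and MAX_MESSAGE_CHARS are made up for illustration, the 1024-character limit is taken from Pushover's documentation at the time of writing, and how Pushover counts HTML tags toward that limit is not checked here. It also escapes the headline text, since the message is sent with html set to 1.

```python
import html

# Assumption: Pushover documents a 1024-character message limit;
# adjust this value if the API changes.
MAX_MESSAGE_CHARS = 1024


def briefing_title(hour):
    """Return the notification title for an hour of the day (0-23)."""
    if 3 < hour < 12:
        return "Morning Briefing"
    if 12 <= hour < 18:
        return "Afternoon Briefing"
    return "Evening Briefing"


def build_message(items, time_stamp, limit=MAX_MESSAGE_CHARS):
    """Join (url, text) pairs into HTML links followed by the update stamp.

    Links are added whole until the next one would exceed the limit,
    so an <a> tag is never cut in half.
    """
    suffix = " [Updated on {}]".format(time_stamp)
    parts = []
    used = len(suffix)
    for url, text in items:
        link = '<a href="{}">{}</a> '.format(
            html.escape(url, quote=True), html.escape(text)
        )
        if used + len(link) > limit:
            break
        parts.append(link)
        used += len(link)
    return "".join(parts) + suffix


# Example usage with a hypothetical headline.
print(briefing_title(9))  # Morning Briefing
print(build_message([("https://example.com/a", "Cases rise & fall")], "May 1"))
```

In the script, items would be the (link["href"], news.get_text()) pairs collected from newsItems.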

Using Task Scheduler on Windows or crontab on Linux, you can run this script automatically so that it sends you the updated headlines at a regular interval.
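
On Linux, a crontab entry along the following lines would run the script three times a day, matching the morning, afternoon and evening titles. This is only a sketch: the interpreter path and the script location are assumptions and need to be adapted to your system.

```
# m  h        dom mon dow  command
0    8,13,19  *   *   *    /usr/bin/python3 /home/youruser/covid_19_news.py
```

The entry can be added by running crontab -e and pasting the line into the file that opens.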

Windows users can run the Python script from a batch file (see below) and point Task Scheduler at that file.

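REM Only needed if a leftover chromedriver process may be running; this script itself does not use one.
taskkill /F /IM chromedriver.exe
"C:\Users\deepe\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\python.exe" "D:\Code\covid_19_news.py"
pause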
