(My) Overview: NewsAPI

Victor Fernandez
7 min read · May 2, 2022


Photo by Obi Onyeador on Unsplash

The Problem:

I am building a personal news aggregator that provides the three latest or most important articles about a specific topic.

I need a source of information: either a news API or a crawler targeting news websites.

The Solution:

Create a script that uses a news API, or a series of scripts that crawl relevant news sites, to search for important or recent articles on specific topics.

The Implementation:

⚠️ I will discard (for now) the use of spiders (crawlers), since this requires some extra “admin” work (checking whether it is legal to crawl the target website).

I will use NewsAPI as the source for the news, and Google Sheets and a Telegram bot as a way to display the results returned by NewsAPI.

The filter parameters will include:

  • The source.
  • The day of publication.
  • A maximum of three articles per topic.

❓What is NewsAPI?

NewsAPI is a REST API that returns JSON-formatted search results from more than 800,000 sources, according to the official NewsAPI website.

Get the API Key

1. Go to the get started page.

2. Click Get API Key.

3. Fill out the form. API key obtained!

Be aware that the free tier has limitations. For most personal applications the free tier will be enough.

The documentation provides a list of client libraries in different languages. The Python client mattlisiv/newsapi-python is not an official client, but it is simple to use, so this is what I am going to use.

News API Description.

The API is divided into two main endpoints, plus one specialized sub-endpoint:

  • /v2/everything: Searches all articles about a specific topic.
  • /v2/top-headlines: Gets the top headlines, filtered for example by country or category.
  • /v2/top-headlines/sources: This is a specialized endpoint. It returns information (including name, description, and category) about the sources used to provide the headlines.

Authentication

There are three different ways to authenticate with the API:

  1. As part of the query string, with the apiKey parameter.
  2. Via the X-Api-Key HTTP header.
  3. Via the Authorization HTTP header. Including Bearer is optional.
#Via query string 
GET https://newsapi.org/v2/everything?q=keyword&apiKey=db0c830faab34094b9dyyyxxxxxxxx
#Via X-Api-Key HTTP header
X-Api-Key: db0c830faab34094b9dyyyyyxxxxxxxx
#Via Authorization HTTP header
Authorization: db0c830faab34094b9dyyyyyxxxxxxxx
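
As a minimal sketch, the three options could be built in Python with only the standard library. The key is a placeholder, the function names are my own, and the requests are only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://newsapi.org/v2/everything"
API_KEY = "YOUR_API_KEY"  # placeholder, not a real key


def request_with_query_string(keyword: str) -> Request:
    """Option 1: the key travels in the query string as apiKey."""
    query = urlencode({"q": keyword, "apiKey": API_KEY})
    return Request(f"{BASE_URL}?{query}")


def request_with_x_api_key(keyword: str) -> Request:
    """Option 2: the key travels in the X-Api-Key header."""
    query = urlencode({"q": keyword})
    return Request(f"{BASE_URL}?{query}", headers={"X-Api-Key": API_KEY})


def request_with_authorization(keyword: str) -> Request:
    """Option 3: the key travels in the Authorization header."""
    query = urlencode({"q": keyword})
    return Request(f"{BASE_URL}?{query}", headers={"Authorization": API_KEY})
```

Passing any of these objects to urllib.request.urlopen would send the request. The header variants keep the key out of the URL, which is handy when URLs end up in logs.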

This is a personal project, so I chose the option I feel most comfortable with: the X-Api-Key header.

If the authentication is wrong or missing, a 401 Unauthorized HTTP error will be returned.

Endpoints

/v2/everything

This endpoint is a good option for general-purpose search, discovery, and analysis, which is what I need (I will explain later why I am not using top headlines).

For more information, check the official documentation

Request parameters

  • apiKey: can be passed in the query string or through one of the headers discussed previously.
  • q and qInTitle: q provides the keywords or phrases to search for; qInTitle restricts the search to keywords and phrases that appear in the article title.
  • sources: limits the sources from which the articles are obtained.
  • from and to: self-explanatory; they limit the time frame of the news.
GET https://newsapi.org/v2/everything?q=apple&from=2021-10-02&to=2021-10-02&sortBy=popularity&apiKey=db0c830faab340yyyyyyxxxxxxxxxxxx
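
The same URL can be assembled in Python rather than typed by hand. This is only a sketch: build_everything_url is a hypothetical helper of mine, and it leaves the API key out on the assumption that it is sent in the X-Api-Key header.

```python
from datetime import date
from typing import Optional, Sequence
from urllib.parse import urlencode

EVERYTHING_URL = "https://newsapi.org/v2/everything"


def build_everything_url(
    q: str,
    from_date: date,
    to_date: date,
    sources: Optional[Sequence[str]] = None,
    sort_by: str = "popularity",
) -> str:
    """Build a /v2/everything URL from the parameters described above."""
    params = {
        "q": q,
        "from": from_date.isoformat(),  # YYYY-MM-DD
        "to": to_date.isoformat(),
        "sortBy": sort_by,
    }
    if sources:
        # Several sources are sent as one comma-separated value.
        params["sources"] = ",".join(sources)
    return f"{EVERYTHING_URL}?{urlencode(params)}"


url = build_everything_url("apple", date(2021, 10, 2), date(2021, 10, 2))
```

urlencode takes care of escaping, so a topic with spaces or special characters does not break the query string.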

Response Object

The response object is in JSON format. Below is an illustration:

{
"status": "ok",
"totalResults": 2177,
"articles": [
{results_1},
{results_2}
]
}

From the code above:

  • status indicates whether the request was successful.
  • totalResults is the total number of results.
  • articles is an array of JSON objects, one per news article.

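The parameters within the articles array are source (with id and name), author, title, description, url, urlToImage, publishedAt, and content.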
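
To make the shape concrete, here is a minimal sketch that parses a response and reads the three top-level keys. The sample payload is made up for illustration and parse_response is a hypothetical helper.

```python
import json

# Invented sample data that mimics the layout of a real response.
SAMPLE_RESPONSE = """
{
  "status": "ok",
  "totalResults": 2,
  "articles": [
    {
      "source": {"id": null, "name": "Example News"},
      "author": "Jane Doe",
      "title": "First headline",
      "description": "Short summary of the first article.",
      "url": "https://example.com/first",
      "urlToImage": null,
      "publishedAt": "2021-10-02T08:00:00Z",
      "content": "Truncated body of the first article..."
    },
    {
      "source": {"id": null, "name": "Another Paper"},
      "author": null,
      "title": "Second headline",
      "description": "Short summary of the second article.",
      "url": "https://example.com/second",
      "urlToImage": null,
      "publishedAt": "2021-10-02T09:30:00Z",
      "content": "Truncated body of the second article..."
    }
  ]
}
"""


def parse_response(raw: str):
    """Return (totalResults, articles) or raise if status is not ok."""
    payload = json.loads(raw)
    if payload.get("status") != "ok":
        raise ValueError(f"unexpected status: {payload.get('status')}")
    return payload["totalResults"], payload["articles"]


total, articles = parse_response(SAMPLE_RESPONSE)
titles = [article["title"] for article in articles]
```

Note that author and urlToImage can be null, so code that reads them should be ready for None.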

/v2/top-headlines and /v2/top-headlines/sources

These endpoints provide breaking news or headlines for a country and the sources of those headlines.

During the implementation, I ran into a simple issue. I am in Asia, and even with the country parameter set to CO or US (Spanish or English), I still got some headlines in Asian languages or headlines from a few weeks ago. To keep it simple, I decided to use the everything endpoint.

More information:

/v2/top-headlines

/v2/top-headlines/sources

🔥Errors

The response will include:

  • status: set to error when the request fails.
  • code: a short code identifying the type of error.
  • message: a description of the error.
{
"status": "error",
"code": "apiKeyMissing",
"message": "Your API key is missing. Append this to the URL with the apiKey param, or use the x-api-key HTTP header."
}

HTTP status

  • 200 OK: success.
  • 400 Bad Request: the request is unacceptable, most likely because of a missing or malformed parameter.
  • 401 Unauthorized: the API key is missing or not correct.
  • 429 Too Many Requests: too many requests in a short time window.
  • 500 Server Error: something is wrong on the NewsAPI side.

Error codes

For a complete list, check the documentation.

The most relevant error codes, or at least those I might handle in a try/except block, are:

  • apiKeyDisabled: the key is disabled.
  • apiKeyExhausted: we reached the limit of the plan.
  • parameterInvalid: the request has some invalid parameters.
  • parametersMissing: the request is missing some parameters.
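
A minimal sketch of how these codes might feed a try/except block. NewsApiError, check_response, and suggested_action are hypothetical names of mine, and the reactions are only suggestions.

```python
import json


class NewsApiError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, http_status: int, code: str, message: str):
        super().__init__(f"{http_status} {code}: {message}")
        self.http_status = http_status
        self.code = code
        self.message = message


def check_response(http_status: int, body: str) -> dict:
    """Return the parsed payload, or raise NewsApiError for an error response."""
    payload = json.loads(body)
    if http_status == 200 and payload.get("status") == "ok":
        return payload
    raise NewsApiError(
        http_status, payload.get("code", "unknown"), payload.get("message", "")
    )


def suggested_action(error: NewsApiError) -> str:
    """Map the error codes listed above to a (made-up) reaction."""
    if error.code in ("apiKeyDisabled", "apiKeyExhausted"):
        return "stop: check the API key or the plan limits"
    if error.code in ("parameterInvalid", "parametersMissing"):
        return "fix the request parameters"
    if error.http_status == 429:
        return "wait and retry later"
    return "log and give up"
```

A caller would wrap check_response in try/except NewsApiError and branch on suggested_action, or directly on error.code.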

Client Library

As mentioned in the documentation, there is an unofficial Python library.

Installing

pip install newsapi-python

Example Code
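
A minimal sketch of how the client could be used, assuming newsapi-python is installed. The helper names, the language, the sort order, and the page size are my own choices for illustration; the four-day window matches the one I hardcoded in my script.

```python
from datetime import date, timedelta


def everything_kwargs(topic: str, today: date, days_back: int = 4) -> dict:
    """Keyword arguments for NewsApiClient.get_everything."""
    return {
        "q": topic,
        "from_param": (today - timedelta(days=days_back)).isoformat(),
        "to": today.isoformat(),
        "language": "en",        # assumption: English articles only
        "sort_by": "relevancy",  # assumption: most relevant first
        "page_size": 20,
    }


def fetch_articles(api_key: str, topic: str) -> list:
    """Query /v2/everything through newsapi-python and return the articles."""
    # Imported here so the helper above works without the package installed.
    from newsapi import NewsApiClient

    client = NewsApiClient(api_key=api_key)
    response = client.get_everything(**everything_kwargs(topic, date.today()))
    return response["articles"]
```

Calling fetch_articles("YOUR_API_KEY", "raspberry pi") with a real key would return a list of article dictionaries.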

Some recommendations:

  1. For get_top_headlines, it is not possible to combine sources with category or country in the same request; the API returns an error.
  2. For get_everything, pay attention to the time frame: on the free tier, the API only searches articles up to one month old.
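
Both recommendations can be checked before the request is sent. This is a sketch with hypothetical helper names, and it treats "one month" as 30 days, which is my assumption rather than a documented limit.

```python
from datetime import date, timedelta
from typing import Optional


def check_top_headlines_params(
    sources: Optional[str] = None,
    country: Optional[str] = None,
    category: Optional[str] = None,
) -> None:
    """Raise early instead of letting the API reject the combination."""
    if sources and (country or category):
        raise ValueError("sources cannot be combined with country or category")


def clamp_from_date(requested: date, today: date, max_days_back: int = 30) -> date:
    """Move a start date that is too old up to the oldest allowed day."""
    earliest = today - timedelta(days=max_days_back)
    return max(requested, earliest)
```

For example, clamp_from_date(date(2022, 1, 1), date(2022, 5, 2)) moves the start date forward to 2022-04-02, while a start date inside the window is returned unchanged.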

👉(My) Implementation

First, I need to decide what information I will extract from the response object.

I don't need all the information in the response object. The idea is to have a snippet of the news; if I am interested, I will use the url to navigate to the source.

  1. I need the contents of the articles array.
  2. I don't need the content field, since it is truncated.
  3. I will focus on source > name, author, title, description, publishedAt, and url.
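
The selection above could look like the following sketch. to_snippet and format_snippet are hypothetical helpers, and the message layout is just one possibility.

```python
def to_snippet(article: dict) -> dict:
    """Keep only the fields of interest and drop the truncated content."""
    return {
        "source": (article.get("source") or {}).get("name"),
        "author": article.get("author"),
        "title": article.get("title"),
        "description": article.get("description"),
        "publishedAt": article.get("publishedAt"),
        "url": article.get("url"),
    }


def format_snippet(snippet: dict) -> str:
    """Render a snippet as a short plain-text message."""
    byline = snippet["author"] or "unknown author"
    return (
        f"{snippet['title']}\n"
        f"{snippet['source']} | {byline} | {snippet['publishedAt']}\n"
        f"{snippet['description']}\n"
        f"{snippet['url']}"
    )
```

The same text could later be sent as a Telegram message or written to a row in Google Sheets.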

Next steps

  • I want to create a Telegram bot where I will input a single topic or a list of topics, and I will get back a message with the parameters I already mentioned, especially the description and the url of the original article.
  • I will set up the Raspberry Pi to run a web server that takes care of running the script and handling the Telegram bot requests.

Final Thoughts

  • I hardcoded the time frame to four days in the past. I don’t think I need information older than four days, but it is still a hardcoded parameter.
  • The query parameter q looks for the topic keyword in the article body, so it is possible that an article will not contain relevant information.
  • I filter the articles to get the three most relevant, but I don’t check whether those three articles come from the same source. It might be a good idea to create an extra function to ensure each article comes from a different source.
  • In some cases the topic doesn’t yield any results, and I don’t have any function handling this scenario.
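
The last two points could be addressed with something like this sketch. It is not part of my current script: pick_articles and summary_for are hypothetical, and the input list is assumed to be already sorted by relevance.

```python
def pick_articles(articles: list, limit: int = 3) -> list:
    """Take the first `limit` articles, at most one per source name."""
    picked = []
    seen_sources = set()
    for article in articles:
        name = (article.get("source") or {}).get("name")
        if name in seen_sources:
            continue  # this source already contributed an article
        seen_sources.add(name)
        picked.append(article)
        if len(picked) == limit:
            break
    return picked


def summary_for(topic: str, articles: list) -> str:
    """Build the text for one topic, including the no-results case."""
    chosen = pick_articles(articles)
    if not chosen:
        return f"No articles found for '{topic}'."
    return "\n".join(f"{a['title']} ({a['url']})" for a in chosen)
```

Articles without a source name are treated as coming from one shared unknown source, so at most one of them is kept.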


I’m Victor, I’m a Field Application Engineer for a CCTV manufacturer. I love Raspberry Pi, Python, and Microcontrollers and I write about my personal projects.