queryly

Queryly Client Scraper v1.0.0

We are excited to announce the release of Queryly Client Scraper v1.0.0! This powerful tool is designed to streamline the process of scraping and collecting news articles from various sources using Queryly’s API. Whether you’re building a news aggregator, conducting media research, or simply need a reliable method for bulk article extraction, Queryly Client Scraper offers the flexibility and features you need to get the job done.

Key Features of Queryly Client Scraper

1. Automated News Collection

Queryly Client Scraper enables you to fetch news articles in bulk from multiple journalistic sources. It uses predefined queries to gather content efficiently, so you can focus on analysis rather than data collection.

2. Customizable Query Parameters

The scraper supports a wide array of query parameters such as:

  • Portal IDs: Target specific news portals.
  • Search by Date: Sort and filter articles by publication date.
  • Section-specific Fetching: Retrieve articles from designated sections (e.g., Technology, Sports, Business).
  • Batch Size Control: Adjust the number of articles fetched in one batch to suit your needs.
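To make these parameters concrete, here is a minimal sketch of how they might map onto a Queryly-style search request. The endpoint URL and parameter names below are assumptions for illustration only, not the tool's actual internals:

```python
from urllib.parse import urlencode

def build_query_url(portal_id, section=None, batch_size=10, sort_by_date=True):
    # Hypothetical mapping of the scraper's options onto request parameters.
    params = {
        "queryly_key": portal_id,    # portal ID targets a specific news portal
        "batchsize": batch_size,     # number of articles fetched per batch
    }
    if sort_by_date:
        params["sort"] = "date"      # sort results by publication date
    if section:
        params["section"] = section  # restrict results to one section
    # The host and path here are placeholders, not a documented endpoint.
    return "https://example.invalid/search?" + urlencode(params)
```

For example, `build_query_url("123", section="Technology", batch_size=5)` produces a URL carrying the portal ID, section filter, and batch size in one request.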

3. Easy to Install and Deploy

The tool is simple to set up and can be integrated into your existing infrastructure. With a few commands, you can start fetching news and incorporating it into your workflows.

4. Efficient Crawling with Error Handling

Built with error handling in mind, Queryly Client Scraper minimizes the risk of disruptions by gracefully managing timeouts, API limits, and unexpected responses from the server.
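The kind of resilience described above can be sketched as a retry loop with exponential backoff. This is an illustrative pattern, not the tool's actual code:

```python
import time

def fetch_with_retries(fetch, retries=3, base_delay=1.0):
    # Retry a flaky fetch so that timeouts or rate limits
    # don't abort the whole crawl.
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            # back off 1s, 2s, 4s, ... before trying again
            time.sleep(base_delay * (2 ** attempt))
```

A transient failure (say, an API limit that clears after a couple of seconds) is retried transparently; only a persistent error is raised to the caller.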

5. Installer and Deployment

To make it even easier, the tool includes a built-in installer script that automates the setup process. Additionally, it supports remote deployment, making it perfect for cloud or server-based environments.

Installation

Setting up Queryly Client Scraper is simple. Here’s how you can get started:

  1. Clone the repository from GitHub:
   git clone https://github.com/DanyelMorales/queryly_client_scrapper.git
  2. Build the tool using the provided Makefile:
   make build
  3. Once built, install and run the scraper:
   sudo ./bin/setup.sh
  4. After installation, fetch news articles from a specific portal by ID:
   querylyctl --cfg "./cfg.json" articles fetch --id 123 --end-index 5

This command fetches the 5 most recent articles, sorted by date, from portal ID 123.

You can also broadcast the same command flags to every Queryly site registered in the config:

   querylyctl --cfg "./cfg.json" articles fetch --id all --end-index 5
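Conceptually, `--id all` expands into one fetch per site in the config registry. The sketch below shows that expansion under an assumed cfg.json layout (the `sites`/`id` structure is hypothetical, for illustration only):

```python
import json

def resolve_portal_ids(cfg_text, requested_id):
    # Expand "all" into every portal ID in the (assumed) config registry;
    # otherwise pass the single requested ID through unchanged.
    cfg = json.loads(cfg_text)
    if requested_id == "all":
        return [site["id"] for site in cfg["sites"]]
    return [requested_id]
```

With a registry listing portals 123 and 456, `resolve_portal_ids(cfg, "all")` yields both IDs, so the fetch runs once per site.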

Usage Example

If you want to collect all technology-related news from a specific portal, the following command will fetch the articles and store them in a new subdirectory for further processing:

   querylyctl --cfg "./cfg.json" articles fetch --id 456 --section "Technology" --out "./news/tech"

The scraper will collect all articles related to technology from portal ID 456 and save them in the ./news/tech directory.
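A minimal sketch of that output step, assuming one JSON file per article (the file-naming scheme here is an assumption, not the tool's documented behavior):

```python
import json
import os

def save_articles(articles, out_dir):
    # Create the --out subdirectory if needed and write each fetched
    # article as its own JSON file, returning the paths written.
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, article in enumerate(articles):
        path = os.path.join(out_dir, f"article_{i:04d}.json")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(article, fh)
        paths.append(path)
    return paths
```

Each run lands its batch in the target directory (e.g., ./news/tech), ready for downstream processing.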

Contributions and Future Improvements

We are always looking for ways to improve the tool, and contributions are more than welcome! If you have ideas, suggestions, or bug reports, feel free to open an issue or submit a pull request on our GitHub repository.

You can also stay tuned for future releases that will include additional features such as:

  • Enhanced error reporting.
  • Support for more advanced query options.
  • Integration with popular data storage solutions for large-scale data collection.

License

Queryly Client Scraper is licensed under the MIT License, making it free and open for all to use, modify, and distribute.


To learn more, visit the project’s GitHub page and start scraping with Queryly Client Scraper v1.0.0 today!

Visit the GitHub Repository