Apache
-
Detecting Patterns in Event Streams With FlinkCEP
When a button is pressed, a sensor detects a temperature change, or a transaction flows through, we call it an event. An event is an action or state change that matters to an application. Event stream processing (ESP) is a technique for processing data in real time as it passes through a system, with the key goal of taking action on the data as it arrives. This enables real-time…
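For a concrete feel for how FlinkCEP expresses such patterns, here is a minimal, self-contained sketch (assuming Flink 1.12+ with the flink-cep dependency on the classpath; the SensorReading class, the 80° threshold, and the sample data are made up for illustration, not taken from the article):

```java
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.util.List;
import java.util.Map;

public class TemperatureAlertJob {

    // Simple POJO event: a sensor id plus a temperature reading.
    public static class SensorReading {
        public String sensorId;
        public double temperature;
        public SensorReading() {}
        public SensorReading(String sensorId, double temperature) {
            this.sensorId = sensorId;
            this.temperature = temperature;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<SensorReading> readings = env.fromElements(
                new SensorReading("sensor-1", 82.0),
                new SensorReading("sensor-1", 86.5),
                new SensorReading("sensor-2", 21.0));

        // Pattern: two high-temperature events directly following each other within 10 seconds.
        Pattern<SensorReading, ?> twoHotReadings = Pattern.<SensorReading>begin("first")
                .where(new SimpleCondition<SensorReading>() {
                    @Override
                    public boolean filter(SensorReading r) { return r.temperature > 80.0; }
                })
                .next("second")
                .where(new SimpleCondition<SensorReading>() {
                    @Override
                    public boolean filter(SensorReading r) { return r.temperature > 80.0; }
                })
                .within(Time.seconds(10));

        // Processing time keeps the bounded sample simple (no watermarks needed).
        PatternStream<SensorReading> matches =
                CEP.pattern(readings, twoHotReadings).inProcessingTime();

        DataStream<String> alerts = matches.select(
                new PatternSelectFunction<SensorReading, String>() {
                    @Override
                    public String select(Map<String, List<SensorReading>> match) {
                        return "High temperature detected on " + match.get("first").get(0).sensorId;
                    }
                });

        alerts.print();
        env.execute("FlinkCEP temperature alert");
    }
}
```

Each matched pair of consecutive hot readings produces one alert string, illustrating the "take action as the data arrives" idea the excerpt describes.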
-
Best Practices for Scaling Kafka-Based Workloads
Apache Kafka is known for its ability to process a huge volume of events in real time. However, to handle millions of events, we need to follow certain best practices while implementing both Kafka producer and consumer services. Before you start using Kafka in your projects, let’s understand when to use it: High-volume event streams. When your application or service generates a continuous stream of events such as user activity, website clicks, sensor data, logs, or stock market updates, Kafka’s ability to…
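As a hedged illustration of producer-side tuning for high-volume streams, here is a small Java sketch; the broker address, topic name, and tuning values are placeholders to adapt to your own workload, not recommendations from the article:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ActivityEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Throughput/durability trade-offs commonly tuned for high-volume event streams:
        props.put(ProducerConfig.ACKS_CONFIG, "all");                 // wait for all in-sync replicas
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");  // avoid duplicates on retries
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");     // shrink payloads on the wire
        props.put(ProducerConfig.LINGER_MS_CONFIG, "20");             // give batches time to fill
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(64 * 1024)); // 64 KB batches

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("user-activity", "user-42", "page_view"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // In production, route failures to a retry or dead-letter path.
                            exception.printStackTrace();
                        }
                    });
            producer.flush();
        }
    }
}
```

Batching (linger.ms, batch.size) and compression increase throughput, while acks=all plus idempotence keeps delivery safe under retries; the right balance depends on your latency budget.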
-
Top 5 Key Features of Apache Iceberg for Modern Data Lakes
Big data has evolved significantly since its inception in the late 2000s. Many organizations quickly adopted the trend and built their big data platforms using open-source tools like Apache Hadoop. Later, these companies started struggling to manage rapidly evolving data processing needs. They faced challenges handling schema-level changes, partition scheme evolution, and going back in time to look at the data. I faced similar challenges while designing large-scale distributed systems back in the 2010s for a…
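To make the schema and partition evolution point concrete, here is a minimal sketch using Iceberg's Java API (assuming iceberg-core and Hadoop are on the classpath; the warehouse path, database, table, and column names are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.types.Types;

public class SchemaEvolutionExample {
    public static void main(String[] args) {
        // Hypothetical local warehouse and table for illustration.
        HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "file:///tmp/iceberg-warehouse");
        Table table = catalog.loadTable(TableIdentifier.of("analytics", "events"));

        // Schema evolution is a metadata-only change: no data files are rewritten.
        table.updateSchema()
                .addColumn("country_code", Types.StringType.get())
                .renameColumn("ts", "event_ts")
                .commit();

        // Partition evolution: new writes are partitioned by day without touching old files.
        table.updateSpec()
                .addField(Expressions.day("event_ts"))
                .commit();
    }
}
```

Both operations commit new table metadata; existing data files remain valid, which is exactly the kind of change that was painful on plain Hadoop-era layouts.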
-
Protecting Your Data Pipeline: Avoid Apache Kafka Outages With Topic and Configuration Backups
An Apache Kafka outage occurs when a Kafka cluster or some of its components fail, resulting in an interruption or degradation of service. Kafka is designed for high-throughput, fault-tolerant data streaming and messaging, but it can fail for a variety of reasons, including infrastructure failures, misconfigurations, and operational issues. Why Kafka Outages Occur: Broker failure. A broker can become unresponsive under excessive data load or on undersized hardware, or fail outright due to a hard drive crash, memory exhaustion, or network issues.…
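One way to capture such a backup is with the Kafka AdminClient. The sketch below (assuming kafka-clients 3.1+; the broker address and plain-text output format are illustrative) dumps each topic's partition count, replication factor, and overridden configs so topics can be re-created after an outage:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.config.ConfigResource;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

public class TopicConfigBackup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> topics = admin.listTopics().names().get();

            // Partition count and replication factor per topic.
            Map<String, TopicDescription> descriptions =
                    admin.describeTopics(topics).allTopicNames().get();
            for (TopicDescription d : descriptions.values()) {
                System.out.printf("topic=%s partitions=%d replication=%d%n",
                        d.name(), d.partitions().size(), d.partitions().get(0).replicas().size());
            }

            // Non-default (overridden) topic configs: these are what you lose
            // if a topic has to be re-created from scratch.
            List<ConfigResource> resources = topics.stream()
                    .map(t -> new ConfigResource(ConfigResource.Type.TOPIC, t))
                    .collect(Collectors.toList());
            Map<ConfigResource, Config> configs = admin.describeConfigs(resources).all().get();
            configs.forEach((resource, config) -> config.entries().stream()
                    .filter(entry -> !entry.isDefault())
                    .forEach(entry -> System.out.printf("%s %s=%s%n",
                            resource.name(), entry.name(), entry.value())));
        }
    }
}
```

Writing this output to version control (or any durable store) gives you a record of topic layouts and overrides that is independent of the cluster itself.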
-
Deployment Strategies for Apache Kafka Cluster Types
Organizations start their data streaming adoption with a single Apache Kafka cluster to deploy the first use cases. The need for group-wide data governance and security, combined with differing SLA, latency, and infrastructure requirements, introduces new Kafka clusters. Multiple Kafka clusters are the norm, not an exception. Use cases include hybrid integration, aggregation, migration, and disaster recovery. This blog post explores real-world success stories and cluster strategies for different Kafka deployments across industries. Apache Kafka: The De Facto Standard for Event-Driven Architectures…
-
Apache Iceberg: The Open Table Format for Lakehouses and Data Streaming
Every data-driven organization has operational and analytical workloads. A best-of-breed approach emerges with various data platforms, including data streaming, data lake, data warehouse and lakehouse solutions, and cloud services. An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets, and cost-efficient storage while providing strong support for ACID transactions and time travel queries. This article explores market trends; adoption of table format…
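As a hedged illustration of the time travel capability, the following Java sketch lists a table's snapshots and then plans a scan against the oldest one (iceberg-core and Hadoop assumed on the classpath; the warehouse path and table name are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.io.CloseableIterable;

public class TimeTravelExample {
    public static void main(String[] args) throws Exception {
        HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "file:///tmp/iceberg-warehouse");
        Table table = catalog.loadTable(TableIdentifier.of("analytics", "events"));

        // Every commit produces a snapshot; any of them can be queried later.
        Snapshot oldest = null;
        for (Snapshot snapshot : table.snapshots()) {
            System.out.printf("snapshot=%d committed=%d%n",
                    snapshot.snapshotId(), snapshot.timestampMillis());
            if (oldest == null) {
                oldest = snapshot;
            }
        }

        // Time travel: plan a scan against the oldest snapshot instead of the current one.
        if (oldest != null) {
            TableScan scan = table.newScan().useSnapshot(oldest.snapshotId());
            try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
                for (FileScanTask task : tasks) {
                    System.out.println("would read: " + task.file().path());
                }
            }
        }
    }
}
```

Because snapshots are just metadata pointers to immutable data files, reading "as of" an older snapshot requires no copies or restores, which is what makes time travel cheap.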
-
How to Create an Azure VM Apache WebServer
Setting up a public-facing web server in Azure using a virtual machine offers flexibility and control over your web hosting environment. This tutorial walks you through creating an Azure VM with Apache installed, explaining not just how but why each step is essential in configuring a static web content server accessible from the internet. Prerequisites. Before we begin, ensure you have: an Azure account – this gives you access to Azure’s cloud services; basic familiarity with the Azure Portal –…
-
Forward Apache Logs to OpenSearch via Logstash
Introduction. Effective web server log management is crucial for maintaining your website’s performance, troubleshooting issues, and gaining insight into user behavior. Apache, one of the most popular web servers, generates access and error logs that contain valuable information. To manage and analyze these logs efficiently, you can use Logstash to process them and forward them to DigitalOcean’s Managed OpenSearch for indexing and visualization. In this tutorial, we will guide you through installing Logstash on a Droplet, configuring it to…
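A pipeline for this setup looks roughly like the sketch below: a file input tailing the Apache access log, a grok filter parsing the combined log format, and the OpenSearch output plugin shipping the events. The log path, host, credentials, and index name are placeholders, not the tutorial's exact values:

```
# Hypothetical /etc/logstash/conf.d/apache-to-opensearch.conf
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    # Parse the standard Apache combined log format into structured fields.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  opensearch {
    hosts    => ["https://your-opensearch-host:25060"]
    user     => "your-username"
    password => "your-password"
    index    => "apache-logs-%{+YYYY.MM.dd}"
    ssl      => true
  }
}
```

This assumes the logstash-output-opensearch plugin is installed; the daily index pattern keeps old log indices easy to expire or archive.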
-
How To Structure a Terraform Project
Introduction. Structuring Terraform projects appropriately according to their use case and perceived complexity is essential to ensuring their maintainability and extensibility in day-to-day operations. A systematic approach to organizing code files is necessary to keep the project scalable during deployment and usable by you and your team. In this tutorial, you’ll learn about structuring Terraform projects according to their general purpose and complexity. Then, you’ll create a project with a simple structure using the more common features…
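For orientation, one common simple layout looks like the sketch below; it reflects widely used Terraform conventions rather than the exact structure this tutorial builds:

```
.
├── main.tf        # primary entrypoint: resources and module calls
├── variables.tf   # input variable declarations
├── outputs.tf     # values exported after apply
├── provider.tf    # provider and required_version configuration
└── modules/
    └── app/       # a reusable module with the same file split
        ├── main.tf
        ├── variables.tf
        └── outputs.tf
```

Splitting declarations across these conventional files keeps the root module readable as the project grows, and moving repeated resource groups into modules/ keeps them reusable across environments.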