
In-Depth Exploration of Kafka: The Engine Behind Data Streaming

Kafka Overview and Essential Components

Apache Kafka is a leading open-source platform designed for stream processing. It provides a powerful publish-subscribe messaging system tailored to the needs of real-time data pipelines and streaming applications. Kafka's architecture is composed of several integral components that work together to create a cohesive and efficient data streaming environment:

  • Producer: Producers are entities responsible for transmitting data to Kafka topics. These can be various applications or systems that generate valuable data, feeding it into the Kafka ecosystem.
  • Consumer: Consumers retrieve and process data from Kafka topics. They are typically applications or services that need to utilize and act on the streamed data.
  • Broker: Kafka brokers play a crucial role in managing the storage, distribution, and retrieval of data. They act as the intermediary agents between producers and consumers, ensuring smooth data flow.
  • Cluster: A Kafka cluster consists of multiple servers or computers, each running Kafka brokers. Clusters are designed to provide fault tolerance and scalability, ensuring high availability and performance.
  • Topic: A topic in Kafka is a named stream of data, serving as a logical channel for data categorization. Topics facilitate the organized publishing and subscribing of data streams.
  • Partitions: Topics are subdivided into partitions, enabling parallel processing and increased throughput. Partitions are ordered and immutable, ensuring efficient data handling.
  • Offset: An offset is a unique identifier assigned to each message within a partition. It allows consumers to keep track of which messages have been processed, maintaining data consistency and order.
  • Consumer Groups: Consumers can be organized into groups, where each group processes a subset of the data independently. This setup enhances load balancing and provides fault tolerance within the Kafka ecosystem.
  • ZooKeeper: ZooKeeper is an essential component used for managing and coordinating Kafka brokers. It handles configuration management, leader election, and the detection of broker failures, ensuring the stability and reliability of the Kafka infrastructure.

This architecture makes Apache Kafka a powerful and reliable solution for building real-time data pipelines and streaming applications, capable of handling high throughput and delivering data consistently across distributed systems.
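
To make these ideas concrete, here is a minimal sketch using Kafka's built-in console tools. It assumes the Docker stack described later in this guide is already running and that the broker container is named broker (as in the compose file below); the topic name demo-topic and group name demo-group are illustrative placeholders.

# Create a topic with 3 partitions (for parallelism) and a single replica
docker exec broker kafka-topics --bootstrap-server broker:29092 \
  --create --topic demo-topic --partitions 3 --replication-factor 1

# Producer: publish three messages to the topic
docker exec broker bash -c 'printf "first\nsecond\nthird\n" | \
  kafka-console-producer --bootstrap-server broker:29092 --topic demo-topic'

# Consumer: read the messages back as part of a consumer group,
# starting from the earliest offset in each partition (Ctrl+C to stop)
docker exec -it broker kafka-console-consumer --bootstrap-server broker:29092 \
  --topic demo-topic --group demo-group --from-beginning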

Using Docker for Kafka Setup

Introduction to Docker

Docker has transformed application deployment, distribution, and execution by offering a containerization platform that packages applications along with their dependencies into isolated units known as containers. These containers are lightweight, portable, and guarantee consistent behavior across various environments. Docker enables developers to streamline the deployment process, minimize compatibility issues, and optimize resource usage.

Dockerizing Kafka and ZooKeeper

Docker simplifies the setup of Kafka and ZooKeeper by encapsulating these services within containers. This approach removes the need for manual configuration, enhancing both efficiency and consistency in your deployment.

docker-compose.yml

 

---
version: '2'
services:

  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  broker:
    image: confluentinc/cp-server:7.5.0
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_CONFLUENT_BALANCER_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_JMX_PORT: 9101
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_CONFLUENT_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:29092
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'

  schema-registry:
    image: confluentinc/cp-schema-registry:7.5.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'broker:29092'
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

  connect:
    image: cnfldemos/cp-server-connect-datagen:0.6.2-7.5.0
    hostname: connect
    container_name: connect
    depends_on:
      - broker
      - schema-registry
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'broker:29092'
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      # CLASSPATH required due to CC-2422
      CLASSPATH: /usr/share/java/monitoring-interceptors/monitoring-interceptors-7.5.0.jar
      CONNECT_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      CONNECT_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
      CONNECT_LOG4J_LOGGERS: org.apache.zookeeper=ERROR,org.I0Itec.zkclient=ERROR,org.reflections=ERROR

  control-center:
    image: confluentinc/cp-enterprise-control-center:7.5.0
    hostname: control-center
    container_name: control-center
    depends_on:
      - broker
      - schema-registry
      - connect
      - ksqldb-server
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'broker:29092'
      CONTROL_CENTER_CONNECT_CONNECT-DEFAULT_CLUSTER: 'connect:8083'
      CONTROL_CENTER_KSQL_KSQLDB1_URL: "http://ksqldb-server:8088"
      CONTROL_CENTER_KSQL_KSQLDB1_ADVERTISED_URL: "http://localhost:8088"
      CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
      CONFLUENT_METRICS_TOPIC_REPLICATION: 1
      PORT: 9021

  ksqldb-server:
    image: confluentinc/cp-ksqldb-server:7.5.0
    hostname: ksqldb-server
    container_name: ksqldb-server
    depends_on:
      - broker
      - connect
    ports:
      - "8088:8088"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      KSQL_BOOTSTRAP_SERVERS: "broker:29092"
      KSQL_HOST_NAME: ksqldb-server
      KSQL_LISTENERS: "http://0.0.0.0:8088"
      KSQL_CACHE_MAX_BYTES_BUFFERING: 0
      KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
      KSQL_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      KSQL_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      KSQL_KSQL_CONNECT_URL: "http://connect:8083"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_REPLICATION_FACTOR: 1
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: 'true'
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: 'true'

  ksqldb-cli:
    image: confluentinc/cp-ksqldb-cli:7.5.0
    container_name: ksqldb-cli
    depends_on:
      - broker
      - connect
      - ksqldb-server
    entrypoint: /bin/sh
    tty: true

  ksql-datagen:
    image: confluentinc/ksqldb-examples:7.5.0
    hostname: ksql-datagen
    container_name: ksql-datagen
    depends_on:
      - ksqldb-server
      - broker
      - schema-registry
      - connect
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
                       cub kafka-ready -b broker:29092 1 40 && \
                       echo Waiting for Confluent Schema Registry to be ready... && \
                       cub sr-ready schema-registry 8081 40 && \
                       echo Waiting a few seconds for topic creation to finish... && \
                       sleep 11 && \
                       tail -f /dev/null'"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      STREAMS_BOOTSTRAP_SERVERS: broker:29092
      STREAMS_SCHEMA_REGISTRY_HOST: schema-registry
      STREAMS_SCHEMA_REGISTRY_PORT: 8081

  rest-proxy:
    image: confluentinc/cp-kafka-rest:7.5.0
    depends_on:
      - broker
      - schema-registry
    ports:
      - "8082:8082"
    hostname: rest-proxy
    container_name: rest-proxy
    environment:
      KAFKA_REST_HOST_NAME: rest-proxy
      KAFKA_REST_BOOTSTRAP_SERVERS: 'broker:29092'
      KAFKA_REST_LISTENERS: "http://0.0.0.0:8082"
      KAFKA_REST_SCHEMA_REGISTRY_URL: 'http://schema-registry:8081'

  kafdrop:
    image: obsidiandynamics/kafdrop:3.30.0
    hostname: kafdrop
    container_name: kafdrop
    ports:
      - "9000:9000"
    environment:
      KAFKA_BROKER_CONNECT: "broker:29092"

  azure-sql-edge:
    image: mcr.microsoft.com/azure-sql-edge:latest
    container_name: azure-sql-edge
    environment:
      ACCEPT_EULA: Y
      MSSQL_SA_PASSWORD: YourStrong@Passw0rd
      MSSQL_PID: Developer
    ports:
      - "1433:1433"

Docker Instructions

To set up Kafka using Docker:

  1. Docker offers the quickest way to stand up Kafka locally; for background, see the guide Apache Kafka® Quick Start - Local Install With Docker.
  2. Save the compose file above as docker-compose.yml, then launch all of the services in the background: docker-compose -f docker-compose.yml up -d (a full example session is sketched after this list).
  3. Stop the services when you are finished: docker-compose down.
  4. Open a shell inside the broker container (named broker in the compose file): docker exec -it broker /bin/sh.
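
Putting the steps together, a typical session looks roughly like this sketch; it assumes the compose file above is saved as docker-compose.yml in the current directory:

# Start every service in detached mode
docker-compose -f docker-compose.yml up -d

# List the containers and their status
docker-compose ps

# Follow the broker logs while it starts up
docker-compose logs -f broker

# Open a shell inside the broker container
docker exec -it broker /bin/sh

# Tear everything down when finished
docker-compose down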

If everything started correctly, docker-compose ps lists each container with an Up status.

Test the Docker Setup

  1. Check the running status of the Docker containers (see the commands below).
  2. Access the Kafdrop UI portal at http://localhost:9000 to browse topics, brokers, and consumer groups.
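
The commands below are one way to run these checks from the host; they assume the default port mappings from the compose file above.

# 1. Every container should report an "Up" status
docker-compose ps

# 2. Kafdrop UI: open http://localhost:9000 in a browser, or probe it with curl
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000

# Additional checks: REST Proxy topic list and Schema Registry subjects
curl -s http://localhost:8082/topics
curl -s http://localhost:8081/subjects

# Confluent Control Center is served at http://localhost:9021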

Conclusion

Congratulations on successfully configuring and running Apache Kafka on Docker in your local environment. This guide has given you a solid platform for exploring and experimenting with Kafka's features. As you deepen your understanding, consider incorporating Kafka into your own applications to take full advantage of its capabilities for building real-time data pipelines and sophisticated streaming solutions.

If your testing is successful, you're all set! Should you encounter any issues, our team is available to provide guidance and support with a prompt response. Please do not hesitate to reach out to us at any time at [email protected]

Wishing you continued success in your coding endeavors 🚀.

 
