Apache Kafka with Node.js
Apache Kafka is a real-time data streaming technology, created at LinkedIn and open-sourced in 2011.
Kafka is a broker-based solution that maintains streams of data as records within a cluster of servers, and it can handle billions of events per minute.
Before the arrival of event streaming systems like Apache Kafka and other Pub/Sub architectures, data processing was typically handled with periodic batch jobs, which could not react to data in real time.
Apache Kafka, instead, works with streams of events: they capture the time value of data and enable push-based applications that kick in whenever an interesting event occurs, in real time.
Being purpose-built for real-time log streaming, Apache Kafka is ideal for applications that need:
- Native support for data/message playback
- Ability to partition messaging workloads as application requirements change
- Real-time streaming for data processing
- Reliable data exchanges between different components
Nowadays many applications can take advantage of Apache Kafka, so let’s go through the basic concepts of this technology.
Broker: It’s a single Kafka server that runs on the Java Virtual Machine.
Cluster: It’s a group of one or more brokers that work together. Every broker in a cluster has a unique ID.
Topics: They are the categories used to organize messages and each of them has a unique name across the entire Kafka cluster.
Topics are partitioned and replicated across brokers; partitioning allows a topic to be processed in parallel, which enables high message throughput.
Within each partition, every message has a specific offset, and partitions are sliced into segments.
Every topic can have zero, one, or multiple consumers subscribing to that topic.
(image from: https://www.javatpoint.com/kafka-topics)
Producer: It’s an application that sends messages to a topic; those messages are distributed to partitions according to a mechanism such as key hashing.
A Kafka message consists of different elements:
- Key: It is optional in the Kafka message and can be null. A key may be a string, number, or any object, and is then serialized into binary format.
- Value: represents the content of the message and can also be null. The value format is arbitrary and is then also serialized into binary format.
- Compression Type: The compression type can be specified as part of the message. Options are none, gzip, lz4, snappy, and zstd.
- Headers: There can be a list of optional Kafka message headers in the form of key-value pairs. It is common to add headers to specify metadata about the message, especially for tracing.
- Partition + Offset: Once a message is sent to a Kafka topic, it receives a partition number and an offset id. The combination of topic + partition + offset uniquely identifies the message.
- Timestamp: It is added to the message either by the user or by the system.
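To make these elements concrete, here is a minimal sketch using the kafkajs library (introduced later in this article). The topic name "Users", the key, the header, and the broker address match the local setup built below and are only illustrative values:

import { Kafka, CompressionTypes } from 'kafkajs'

async function sendExample() {
  const kafka = new Kafka({ "clientId": "myapp", "brokers": ["localhost:9092"] })
  const producer = kafka.producer()
  await producer.connect()
  await producer.send({
    "topic": "Users",
    "compression": CompressionTypes.GZIP,          // in kafkajs the compression type is set per batch; gzip is built in
    "messages": [
      {
        "key": "user-42",                           // optional; used by the default partitioner to pick a partition
        "value": JSON.stringify({ "name": "Alice" }),
        "headers": { "trace-id": "abc-123" },       // optional metadata, e.g. for tracing
        "timestamp": Date.now().toString()          // optional; otherwise set by the system
      }
    ]
  })
  await producer.disconnect()
}

sendExample()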
Consumer: It’s an application that reads data from the Kafka cluster via a topic. It reads the data within each partition in order, starting from offset 0 and increasing the offset number.
Zookeeper: It’s a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
It’s used to track cluster state, membership, and leadership.
These are the main concepts (there are many more), but now it’s time to see a little bit of code: creating topics, sending messages, and reading them.
Create a basic application
The first thing to do is to create a local environment with Docker. Of course, you could use Confluent Cloud, which is perfect for most use cases, but for this example I preferred a local Docker setup.
docker-compose.yml
version: '3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    container_name: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
To start the environment you can launch this command:
docker-compose up -d
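Once the containers are started, a quick sanity check (assuming the container names from the compose file above) is to list them and look at the Kafka logs:

docker-compose ps
docker logs --tail 20 kafka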
I have used the standard port for each service. Now we can create our Node.js app by installing a useful library: kafkajs
npm install kafkajs
After that, you can create a file called topic.js to create your topics:
import { Kafka } from 'kafkajs'

run();

async function run() {
  try {
    const kafka = new Kafka({
      "clientId": "myapp",
      "brokers": ["localhost:9092"]
    })
    const admin = kafka.admin();
    await admin.connect();
    // Create the "Users" topic with two partitions
    await admin.createTopics({
      "topics": [{
        "topic": "Users",
        "numPartitions": 2
      }]
    })
    console.log("Created successfully");
    await admin.disconnect();
  } catch (ex) {
    console.error(`Something bad happened: ${ex}`)
  } finally {
    process.exit(0);
  }
}
In this file, you connect to the local Kafka cluster and create a topic called "Users" with two partitions. Later, the producer will route messages whose first letter is between a and m to the first partition, and messages starting with n to z to the second.
To create a topic you can launch:
node topic.js
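If you want to verify that the topic exists, a small sketch like the following (a hypothetical file list-topics.js, using the same connection settings as above) prints the topics known to the cluster via the admin client’s listTopics() method:

import { Kafka } from 'kafkajs'

async function listTopics() {
  const kafka = new Kafka({ "clientId": "myapp", "brokers": ["localhost:9092"] })
  const admin = kafka.admin()
  await admin.connect()
  // listTopics() returns the names of all topics in the cluster
  console.log(await admin.listTopics())
  await admin.disconnect()
}

listTopics()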
Now we can send a message to the topic we created previously. To do it, you can create a file called producer.js like this:
import { Kafka } from 'kafkajs'

const msg = process.argv[2];

run();

async function run() {
  try {
    const kafka = new Kafka({
      "clientId": "myapp",
      "brokers": ["localhost:9092"]
    })
    const producer = kafka.producer();
    await producer.connect();
    // Route messages starting with a-m to partition 0 and n-z to partition 1
    // (uppercase the first letter so the comparison is case-insensitive)
    const partition = msg[0].toUpperCase() < "N" ? 0 : 1;
    const result = await producer.send({
      "topic": "Users",
      "messages": [
        {
          "value": msg,
          "partition": partition
        }
      ]
    })
    console.log(`Sent successfully! ${JSON.stringify(result)}`);
    await producer.disconnect();
  } catch (ex) {
    console.error(`Something bad happened: ${ex}`)
  } finally {
    process.exit(0);
  }
}
In this file, we connect to Kafka again and read the message from the CLI arguments. Based on its first letter, we decide which partition to send it to, and then we send it, passing the topic name and a message with the value and the partition.
You can send a message from your CLI in this way:
node producer.js test-message
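Note that choosing the partition by hand is only for demonstration. A common alternative is to set a key and let Kafka’s default partitioner pick the partition by hashing it, so that all messages with the same key land on the same partition. A minimal sketch (a hypothetical producer-keyed.js):

import { Kafka } from 'kafkajs'

const msg = process.argv[2]

async function run() {
  const kafka = new Kafka({ "clientId": "myapp", "brokers": ["localhost:9092"] })
  const producer = kafka.producer()
  await producer.connect()
  // No explicit partition here: the key is hashed to choose the partition,
  // so messages with the same key always end up on the same partition
  await producer.send({
    "topic": "Users",
    "messages": [{ "key": msg[0], "value": msg }]
  })
  await producer.disconnect()
}

run()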
Now we can consume the message with a consumer.
You can create a file called consumer.js with the following code:
import { Kafka } from 'kafkajs'

run();

async function run() {
  try {
    const kafka = new Kafka({
      "clientId": "myapp",
      "brokers": ["localhost:9092"]
    })
    const consumer = kafka.consumer({ "groupId": "test" });
    await consumer.connect();
    await consumer.subscribe({
      "topic": "Users",
      "fromBeginning": true
    })
    // Keep running and log every message received, with its partition
    await consumer.run({
      "eachMessage": async result => {
        console.log(`Received message ${result.message.value} on partition ${result.partition}`)
      }
    })
  } catch (ex) {
    console.error(`Something bad happened: ${ex}`)
  }
}
In this file, we connect to Kafka and read the topic "Users" from the beginning, from each partition.
To read messages you can launch this command:
node consumer.js
Here is an example of a message created and read!
Now you can add another consumer attached to the same topic. If you do, you are parallelizing the way messages are read, because one consumer reads from one partition and the other one from the other partition.
A consumer can consume from multiple partitions, but within a consumer group a partition is consumed by only one consumer: this is the concept of the consumer group!
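To see this behaviour, you could make the group id configurable (a sketch, with a hypothetical consumer-group.js) and start two processes with the same group id: each one will be assigned one of the two partitions. Starting them with different group ids instead makes every process receive all messages, classic pub/sub style.

import { Kafka } from 'kafkajs'

// Usage: node consumer-group.js <groupId>
const groupId = process.argv[2] || 'test'

async function run() {
  const kafka = new Kafka({ "clientId": "myapp", "brokers": ["localhost:9092"] })
  const consumer = kafka.consumer({ groupId })
  await consumer.connect()
  await consumer.subscribe({ "topic": "Users", "fromBeginning": true })
  await consumer.run({
    "eachMessage": async ({ partition, message }) => {
      // Log which group and partition each message comes from
      console.log(`[group ${groupId}] partition ${partition}: ${message.value}`)
    }
  })
}

run()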
So, this can be your first application using Apache Kafka locally; it’s very simple and basic.
There are many more Kafka concepts to know and understand, but this simple example shows the basics!
Pros and Cons
But what are the pros and cons of this architecture?
Pros:
- High performance
- Distributed system
- Event-driven: Kafka can act both as a Pub/Sub system and as a queue system
- Scalability
- Parallel Processing
Cons:
- When the producer has to explicitly choose the correct partition to write to (as in this example), it can be a problem
- It’s complex to install, configure, and manage
There are many advantages and use cases where you can apply Kafka; it’s an outstanding architecture with a lot of useful components to solve your problems!
You can find the code of this article example in this repository: https://github.com/AlessandroMinoccheri/apache-kafka-basic