Kafka Topic Naming Conventions

Chris Riccomini on August 29, 2017

Today, I’ll be tackling the controversial subject of Kafka topic names. Not only will I review various schemes, but I’ve decided to take a stand on a few issues and outline my reasoning. Get out your brush, and let’s paint this bike shed!

There’s surprisingly little guidance on the internet about Kafka topic naming conventions. A few sparse Stack Overflow questions and a couple of mailing list discussions are all that pop up on the first page of Google. The opinions on the matter vary pretty widely. Some suggestions from the links above include:

Convention types

There are two types of naming conventions: structural and semantic. A good topic naming convention should define both structural and semantic guidelines.

Structural conventions define things like what kind of punctuation to use, or how to format spaces. The most basic structural convention is the one that Kafka itself enforces:

Valid characters for Kafka topics are the ASCII alphanumerics, ‘.’, ‘_’, and ‘-’

You can’t escape this. You can further refine it, though, by saying that dashes are used as spaces, or all topics must be camelCase, for example.
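As a rough illustration of the difference between Kafka’s own rule and a refinement layered on top of it, here is a minimal Python sketch. The first pattern encodes only the character set quoted above; the second is one hypothetical refinement (lowercase snake_case words separated by dots), not something Kafka enforces. The function names are made up for this example.

```python
import re

# The character set Kafka itself allows: ASCII alphanumerics, '.', '_' and '-'.
KAFKA_LEGAL = re.compile(r"[A-Za-z0-9._-]+")

# A hypothetical, stricter house rule: lowercase snake_case words,
# with dots used only as separators between words.
HOUSE_STYLE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*(?:\.[a-z0-9]+(?:_[a-z0-9]+)*)*")


def is_kafka_legal(name: str) -> bool:
    """True if the name uses only characters Kafka accepts."""
    return KAFKA_LEGAL.fullmatch(name) is not None


def is_house_style(name: str) -> bool:
    """True if the name also satisfies the stricter (assumed) house rule."""
    return HOUSE_STYLE.fullmatch(name) is not None
```

A name like Tracking.UserEvents passes the first check but fails the second, which is exactly the gap a structural convention is meant to close.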

Semantic conventions define what fields should go into a topic name, and in what order.

Potential fields

It turns out that people get pretty creative in the conventions that they come up with. Here are a number of fields that I’ve seen used or proposed in naming conventions:

Don’t do this

I’m going to be provocative and make some concrete recommendations.

Don’t use fields that change

My strongest advice is to avoid fields that can change over time. This includes things like team name, topic owner, service name, product name, and consumer name.

The rationale for avoiding dynamic fields is that a topic can’t be renamed, and migrating data to a new topic can be painstaking. When a service is deprecated and removed, for example, the service’s name is still in the Kafka topic.

Don’t use fields if data is available elsewhere

The logic here is that if you can get the information from another source, it’s best to do so, especially if that other source is actually the source of truth. Two common sources are:

The schema registry can provide you with the information about a schema for a given topic. This is true for both keys and values in the topic. It is also the source of truth for this information.

Kafka brokers provide topic metadata that includes partition count, replication factor, security information, etc. Again, Kafka is the source of truth for this information.

Don’t tie topic names to consumers or producers

It’s quite likely that a topic is going to have more than one consumer, and it’s also possible that whoever is sending messages to a topic will change over time. It doesn’t make sense to include either of these in a topic’s name because it violates the dynamic field rule, above.

Do this

So, what should you do? I’ve had success with a basic and flexible convention:

<message type>.<dataset name>.<data name>

Here, valid message type values are left up to the organization to define. Typical types include:

The dataset name is analogous to a database name in a traditional RDBMS. It’s used as a category to group topics together.

The data name field is analogous to a table name in a traditional RDBMS, though it’s fine to include further dotted notation if developers wish to impose their own hierarchy within the dataset namespace.

The appeal of this convention is that it’s very similar to a traditional RDBMS style, so it’s easy for developers to grok. When someone asks what to name a topic, you can always just ask them what they’d name their database/table if it were in an RDBMS, and suggest that as the dataset/data fields.

It’s also extensible. If developers or organizations wish to impose their own hierarchy that makes sense for their specific use cases or message types, they can do so by adding extra dotted fields in the data name section.

As far as structure goes, I recommend using snake_case (not camelCase, UpperCamelCase, or lisp-case).
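To make the convention concrete, here is a minimal Python sketch that splits a topic name into the three fields and checks each one. The message types in MESSAGE_TYPES are placeholders I picked for the example, since each organization defines its own list, and the names TopicName and parse_topic_name are likewise hypothetical.

```python
import re
from typing import NamedTuple

# Placeholder message types; substitute whatever your organization defines.
MESSAGE_TYPES = {"logging", "queuing", "tracking"}

# One snake_case field: lowercase letters and digits joined by underscores.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


class TopicName(NamedTuple):
    message_type: str
    dataset: str
    data: str  # may carry extra dotted levels, e.g. "invoices.v2"


def parse_topic_name(name: str) -> TopicName:
    """Split <message type>.<dataset name>.<data name> and validate each part."""
    parts = name.split(".", 2)  # everything after the second dot is the data name
    if len(parts) != 3:
        raise ValueError(f"expected three dotted fields, got: {name!r}")
    message_type, dataset, data = parts
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message_type!r}")
    for field in (dataset, *data.split(".")):
        if not SNAKE_CASE.fullmatch(field):
            raise ValueError(f"field is not snake_case: {field!r}")
    return TopicName(message_type, dataset, data)
```

Splitting on only the first two dots is what keeps the data name extensible: a name such as logging.billing.invoices.v2 parses with invoices.v2 as its data name.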

Enforcing the rules

The most obvious way to enforce a naming convention is to set auto.create.topics.enable to false, and limit who can create topics. Those who create topics are responsible for enforcing the rules, though an automated process for topic creation that forces users to define the various fields as part of a topic’s creation is preferred.
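One way such an automated process might look is a small request object that refuses to produce a topic name unless every field is supplied and well formed. This is only a sketch: TopicRequest, its topic_name method, and the message types listed are all hypothetical, and the actual call that creates the topic on the cluster is deliberately left out.

```python
import re
from dataclasses import dataclass

# Placeholder message types; the real list is up to each organization.
MESSAGE_TYPES = frozenset({"logging", "queuing", "tracking"})

# One snake_case field: lowercase letters and digits joined by underscores.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


@dataclass(frozen=True)
class TopicRequest:
    """All three fields are required; omitting one fails at construction."""
    message_type: str
    dataset: str
    data: str  # may contain extra dotted levels

    def topic_name(self) -> str:
        """Return the conventional name, or raise if any field is invalid."""
        if self.message_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.message_type!r}")
        for field in (self.dataset, *self.data.split(".")):
            if not SNAKE_CASE.fullmatch(field):
                raise ValueError(f"field is not snake_case: {field!r}")
        return f"{self.message_type}.{self.dataset}.{self.data}"
```

Because users hand over fields rather than a finished string, there is no way to sneak in a free-form name: the only names that come out are ones the convention allows.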

If users are allowed to create their own topics, then a script that monitors whether topics conform to expected conventions can at least raise an alert, though only after a violation has already occurred. Unfortunately, this is probably too late.
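A monitoring script of that kind could be as simple as the Python sketch below. It takes a list of topic names, however you fetch them from your cluster, and returns the ones that break the convention. The message types in the pattern are placeholders, and skipping names that start with a double underscore (such as Kafka’s internal __consumer_offsets topic) is an assumption of mine, not part of the convention.

```python
import re
from typing import Iterable, List

# <message type>.<dataset name>.<data name>, all snake_case.
# The three message types here are placeholders for your own list.
CONVENTION = re.compile(
    r"(?:logging|queuing|tracking)"      # message type
    r"\.[a-z0-9]+(?:_[a-z0-9]+)*"        # dataset name
    r"(?:\.[a-z0-9]+(?:_[a-z0-9]+)*)+"   # data name, one or more dotted levels
)


def find_violations(topic_names: Iterable[str]) -> List[str]:
    """Return the non-conforming topic names, sorted for stable alerts."""
    return sorted(
        name
        for name in topic_names
        # Assumption: double-underscore names are internal and exempt.
        if not name.startswith("__") and not CONVENTION.fullmatch(name)
    )
```

Run on a schedule, a non-empty result is the signal to alert on.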

Conclusion

That’s all I’ve got for now. I’m interested in knowing how other people format their topic names. Please leave a comment below, or mention me on Twitter @criccomini.
