My organization has a new administrative system that exports data using Kafka. We'd like to be able to consume these Kafka topics/streams directly in DQ+ to do things like data quality checks. Until this functionality appears in DQ+, we'll probably need to stream the data to a file data source that is read and processed on a scheduled basis. A couple of features that seem like they would be needed are:
- Ability to trigger execution of an analysis or process model directly from the Kafka topic.
- Ability to control how often the analysis or process model is run: once per message, only after a specified number of messages have accumulated, on a timed basis, or some combination of these.
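To make the second point concrete, the trigger policies could be expressed as a small scheduling helper on the consumer side. This is only a sketch of the logic we have in mind, not DQ+ functionality; the `BatchTrigger` name and its parameters are hypothetical:

```python
import time

class BatchTrigger:
    """Decides when to launch an analysis run: after every message
    (batch_size=1), after N messages accumulate, after a time interval
    elapses, or whichever of the two conditions is met first."""

    def __init__(self, batch_size=None, interval_secs=None, clock=time.monotonic):
        self.batch_size = batch_size        # None disables the count policy
        self.interval_secs = interval_secs  # None disables the timer policy
        self.clock = clock                  # injectable clock, useful for testing
        self.count = 0
        self.last_run = clock()

    def on_message(self):
        """Record one consumed message; return True if a run should fire now."""
        self.count += 1
        return self.should_run()

    def should_run(self):
        if self.batch_size is not None and self.count >= self.batch_size:
            return True
        if (self.interval_secs is not None and self.count > 0
                and self.clock() - self.last_run >= self.interval_secs):
            return True
        return False

    def reset(self):
        """Call after the analysis run completes to start a new window."""
        self.count = 0
        self.last_run = self.clock()
```

In a consumer poll loop (e.g. with a client library such as kafka-python), `on_message()` would be called once per consumed record, and a `True` result would kick off the analysis or process model before calling `reset()`.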
However, beyond the need to ingest data from Kafka, we don't have a well-developed set of requirements right now.