Kafka Connect: S3

{
	"name": "s3-source-connector",
	"config": {
		"name": "s3-source-connector",
		"connector.class": "io.confluent.connect.s3.source.S3SourceConnector",
		"store.url": "https://s3-jak01.storageraya.com",
		"s3.bucket.name": "my-bucket",
		"aws.access.key.id": "my-access-key",
		"aws.secret.access.key": "my-secret-key",
		"topics.dir": "quickstart",
		"topic.regex.list": "quick-start-topic:.*",
		"confluent.topic.bootstrap.servers": "broker:9092",
		"confluent.topic.replication.factor": 1,
		"format.class": "io.confluent.connect.s3.format.string.StringFormat",
		"mode": "GENERIC",
		"tasks.max": 1
	}
}

"topics.dir": tracks files under the folder s3://my-bucket/quickstart/. Use a single blank space (" ") to track files under the bucket root s3://my-bucket/. If you do not set this key, the connector defaults to the folder s3://my-bucket/topics/.
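As a quick illustration of the three cases described for "topics.dir", here is a minimal Python sketch. The function name tracked_prefix is hypothetical and only restates the rules from this note; it is not connector code.

```python
from typing import Optional

DEFAULT_TOPICS_DIR = "topics"  # default folder named in this note


def tracked_prefix(bucket: str, topics_dir: Optional[str] = None) -> str:
    """Return the S3 location this note says is tracked for a given topics.dir."""
    if topics_dir is None:  # key not set: fall back to the default folder
        topics_dir = DEFAULT_TOPICS_DIR
    folder = topics_dir.strip().strip("/")
    if not folder:  # a single blank space means the bucket root
        return f"s3://{bucket}/"
    return f"s3://{bucket}/{folder}/"


print(tracked_prefix("my-bucket", "quickstart"))  # s3://my-bucket/quickstart/
print(tracked_prefix("my-bucket", " "))           # s3://my-bucket/
print(tracked_prefix("my-bucket"))                # s3://my-bucket/topics/
```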

"topic.regex.list": each entry has the form topic:regex. Here, files matching ".*" (any file) are written to the Kafka topic "quick-start-topic"; a pattern such as ".*\.json" would match only .json files.
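To make the topic:regex mapping concrete, here is a rough Python sketch of how such entries could route file names to topics. It rests on several assumptions: entries are comma-separated, the pattern is matched against the whole file name, and Python's re module stands in for the Java regex engine the connector actually uses. The helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def parse_topic_regex_list(value: str) -> List[Tuple[str, re.Pattern]]:
    """Split 'topic:regex' entries; assumes commas separate entries."""
    rules = []
    for entry in value.split(","):
        # Split on the first colon only, so the regex may itself contain colons.
        topic, _, pattern = entry.strip().partition(":")
        rules.append((topic, re.compile(pattern)))
    return rules


def topic_for(file_name: str, rules: List[Tuple[str, re.Pattern]]) -> Optional[str]:
    """Return the first topic whose pattern matches the whole file name."""
    for topic, pattern in rules:
        if pattern.fullmatch(file_name):  # assumption: full-match semantics
            return topic
    return None


rules = parse_topic_regex_list(r"quick-start-topic:.*\.json")
print(topic_for("orders.json", rules))  # quick-start-topic
print(topic_for("orders.csv", rules))   # None
```

When a pattern like this goes into the JSON configuration, the backslash has to be doubled, for example "quick-start-topic:.*\\.json", because JSON treats a single backslash as an escape character.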

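"format.class": the format class used to read the files (StringFormat in this example); see the S3 source configuration options page under References.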

"mode": the S3 source connector can run in two modes. The default is RESTORE_BACKUP; what you most probably want is GENERIC.

"tasks.max": for some reason, this key must be set explicitly when using "mode": "GENERIC".
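One way to register a configuration like the one above is to POST it to the Kafka Connect REST API at /connectors. The sketch below builds that request with Python's standard library. The worker address http://localhost:8083 is an assumption (8083 is the default REST port), the helper names are hypothetical, and the connector dictionary is a trimmed copy for brevity.

```python
import json
import urllib.request

# Assumption: a Connect worker is reachable here; 8083 is the default REST port.
CONNECT_URL = "http://localhost:8083"


def build_create_request(connect_url: str, connector: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for a connector definition."""
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the worker's JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Trimmed copy of the configuration above; keep the full key set in real use.
connector = {
    "name": "s3-source-connector",
    "config": {
        "connector.class": "io.confluent.connect.s3.source.S3SourceConnector",
        "mode": "GENERIC",
        "tasks.max": 1,
    },
}

request = build_create_request(CONNECT_URL, connector)
print(request.get_method(), request.full_url)  # POST http://localhost:8083/connectors
```

Calling send(request) against a running worker would submit the connector; it is left out of the example so the snippet runs without a network connection.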

References:

  • https://docs.confluent.io/platform/current/installation/configuration/connect/source-connect-configs.html
  • https://docs.confluent.io/kafka-connectors/s3-source/current/generalized/overview.html
  • https://docs.confluent.io/kafka-connectors/s3-source/current/backup-and-restore/overview.html
  • https://docs.confluent.io/kafka-connectors/s3-source/current/configuration_options.html
  • https://docs.confluent.io/kafka-connectors/s3-sink/current/overview.html
  • https://docs.confluent.io/kafka-connectors/s3-sink/current/configuration_options.html