Elasticsearch

Load data from Elasticsearch into CrateDB.

Elasticsearch is a source-available search engine developed by Elastic. It is based on Apache Lucene and provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

Prerequisites

Use Docker or Podman to run all components. This approach works consistently across Linux, macOS, and Windows.

Install

Install the most recent versions of HTTPie and cratedb-toolkit, or evaluate alternative installation methods.

uv tool install --upgrade 'httpie' 'cratedb-toolkit[io-ingest]'

Tutorial

A five-minute, step-by-step walkthrough for working with Elasticsearch and CrateDB.

Services

Run Elasticsearch and CrateDB using Docker or Podman.

# Start Elasticsearch. The container runs in the foreground,
# so use a separate terminal for each service.
docker run --rm --name=elasticsearch \
  --publish=9200:9200 --env=discovery.type=single-node \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.29
# Start CrateDB.
docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:latest '-Cdiscovery.type=single-node'
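
Both services need a few seconds before they accept requests. The following is a minimal sketch of a readiness check, not part of the original instructions: the function name wait_for_http and its defaults are hypothetical, and it assumes that both services answer a plain GET on their published HTTP ports (9200 and 4200) once they are up.

```shell
# Hypothetical helper: poll an HTTP endpoint with HTTPie until it responds
# with a successful status code.
# $1: URL, $2: maximum number of attempts (default 30),
# $3: delay between attempts in seconds (default 2)
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --check-status makes HTTPie exit non-zero on 4xx/5xx responses.
    if http --check-status --ignore-stdin --quiet --timeout=5 GET "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for $url" >&2
  return 1
}
```

A possible invocation before populating data would be: wait_for_http http://localhost:9200 && wait_for_http http://localhost:4200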

Populate data

Create an Elasticsearch index.

http PUT http://localhost:9200/example

Acquire example data.

wget https://cdn.crate.io/downloads/datasets/cratedb-datasets/academy/chicago-data/taxi_details.csv

Import data into Elasticsearch.

ingestr ingest --yes \
  --source-uri "csv://taxi_details.csv" \
  --source-table "data" \
  --dest-uri "elasticsearch://localhost:9200?secure=false" \
  --dest-table "taxi_details"
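
Before loading into CrateDB, it can help to confirm that the import reached Elasticsearch. The sketch below uses Elasticsearch's _count endpoint through HTTPie. The helper names es_count_url and es_count are hypothetical, and the sketch assumes the import wrote to an index named taxi_details, matching the --dest-table value above.

```shell
# Hypothetical helper: build the URL of the _count endpoint for an index.
# $1: index name, $2: host and port (default localhost:9200)
es_count_url() {
  printf 'http://%s/%s/_count\n' "${2:-localhost:9200}" "$1"
}

# Hypothetical helper: print the JSON response of the _count endpoint,
# which reports the number of documents in the index.
es_count() {
  http --ignore-stdin GET "$(es_count_url "$@")"
}
```

Running es_count taxi_details against the tutorial setup should return a JSON document with a count field.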

Load data

Use CrateDB Toolkit to load data from the Elasticsearch index into a CrateDB table.

ctk load \
    "elasticsearch://localhost:9200?secure=false&table=taxi_details" \
    "crate://crate:na@localhost:4200/testdrive/taxi_details"

Query data

Inspect the CrateDB table using the crash command-line shell.

crash -c "SHOW CREATE TABLE testdrive.taxi_details"
crash -c "SELECT count(*) FROM testdrive.taxi_details"
crash -c "SELECT * FROM testdrive.taxi_details"
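
The unbounded SELECT * above prints every row of the table. For a quick look at larger tables, a row-limited preview may be more practical. This is a minimal sketch; the helper names preview_stmt and preview_table and the default limit of 10 are assumptions for illustration.

```shell
# Hypothetical helper: build a row-limited preview statement.
# $1: qualified table name, $2: maximum number of rows (default 10)
preview_stmt() {
  printf 'SELECT * FROM %s LIMIT %s\n' "$1" "${2:-10}"
}

# Hypothetical helper: run the preview statement through crash.
preview_table() {
  crash -c "$(preview_stmt "$@")"
}
```

For example, preview_table testdrive.taxi_details 5 would print at most five rows.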

Documentation

Specify the Elasticsearch index name with the table query parameter, for example &table=taxi_details.
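
As an illustration, the source URL from the tutorial can be composed from an index name. The helper name es_source_url is hypothetical, and the sketch keeps the secure=false parameter used in the tutorial, which suits a local instance without TLS.

```shell
# Hypothetical helper: compose the Elasticsearch source URL for `ctk load`.
# $1: index name, $2: host and port (default localhost:9200)
es_source_url() {
  printf 'elasticsearch://%s?secure=false&table=%s\n' "${2:-localhost:9200}" "$1"
}
```

It could then be used as: ctk load "$(es_source_url taxi_details)" "crate://crate:na@localhost:4200/testdrive/taxi_details"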

CrateDB options

Replace username, password, and hostname with values that match your environment.

  • ssl: Use the ?ssl=true query parameter to enable SSL. Also use this when connecting to CrateDB Cloud.

    'crate://crate:crate@cratedb.example.org:4200/schema/table?ssl=true'
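
To keep credentials out of shell history and scripts, the target URL can be assembled from environment variables. This is a sketch under assumptions: the helper name cratedb_target_url and the variable names CRATEDB_USER, CRATEDB_PASSWORD, and CRATEDB_HOST are hypothetical, the fallbacks mirror the local tutorial setup, and the sketch does not percent-encode special characters in credentials.

```shell
# Hypothetical helper: compose a CrateDB target URL from environment variables.
# $1: schema, $2: table, $3: "true" to append ?ssl=true (default false)
cratedb_target_url() {
  schema=$1
  table=$2
  ssl=${3:-false}
  # Assumed variable names; fall back to the local tutorial values.
  url="crate://${CRATEDB_USER:-crate}:${CRATEDB_PASSWORD:-na}@${CRATEDB_HOST:-localhost}:4200/${schema}/${table}"
  if [ "$ssl" = "true" ]; then
    url="${url}?ssl=true"
  fi
  printf '%s\n' "$url"
}
```

With the three variables exported, cratedb_target_url schema table true yields a URL of the same shape as the example above.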
    

See also

CrateDB also provides native data import capabilities and supports a range of ETL applications and frameworks; see load data into CrateDB. If you have additional requirements for this or other I/O adapters, for example advanced processing options or other data formats, or if you would like a managed variant, please let us know through any of our support channels, preferably the community forum.

Use elasticsearch-compose.yml and elasticsearch-demo.sh for an end-to-end, self-contained Elasticsearch and CrateDB example ETL setup based on Docker Compose or Podman Compose.