Elasticsearch
Load data from Elasticsearch into CrateDB.
Elasticsearch is a source-available search engine developed by Elastic. It is based on Apache Lucene and provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Prerequisites
Use Docker or Podman to run all components. This approach works consistently across Linux, macOS, and Windows.
Install
Install the most recent versions of HTTPie and cratedb-toolkit, or evaluate alternative installation methods.
uv tool install --upgrade 'httpie' 'cratedb-toolkit[io-ingest]'
Tutorial
A five-minute, step-by-step walkthrough of loading data from Elasticsearch into CrateDB.
Services
Run Elasticsearch and CrateDB using Docker or Podman.
docker run --rm --name=elasticsearch \
  --publish=9200:9200 --env=discovery.type=single-node \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.29

docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:latest '-Cdiscovery.type=single-node'
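Both containers can take a moment before they accept requests. The following is a minimal sketch, not part of the official tooling, of a helper that polls an HTTP endpoint with HTTPie until it answers successfully. The function name `wait_for_http`, the default retry count, and the default delay are assumptions chosen for illustration.

```shell
# Hypothetical helper: poll an HTTP endpoint until it responds successfully.
# Arguments: URL, maximum number of attempts (default 30), delay in seconds (default 2).
wait_for_http() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --check-status makes HTTPie exit non-zero on HTTP 4xx/5xx responses.
    if http --check-status --ignore-stdin --timeout=5 GET "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # All attempts failed.
  return 1
}

# Example usage, after starting both containers:
#   wait_for_http http://localhost:9200 && wait_for_http http://localhost:4200
```

The helper only checks that the HTTP port answers with a successful status. It does not verify cluster health.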
Populate data
Create an Elasticsearch index.
http PUT http://localhost:9200/example
Acquire example data.
wget https://cdn.crate.io/downloads/datasets/cratedb-datasets/academy/chicago-data/taxi_details.csv
Import data into Elasticsearch.
ingestr ingest --yes \
  --source-uri "csv://taxi_details.csv" \
  --source-table "data" \
  --dest-uri "elasticsearch://localhost:9200?secure=false" \
  --dest-table "taxi_details"
Load data
Use CrateDB Toolkit to load data from the Elasticsearch index into a CrateDB table.
ctk load \
  "elasticsearch://localhost:9200?secure=false&table=taxi_details" \
  "crate://crate:na@localhost:4200/testdrive/taxi_details"
Query data
Inspect CrateDB tables using crash.
crash -c "SHOW CREATE TABLE testdrive.taxi_details"
crash -c "SELECT count(*) FROM testdrive.taxi_details"
crash -c "SELECT * FROM testdrive.taxi_details"
Documentation
The Elasticsearch index name can be provided by using the &table= query parameter.
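As an illustration of how the index name fits into the source URL, here is a small sketch that assembles both URLs from shell variables. The variable names (`ES_ADDRESS`, `ES_INDEX`, `SOURCE_URL`, `TARGET_URL`) are assumptions for this example; the values are the ones used in the tutorial.

```shell
# Assumed values from the tutorial; adjust them to your environment.
ES_ADDRESS="localhost:9200"
ES_INDEX="taxi_details"

# The index to read from is selected with the `table` query parameter.
SOURCE_URL="elasticsearch://${ES_ADDRESS}?secure=false&table=${ES_INDEX}"
TARGET_URL="crate://crate:na@localhost:4200/testdrive/${ES_INDEX}"

# Print the resulting command instead of running it, so it can be reviewed first.
printf 'ctk load "%s" "%s"\n' "$SOURCE_URL" "$TARGET_URL"
```

Keep the URLs quoted when running the printed command, because an unquoted `&` would be interpreted by the shell.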
CrateDB options
Please make sure to replace username, password, and hostname with values matching your environment.
ssl: Use the ?ssl=true query parameter to enable SSL. Also use this when connecting to CrateDB Cloud.
'crate://crate:crate@cratedb.example.org:4200/schema/table?ssl=true'
See also
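To make the substitution of username, password, and hostname less error-prone, the target URL can be assembled by a small function. This is a sketch under stated assumptions: the helper name `cratedb_url` and its argument order are hypothetical, and the function does not percent-encode credentials, so special characters in a password would need to be encoded beforehand.

```shell
# Hypothetical helper: assemble a CrateDB target URL for `ctk load`.
# Arguments: username, password, host:port, schema, table, ssl ("true" to enable).
cratedb_url() {
  user="$1"
  password="$2"
  hostport="$3"
  schema="$4"
  table="$5"
  ssl="${6:-false}"
  url="crate://${user}:${password}@${hostport}/${schema}/${table}"
  # Append the ssl query parameter only when requested.
  if [ "$ssl" = "true" ]; then
    url="${url}?ssl=true"
  fi
  printf '%s\n' "$url"
}

cratedb_url crate crate cratedb.example.org:4200 schema table true
# prints: crate://crate:crate@cratedb.example.org:4200/schema/table?ssl=true
```

Omitting the last argument yields a URL without the `ssl` parameter, matching the local setup used in the tutorial.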
CrateDB also provides native data import capabilities and supports different ETL applications and frameworks; see load data into CrateDB. If you have additional requirements for this or other I/O adapters, for example support for advanced processing options or different data formats, or if you would like a managed variant, please let us know through any of our support channels, preferably on our community forum.
Use elasticsearch-compose.yml and elasticsearch-demo.sh for an end-to-end Elasticsearch and CrateDB example ETL rig that runs on Docker Compose or Podman Compose.