
Tidy up the file naming and format in the data ingestion docs folder

pull/38/head
YuCheng Hu 1 year ago
commit c568381768
41 changed files, with the number of changed lines shown for each:

1. SUMMARY.md (2)
2. design/Broker.md (2)
3. design/Coordinator.md (4)
4. design/Historical.md (2)
5. design/Indexer.md (2)
6. design/MiddleManager.md (2)
7. design/Overlord.md (2)
8. design/Peons.md (2)
9. design/router.md (2)
10. ingestion/data-management.md (123)
11. ingestion/datamanage.md (0)
12. ingestion/faq.md (4)
13. ingestion/hadoop.md (2)
14. ingestion/index.md (10)
15. ingestion/kafka.md (4)
16. ingestion/tasks.md (8)
17. operations/api-reference.md (944)
18. operations/api.md (13)
19. operations/auth-ldap.md (203)
20. operations/basic-cluster-tuning.md (472)
21. operations/druid-console.md (129)
22. operations/dump-segment.md (117)
23. operations/dynamic-config-provider.md (33)
24. operations/export-metadata.md (202)
25. operations/getting-started.md (48)
26. operations/high-availability.md (39)
27. operations/http-compression.md (31)
28. operations/insert-segment-to-db.md (48)
29. operations/kubernetes.md (34)
30. operations/management-uis.md (64)
31. operations/password-provider.md (55)
32. operations/passwordproviders.md (1)
33. operations/pull-deps.md (139)
34. operations/reset-cluster.md (79)
35. operations/rolling-updates.md (103)
36. operations/rule-configuration.md (234)
37. operations/security-overview.md (352)
38. operations/security-user-auth.md (151)
39. operations/segment-optimization.md (101)
40. operations/tls-support.md (106)
41. operations/use_sbt_to_build_fat_jar.md (127)

2
SUMMARY.md

@@ -52,7 +52,7 @@
* [摄取概述](ingestion/ingestion.md)
* [数据格式](ingestion/dataformats.md)
* [schema设计](ingestion/schemadesign.md)
* [数据管理](ingestion/datamanage.md)
* [数据管理](ingestion/data-management.md)
* [流式摄取](ingestion/kafka.md)
* [Apache Kafka](ingestion/kafka.md)
* [Apache Kinesis](ingestion/kinesis.md)

2
design/Broker.md

@@ -16,7 +16,7 @@
对于Apache Druid Broker的配置,请参见 [Broker配置](../configuration/human-readable-byte.md#Broker)
### HTTP
对于Broker的API的列表,请参见 [Broker API](../operations/api.md#Broker)
对于Broker的API的列表,请参见 [Broker API](../operations/api-reference.md#Broker)
### 综述

4
design/Coordinator.md

@@ -16,7 +16,7 @@
对于Apache Druid的Coordinator进程配置,详见 [Coordinator配置](../configuration/human-readable-byte.md#Coordinator)
### HTTP
对于Coordinator的API接口,详见 [Coordinator API](../operations/api.md#Coordinator)
对于Coordinator的API接口,详见 [Coordinator API](../operations/api-reference.md#Coordinator)
### 综述
Druid Coordinator程序主要负责段管理和分发。更具体地说,Druid Coordinator进程与Historical进程通信,根据配置加载或删除段。Druid Coordinator负责加载新段、删除过时段、管理段复制和平衡段负载。
@@ -45,7 +45,7 @@ org.apache.druid.cli.Main server coordinator
每次运行时,Druid Coordinator都通过合并小段或拆分大片段来压缩段。当您的段没有进行段大小(可能会导致查询性能下降)优化时,该操作非常有用。有关详细信息,请参见[段大小优化](../operations/segmentSizeOpt.md)。
Coordinator首先根据[段搜索策略](#段搜索策略)查找要压缩的段。找到某些段后,它会发出[压缩任务](../ingestion/taskrefer.md#compact)来压缩这些段。运行压缩任务的最大数目为 `min(sum of worker capacity * slotRatio, maxSlots)`。请注意,即使 `min(sum of worker capacity * slotRatio, maxSlots)` = 0,如果为数据源启用了压缩,则始终会提交至少一个压缩任务。请参阅[压缩配置API](../operations/api.md#Coordinator)和[压缩配置](../configuration/human-readable-byte.md#Coordinator)以启用压缩。
Coordinator首先根据[段搜索策略](#段搜索策略)查找要压缩的段。找到某些段后,它会发出[压缩任务](../ingestion/taskrefer.md#compact)来压缩这些段。运行压缩任务的最大数目为 `min(sum of worker capacity * slotRatio, maxSlots)`。请注意,即使 `min(sum of worker capacity * slotRatio, maxSlots)` = 0,如果为数据源启用了压缩,则始终会提交至少一个压缩任务。请参阅[压缩配置API](../operations/api-reference.md#Coordinator)和[压缩配置](../configuration/human-readable-byte.md#Coordinator)以启用压缩。
压缩任务可能由于以下原因而失败:

2
design/Historical.md

@@ -16,7 +16,7 @@
对于Apache Druid Historical的配置,请参见 [Historical配置](../configuration/human-readable-byte.md#Historical)
### HTTP
Historical的API列表,请参见 [Historical API](../operations/api.md#Historical)
Historical的API列表,请参见 [Historical API](../operations/api-reference.md#Historical)
### 运行
```json

2
design/Indexer.md

@@ -24,7 +24,7 @@ Apache Druid索引器进程是MiddleManager + Peon任务执行系统的另一种
对于Apache Druid Indexer进程的配置,请参见 [Indexer配置](../configuration/human-readable-byte.md#Indexer)
### HTTP
Indexer进程与[MiddleManager](../operations/api.md#MiddleManager)共用API
Indexer进程与[MiddleManager](../operations/api-reference.md#MiddleManager)共用API
### 运行
```json

2
design/MiddleManager.md

@@ -16,7 +16,7 @@
对于Apache Druid MiddleManager配置,可以参见[索引服务配置](../configuration/human-readable-byte.md#MiddleManager)
### HTTP
对于MiddleManager的API接口,详见 [MiddleManager API](../operations/api.md#MiddleManager)
对于MiddleManager的API接口,详见 [MiddleManager API](../operations/api-reference.md#MiddleManager)
### 综述
MiddleManager进程是执行提交的任务的工作进程。MiddleManager将任务转发给运行在不同jvm中的Peon。我们为每个任务设置单独的jvm的原因是为了隔离资源和日志。每个Peon一次只能运行一个任务,但是,一个MiddleManager可能有多个Peon。

2
design/Overlord.md

@@ -16,7 +16,7 @@
对于Apache Druid的Overlord进程配置,详见 [Overlord配置](../configuration/human-readable-byte.md#Overlord)
### HTTP
对于Overlord的API接口,详见 [Overlord API](../operations/api.md#Overlord)
对于Overlord的API接口,详见 [Overlord API](../operations/api-reference.md#Overlord)
### 综述
Overlord进程负责接收任务、协调任务分配、创建任务锁并将状态返回给调用方。Overlord可以配置为本地模式运行或者远程模式运行(默认为本地)。在本地模式下,Overlord还负责创建执行任务的Peon, 在本地模式下运行Overlord时,还必须提供所有MiddleManager和Peon配置。本地模式通常用于简单的工作流。在远程模式下,Overlord和MiddleManager在不同的进程中运行,您可以在不同的服务器上运行每一个进程。如果要将索引服务用作所有Druid索引的单个端点,建议使用此模式。

2
design/Peons.md

@@ -16,7 +16,7 @@
对于Apache Druid Peon配置,可以参见 [Peon查询配置](../configuration/human-readable-byte.md) 和 [额外的Peon配置](../configuration/human-readable-byte.md)
### HTTP
对于Peon的API接口,详见 [Peon API](../operations/api.md#Peon)
对于Peon的API接口,详见 [Peon API](../operations/api-reference.md#Peon)
Peon在单个JVM中运行单个任务。MiddleManager负责创建运行任务的Peon。Peon应该很少(如果为了测试目的)自己运行。

2
design/router.md

@@ -28,7 +28,7 @@ Apache Druid Router用于将查询路由到不同的Broker。默认情况下,B
### HTTP
对于Router的API列表,请参见 [Router API](../operations/api.md#Router)
对于Router的API列表,请参见 [Router API](../operations/api-reference.md#Router)
### 运行

123
ingestion/data-management.md

@@ -0,0 +1,123 @@
---
id: data-management
title: "Data management"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
Within the context of this topic, data management refers to Apache Druid's data maintenance capabilities for existing datasources. There are several options to help you keep your data relevant and your Druid cluster performant, for example updating, reingesting, adding lookups, reindexing, or deleting data.
In addition to the tasks covered on this page, you can also use segment compaction to improve the layout of your existing data. Refer to [Segment optimization](../operations/segment-optimization.md) to see if compaction will help in your environment. For an overview and steps to configure manual compaction tasks, see [Compaction](./compaction.md).
## Adding new data to existing datasources
Druid can insert new data into an existing datasource by appending new segments to existing segment sets. It can also add new data by merging an existing set of segments with new data and overwriting the original set.
Druid does not support single-record updates by primary key.
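Typically, appending is controlled through the ingestion spec rather than through a separate API. The following is a minimal, hypothetical `ioConfig` sketch for a native batch (`index_parallel`) task that appends to an existing datasource via the `appendToExisting` flag; the input directory and file filter are placeholders:

```json
{
  "type": "index_parallel",
  "inputSource": {
    "type": "local",
    "baseDir": "/data/new-batch",
    "filter": "*.json"
  },
  "inputFormat": {
    "type": "json"
  },
  "appendToExisting": true
}
```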
<a name="update"></a>
## Updating existing data
Once you ingest some data in a dataSource for an interval and create Apache Druid segments, you might want to make changes to
the ingested data. There are several ways this can be done.
### Using lookups
If you have a dimension where values need to be updated frequently, try first using [lookups](../querying/lookups.md). A
classic use case of lookups is when you have an ID dimension stored in a Druid segment, and want to map the ID dimension to a
human-readable String value that may need to be updated periodically.
### Reingesting data
If lookup-based techniques are not sufficient, you will need to reingest data into Druid for the time chunks that you
want to update. This can be done using one of the [batch ingestion methods](index.md#batch) in overwrite mode (the
default mode). It can also be done using [streaming ingestion](index.md#streaming), provided you drop data for the
relevant time chunks first.
If you do the reingestion in batch mode, Druid's atomic update mechanism means that queries will flip seamlessly from
the old data to the new data.
We recommend keeping a copy of your raw data around in case you ever need to reingest it.
### With Hadoop-based ingestion
This section assumes you understand how to do batch ingestion using Hadoop. See
[Hadoop batch ingestion](./hadoop.md) for more information. Hadoop batch-ingestion can be used for reindexing and delta ingestion.
Druid uses an `inputSpec` in the `ioConfig` to know where the data to be ingested is located and how to read it.
For simple Hadoop batch ingestion, `static` or `granularity` spec types allow you to read data stored in deep storage.
There are other types of `inputSpec` to enable reindexing and delta ingestion.
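As a rough sketch (not a complete ingestion spec), a Hadoop reindexing `ioConfig` can point the `dataSource` type of `inputSpec` back at existing Druid segments; the datasource name and interval below are placeholders:

```json
{
  "type": "hadoop",
  "inputSpec": {
    "type": "dataSource",
    "ingestionSpec": {
      "dataSource": "wikipedia",
      "intervals": ["2014-10-20T00:00:00Z/P2W"]
    }
  }
}
```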
### Reindexing with Native Batch Ingestion
This section assumes you understand how to do batch ingestion without Hadoop using [native batch indexing](../ingestion/native-batch.md). Native batch indexing uses an `inputSource` to know where and how to read the input data. You can use the [`DruidInputSource`](native-batch.md#druid-input-source) to read data from segments inside Druid. You can use Parallel task (`index_parallel`) for all native batch reindexing tasks. Increase the `maxNumConcurrentSubTasks` to accommodate the amount of data you are reindexing. See [Capacity planning](native-batch.md#capacity-planning).
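For illustration, a trimmed-down `ioConfig`/`tuningConfig` pair for such a reindexing task might look like the sketch below; the datasource name, interval, and subtask count are placeholders, and the rest of the spec (`dataSchema`, etc.) is omitted:

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "druid",
      "dataSource": "wikipedia",
      "interval": "2020-01-01/2020-02-01"
    }
  },
  "tuningConfig": {
    "type": "index_parallel",
    "maxNumConcurrentSubTasks": 4
  }
}
```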
<a name="delete"></a>
## Deleting data
Druid supports permanent deletion of segments that are in an "unused" state (see the
[Segment lifecycle](../design/architecture.md#segment-lifecycle) section of the Architecture page).
The Kill Task deletes unused segments within a specified interval from metadata storage and deep storage.
For more information, please see [Kill Task](../ingestion/tasks.md#kill).
Permanent deletion of a segment in Apache Druid has two steps:
1. The segment must first be marked as "unused". This occurs when a segment is dropped by retention rules, and when a user manually disables a segment through the Coordinator API.
2. After segments have been marked as "unused", a Kill Task will delete any "unused" segments from Druid's metadata store as well as deep storage.
For documentation on retention rules, please see [Data Retention](../operations/rule-configuration.md).
For documentation on disabling segments using the Coordinator API, please see the
[Coordinator Datasources API](../operations/api-reference.md#coordinator-datasources) reference.
A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/tutorial-delete-data.md)
## Kill Task
Kill tasks delete all information about a segment and remove it from deep storage. Segments to kill must be unused (used==0) in the Druid segment table. The available grammar is:
```json
{
  "type": "kill",
  "id": <task_id>,
  "dataSource": <task_datasource>,
  "interval": <all_segments_in_this_interval_will_die!>,
  "context": <task context>
}
```
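For example, a concrete kill task (with a hypothetical task id, datasource, and interval; the optional context is omitted) could look like:

```json
{
  "type": "kill",
  "id": "kill_wikipedia_2020-01",
  "dataSource": "wikipedia",
  "interval": "2020-01-01/2020-02-01"
}
```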
## Retention
Druid supports retention rules, which are used to define intervals of time where data should be preserved, and intervals where data should be discarded.
Druid also supports separating Historical processes into tiers, and the retention rules can be configured to assign data for specific intervals to specific tiers.
These features are useful for performance/cost management; a common use case is separating Historical processes into a "hot" tier and a "cold" tier.
For more information, please see [Load rules](../operations/rule-configuration.md).
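As an illustrative sketch of the hot/cold pattern described above (tier names, periods, and replicant counts are placeholders), a rule set might keep the most recent month on a "hot" tier, keep a year on the default tier, and drop everything older:

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": { "hot": 2 }
  },
  {
    "type": "loadByPeriod",
    "period": "P1Y",
    "includeFuture": true,
    "tieredReplicants": { "_default_tier": 1 }
  },
  {
    "type": "dropForever"
  }
]
```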
## Learn more
See the following topics for more information:
- [Compaction](./compaction.md) for an overview and steps to configure manual compaction tasks.
- [Segments](../design/segments.md) for information on how Druid handles segment versioning.

0
ingestion/datamanage.md

4
ingestion/faq.md

@@ -171,7 +171,7 @@ Druid会拒绝时间窗口之外的事件, 确认事件是否被拒绝了的
您可以将 [DruidInputSource](native.md#Druid输入源) 与 [并行任务](native.md#并行任务) 一起使用,以使用新schema摄取现有的druid段,并更改该段的name、dimensions、metrics、rollup等。有关详细信息,请参阅 [DruidInputSource](native.md#Druid输入源)。或者,如果使用基于hadoop的摄取,那么可以使用"dataSource"输入规范来重新编制索引。
有关详细信息,请参阅 [数据管理](datamanage.md) 页的 [更新现有数据](datamanage.md#更新现有的数据) 部分。
有关详细信息,请参阅 [数据管理](data-management.md) 页的 [更新现有数据](data-management.md#更新现有的数据) 部分。
### 如果更改Druid中现有数据的段粒度
@@ -179,7 +179,7 @@ Druid会拒绝时间窗口之外的事件, 确认事件是否被拒绝了的
为此,使用 [DruidInputSource](native.md#Druid输入源) 并运行一个 [并行任务](native.md#并行任务)。[DruidInputSource](native.md#Druid输入源) 将允许你从Druid中获取现有的段并将它们聚合并反馈给Druid。它还允许您在反馈数据时过滤这些段中的数据,这意味着,如果有要删除的行,可以在重新摄取期间将它们过滤掉。通常,上面的操作将作为一个批处理作业运行,即每天输入一大块数据并对其进行聚合。或者,如果使用基于hadoop的摄取,那么可以使用"dataSource"输入规范来重新编制索引。
有关详细信息,请参阅 [数据管理](datamanage.md) 页的 [更新现有数据](datamanage.md#更新现有的数据) 部分。
有关详细信息,请参阅 [数据管理](data-management.md) 页的 [更新现有数据](data-management.md#更新现有的数据) 部分。
### 实时摄取似乎被卡住了

2
ingestion/hadoop.md

@@ -1122,7 +1122,7 @@ you have to take caution to not override segments created by real-time processin
Apache Druid当前支持通过一个Hadoop摄取任务来支持基于Apache Hadoop的批量索引任务, 这些任务被提交到 [Druid Overlord](../design/Overlord.md)的一个运行实例上。详情可以查看 [基于Hadoop的摄取vs基于本地批摄取的对比](ingestion.md#批量摄取) 来了解基于Hadoop的摄取、本地简单批摄取、本地并行摄取三者的比较。
运行一个基于Hadoop的批量摄取任务,首先需要编写一个如下的摄取规范, 然后提交到Overlord的 [`druid/indexer/v1/task`](../operations/api.md#overlord) 接口,或者使用Druid软件包中自带的 `bin/post-index-task` 脚本。
运行一个基于Hadoop的批量摄取任务,首先需要编写一个如下的摄取规范, 然后提交到Overlord的 [`druid/indexer/v1/task`](../operations/api-reference.md#overlord) 接口,或者使用Druid软件包中自带的 `bin/post-index-task` 脚本。
### 教程

10
ingestion/index.md

@@ -775,7 +775,7 @@ Druid中的所有数据都被组织成*段*,这些段是数据文件,通常
#### 数据源
Druid数据存储在数据源中,与传统RDBMS中的表类似。Druid提供了一个独特的数据建模系统,它与关系模型和时间序列模型都具有相似性。
#### 主时间戳列
Druid Schema必须始终包含一个主时间戳。主时间戳用于对 [数据进行分区和排序](#分区)。Druid查询能够快速识别和检索与主时间戳列的时间范围相对应的数据。Druid还可以将主时间戳列用于基于时间的[数据管理操作](datamanage.md),例如删除时间块、覆盖时间块和基于时间的保留规则。
Druid Schema必须始终包含一个主时间戳。主时间戳用于对 [数据进行分区和排序](#分区)。Druid查询能够快速识别和检索与主时间戳列的时间范围相对应的数据。Druid还可以将主时间戳列用于基于时间的[数据管理操作](data-management.md),例如删除时间块、覆盖时间块和基于时间的保留规则。
主时间戳基于 [`timestampSpec`](#timestampSpec) 进行解析。此外,[`granularitySpec`](#granularitySpec) 控制基于主时间戳的其他重要操作。无论从哪个输入字段读取主时间戳,它都将作为名为 `__time` 的列存储在Druid数据源中。
@@ -824,7 +824,7 @@ SELECT SUM("cnt") / COUNT(*) * 1.0 FROM datasource
* 使用 [Sketches](schemadesign.md#Sketches高基维处理) 避免存储高基数维度,因为会损害汇总比率
* 在摄入时调整 `queryGranularity`(例如,使用 `PT5M` 而不是 `PT1M` )会增加Druid中两行具有匹配时间戳的可能性,并可以提高汇总率
* 将相同的数据加载到多个Druid数据源中是有益的。有些用户选择创建禁用汇总(或启用汇总,但汇总比率最小)的"完整"数据源和具有较少维度和较高汇总比率的"缩写"数据源。当查询只涉及"缩写"集里边的维度时,使用该数据源将导致更快的查询时间,这种方案只需稍微增加存储空间即可完成,因为简化的数据源往往要小得多。
* 如果您使用的 [尽力而为的汇总(best-effort rollup)](#) 摄取配置不能保证[完全汇总(perfect rollup)](#),则可以通过切换到保证的完全汇总选项,或在初始摄取后在[后台重新编制(reindex)](./datamanage.md#压缩与重新索引)数据索引,潜在地提高汇总比率。
* 如果您使用的 [尽力而为的汇总(best-effort rollup)](#) 摄取配置不能保证[完全汇总(perfect rollup)](#),则可以通过切换到保证的完全汇总选项,或在初始摄取后在[后台重新编制(reindex)](./data-management.md#压缩与重新索引)数据索引,潜在地提高汇总比率。
#### 最佳rollup VS 尽可能rollup
一些Druid摄取方法保证了*完美的汇总(perfect rollup)*,这意味着输入数据在摄取时被完美地聚合。另一些则提供了*尽力而为的汇总(best-effort rollup)*,这意味着输入数据可能无法完全聚合,因此可能有多个段保存具有相同时间戳和维度值的行。
@@ -857,7 +857,7 @@ Druid数据源总是按时间划分为*时间块*,每个时间块包含一个
> 但是,请注意,目前,Druid总是首先按时间戳对一个段内的行进行排序,甚至在 `dimensionsSpec` 中列出的第一个维度之前,这将使得维度排序达不到最大效率。如果需要,可以通过在 `granularitySpec` 中将 `queryGranularity` 设置为等于 `segmentGranularity` 的值来解决此限制,这将把段内的所有时间戳设置为相同的值,并将"真实"时间戳保存为[辅助时间戳](./schemadesign.md#辅助时间戳)。这个限制可能在Druid的未来版本中被移除。
#### 如何设置分区
并不是所有的摄入方式都支持显式的分区配置,也不是所有的方法都具有同样的灵活性。在当前的Druid版本中,如果您是通过一个不太灵活的方法(如Kafka)进行初始摄取,那么您可以使用 [重新索引的技术(reindex)](./datamanage.md#压缩与重新索引),在最初摄取数据后对其重新分区。这是一种强大的技术:即使您不断地从流中添加新数据, 也可以使用它来确保任何早于某个阈值的数据都得到最佳分区。
并不是所有的摄入方式都支持显式的分区配置,也不是所有的方法都具有同样的灵活性。在当前的Druid版本中,如果您是通过一个不太灵活的方法(如Kafka)进行初始摄取,那么您可以使用 [重新索引的技术(reindex)](./data-management.md#压缩与重新索引),在最初摄取数据后对其重新分区。这是一种强大的技术:即使您不断地从流中添加新数据, 也可以使用它来确保任何早于某个阈值的数据都得到最佳分区。
下表显示了每个摄取方法如何处理分区:
@@ -865,8 +865,8 @@ Druid数据源总是按时间划分为*时间块*,每个时间块包含一个
| - | - |
| [本地批](native.md) | 通过 `tuningConfig` 中的 [`partitionsSpec`](./native.md#partitionsSpec) |
| [Hadoop批](hadoop.md) | 通过 `tuningConfig` 中的 [`partitionsSpec`](./native.md#partitionsSpec) |
| [Kafka索引服务](kafka.md) | Druid中的分区是由Kafka主题的分区方式决定的。您可以在初次摄入后 [重新索引的技术(reindex)](./datamanage.md#压缩与重新索引)以重新分区 |
| [Kinesis索引服务](kinesis.md) | Druid中的分区是由Kinesis流的分区方式决定的。您可以在初次摄入后 [重新索引的技术(reindex)](./datamanage.md#压缩与重新索引)以重新分区 |
| [Kafka索引服务](kafka.md) | Druid中的分区是由Kafka主题的分区方式决定的。您可以在初次摄入后 [重新索引的技术(reindex)](./data-management.md#压缩与重新索引)以重新分区 |
| [Kinesis索引服务](kinesis.md) | Druid中的分区是由Kinesis流的分区方式决定的。您可以在初次摄入后 [重新索引的技术(reindex)](./data-management.md#压缩与重新索引)以重新分区 |
> [!WARNING]
>

4
ingestion/kafka.md

@@ -108,7 +108,7 @@ curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http:/
| `indexSpecForIntermediatePersists` | | 定义要在索引时用于中间持久化临时段的段存储格式选项。这可用于禁用中间段上的维度/度量压缩,以减少最终合并所需的内存。但是,在中间段上禁用压缩可能会增加页缓存的使用,而在它们被合并到发布的最终段之前使用它们,有关可能的值,请参阅IndexSpec。 | 否(默认与 `indexSpec` 相同) |
| `reportParseExceptions` | Boolean | *已废弃*。如果为true,则在解析期间遇到的异常即停止摄取;如果为false,则将跳过不可解析的行和字段。将 `reportParseExceptions` 设置为 `true` 将覆盖`maxParseExceptions` 和 `maxSavedParseExceptions` 的现有配置,将`maxParseExceptions` 设置为 `0` 并将 `maxSavedParseExceptions` 限制为不超过1。 | 否(默认为false)|
| `handoffConditionTimeout` | Long | 段切换(持久化)可以等待的毫秒数(超时时间)。 该值要被设置为大于0的数,设置为0意味着将会一直等待不超时 | 否(默认为0)|
| `resetOffsetAutomatically` | Boolean | 控制当Druid需要读取Kafka中不可用的消息时的行为,比如当发生了 `OffsetOutOfRangeException` 异常时。 <br> 如果为false,则异常将抛出,这将导致任务失败并停止接收。如果发生这种情况,则需要手动干预来纠正这种情况;可能使用 [重置 Supervisor API](../operations/api.md#Supervisor)。此模式对于生产非常有用,因为它将使您意识到摄取的问题。 <br> 如果为true,Druid将根据 `useEarliestOffset` 属性的值(`true` 为 `earliest`,`false` 为 `latest`)自动重置为Kafka中可用的较早或最新偏移量。请注意,这可能导致数据在您不知情的情况下*被丢弃*(如果`useEarliestOffset` 为 `false`)或 *重复*(如果 `useEarliestOffset``true`)。消息将被记录下来,以标识已发生重置,但摄取将继续。这种模式对于非生产环境非常有用,因为它将使Druid尝试自动从问题中恢复,即使这些问题会导致数据被安静删除或重复。 <br> 该特性与Kafka的 `auto.offset.reset` 消费者属性很相似 | 否(默认为false)|
| `resetOffsetAutomatically` | Boolean | 控制当Druid需要读取Kafka中不可用的消息时的行为,比如当发生了 `OffsetOutOfRangeException` 异常时。 <br> 如果为false,则异常将抛出,这将导致任务失败并停止接收。如果发生这种情况,则需要手动干预来纠正这种情况;可能使用 [重置 Supervisor API](../operations/api-reference.md#Supervisor)。此模式对于生产非常有用,因为它将使您意识到摄取的问题。 <br> 如果为true,Druid将根据 `useEarliestOffset` 属性的值(`true` 为 `earliest`,`false` 为 `latest`)自动重置为Kafka中可用的较早或最新偏移量。请注意,这可能导致数据在您不知情的情况下*被丢弃*(如果`useEarliestOffset` 为 `false`)或 *重复*(如果 `useEarliestOffset``true`)。消息将被记录下来,以标识已发生重置,但摄取将继续。这种模式对于非生产环境非常有用,因为它将使Druid尝试自动从问题中恢复,即使这些问题会导致数据被安静删除或重复。 <br> 该特性与Kafka的 `auto.offset.reset` 消费者属性很相似 | 否(默认为false)|
| `workerThreads` | Integer | supervisor用于异步操作的线程数。| 否(默认为: min(10, taskCount)) |
| `chatThreads` | Integer | 与索引任务的会话线程数 | 否(默认为:min(10, taskCount * replicas))|
| `chatRetries` | Integer | 在任务没有响应之前,将重试对索引任务的HTTP请求的次数 | 否(默认为8)|
@@ -177,7 +177,7 @@ Kafka索引服务同时支持通过 [`inputFormat`](dataformats.md#inputformat)
### 操作
本节描述了一些supervisor API如何在Kafka索引服务中具体工作。对于所有的supervisor API,请查看 [Supervisor APIs](../operations/api.md#Supervisor)
本节描述了一些supervisor API如何在Kafka索引服务中具体工作。对于所有的supervisor API,请查看 [Supervisor APIs](../operations/api-reference.md#Supervisor)
#### 获取supervisor的状态报告

8
ingestion/tasks.md

@@ -405,13 +405,13 @@ See the documentation on [deleting data](../ingestion/data-management.md#delete)
任务在Druid中完成所有与 [摄取](ingestion.md) 相关的工作。
对于批量摄取,通常使用 [任务api](../operations/api.md#Overlord) 直接将任务提交给Druid。对于流式接收,任务通常被提交给supervisor。
对于批量摄取,通常使用 [任务api](../operations/api-reference.md#Overlord) 直接将任务提交给Druid。对于流式接收,任务通常被提交给supervisor。
### 任务API
任务API主要在两个地方是可用的:
* [Overlord](../design/Overlord.md) 进程提供HTTP API接口来进行提交任务、取消任务、检查任务状态、查看任务日志与报告等。 查看 [任务API文档](../operations/api.md) 可以看到完整列表
* [Overlord](../design/Overlord.md) 进程提供HTTP API接口来进行提交任务、取消任务、检查任务状态、查看任务日志与报告等。 查看 [任务API文档](../operations/api-reference.md) 可以看到完整列表
* Druid SQL包括了一个 [`sys.tasks`](../querying/druidsql.md#系统Schema) ,保存了当前任务运行的信息。 此表是只读的,并且可以通过Overlord API查询完整信息的有限制的子集。
### 任务报告
@@ -710,11 +710,11 @@ http://<middlemanager-host>:<worker-port>/druid/worker/v1/chat/<task-id>/unparse
#### `compact`
压缩任务合并给定间隔的所有段。有关详细信息,请参见有关 [压缩](datamanage.md#压缩与重新索引) 的文档。
压缩任务合并给定间隔的所有段。有关详细信息,请参见有关 [压缩](data-management.md#压缩与重新索引) 的文档。
#### `kill`
Kill tasks删除有关某些段的所有元数据,并将其从深层存储中删除。有关详细信息,请参阅有关 [删除数据](datamanage.md#删除数据) 的文档。
Kill tasks删除有关某些段的所有元数据,并将其从深层存储中删除。有关详细信息,请参阅有关 [删除数据](data-management.md#删除数据) 的文档。
#### `append`

944
operations/api-reference.md

@@ -0,0 +1,944 @@
---
id: api-reference
title: "API reference"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
This page documents all of the API endpoints for each Druid service type.
## Common
The following endpoints are supported by all processes.
### Process information
#### GET
* `/status`
Returns the Druid version, loaded extensions, memory used, total memory and other useful information about the process.
* `/status/health`
An endpoint that always returns a boolean "true" value with a 200 OK response, useful for automated health checks.
* `/status/properties`
Returns the current configuration properties of the process.
* `/status/selfDiscovered/status`
Returns a JSON map of the form `{"selfDiscovered": true/false}`, indicating whether the node has received a confirmation
from the central node discovery mechanism (currently ZooKeeper) of the Druid cluster that the node has been added to the
cluster. It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
management systems until it returns `{"selfDiscovered": true}` from this endpoint. This is because a node may be
isolated from the rest of the cluster due to network issues and it doesn't make sense to consider nodes "healthy" in
this case. Also, when nodes such as Brokers use ZooKeeper segment discovery for building their view of the Druid cluster
(as opposed to HTTP segment discovery), they may be unusable until the ZooKeeper client is fully initialized and starts
to receive data from the ZooKeeper cluster. `{"selfDiscovered": true}` is a proxy event indicating that the ZooKeeper
client on the node has started to receive data from the ZooKeeper cluster and it's expected that all segments and other
nodes will be discovered by this node timely from this point.
* `/status/selfDiscovered`
Similar to `/status/selfDiscovered/status`, but returns 200 OK response with empty body if the node has discovered itself
and 503 SERVICE UNAVAILABLE if the node hasn't discovered itself yet. This endpoint might be useful because some
monitoring checks such as AWS load balancer health checks are not able to look at the response body.
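For orientation, a `/status` response is a JSON document along the following lines; the field names and values here are illustrative rather than an authoritative schema:

```json
{
  "version": "0.21.1",
  "modules": [
    {
      "name": "org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule",
      "artifact": "mysql-metadata-storage",
      "version": "0.21.1"
    }
  ],
  "memory": {
    "maxMemory": 268435456,
    "totalMemory": 268435456,
    "freeMemory": 139060656,
    "usedMemory": 129374800
  }
}
```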
## Master Server
This section documents the API endpoints for the processes that reside on Master servers (Coordinators and Overlords)
in the suggested [three-server configuration](../design/processes.md#server-types).
### Coordinator
#### Leadership
##### GET
* `/druid/coordinator/v1/leader`
Returns the current leader Coordinator of the cluster.
* `/druid/coordinator/v1/isLeader`
Returns a JSON object with field "leader", either true or false, indicating if this server is the current leader
Coordinator of the cluster. In addition, returns HTTP 200 if the server is the current leader and HTTP 404 if not.
This is suitable for use as a load balancer status check if you only want the active leader to be considered in-service
at the load balancer.
<a name="coordinator-segment-loading"></a>
#### Segment Loading
##### GET
* `/druid/coordinator/v1/loadstatus`
Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster.
* `/druid/coordinator/v1/loadstatus?simple`
Returns the number of segments left to load until segments that should be loaded in the cluster are available for queries. This does not include segment replication counts.
* `/druid/coordinator/v1/loadstatus?full`
Returns the number of segments left to load in each tier until segments that should be loaded in the cluster are all available. This includes segment replication counts.
* `/druid/coordinator/v1/loadstatus?full&computeUsingClusterView`
Returns the number of segments not yet loaded for each tier until all segments loading in the cluster are available.
The result includes segment replication counts. It also factors in the number of available nodes that are of a service type that can load the segment when computing the number of segments remaining to load.
A segment is considered fully loaded when:
- Druid has replicated it the number of times configured in the corresponding load rule.
- Or the number of replicas for the segment in each tier where it is configured to be replicated equals the available nodes of a service type that are currently allowed to load the segment in the tier.
* `/druid/coordinator/v1/loadqueue`
Returns the ids of segments to load and drop for each Historical process.
* `/druid/coordinator/v1/loadqueue?simple`
Returns the number of segments to load and drop, as well as the total segment load and drop size in bytes for each Historical process.
* `/druid/coordinator/v1/loadqueue?full`
Returns the serialized JSON of segments to load and drop for each Historical process.
#### Segment Loading by Datasource
Note that all _interval_ query parameters are ISO 8601 strings (e.g., 2016-06-27/2016-06-28).
Also note that these APIs only guarantee that the segments are available at the time of the call.
Segments can still become missing because of historical process failures or any other reasons afterward.
##### GET
* `/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}`
Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster for the given
datasource over the given interval (or last 2 weeks if interval is not given). `forceMetadataRefresh` is required to be set.
Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store
(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. This can be a heavy operation in terms
of the load on the metadata store but can be necessary to make sure that we verify all the latest segments' load status)
Setting `forceMetadataRefresh` to false will use the metadata cached on the coordinator from the last force/periodic refresh.
If no used segments are found for the given inputs, this API returns `204 No Content`
* `/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?simple&forceMetadataRefresh={boolean}&interval={myInterval}`
Returns the number of segments left to load until segments that should be loaded in the cluster are available for the given datasource
over the given interval (or last 2 weeks if interval is not given). This does not include segment replication counts. `forceMetadataRefresh` is required to be set.
Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store
(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. This can be a heavy operation in terms
of the load on the metadata store but can be necessary to make sure that we verify all the latest segments' load status)
Setting `forceMetadataRefresh` to false will use the metadata cached on the coordinator from the last force/periodic refresh.
If no used segments are found for the given inputs, this API returns `204 No Content`
* `/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?full&forceMetadataRefresh={boolean}&interval={myInterval}`
Returns the number of segments left to load in each tier until segments that should be loaded in the cluster are all available for the given datasource
over the given interval (or last 2 weeks if interval is not given). This includes segment replication counts. `forceMetadataRefresh` is required to be set.
Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store
(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. This can be a heavy operation in terms
of the load on the metadata store but can be necessary to make sure that we verify all the latest segments' load status)
Setting `forceMetadataRefresh` to false will use the metadata cached on the coordinator from the last force/periodic refresh.
You can pass the optional query parameter `computeUsingClusterView` to factor in the available cluster services when calculating
the segments left to load. See [Coordinator Segment Loading](#coordinator-segment-loading) for details.
If no used segments are found for the given inputs, this API returns `204 No Content`
#### Metadata store information
##### GET
* `/druid/coordinator/v1/metadata/segments`
Returns a list of all segments for each datasource enabled in the cluster.
* `/druid/coordinator/v1/metadata/segments?datasources={dataSourceName1}&datasources={dataSourceName2}`
Returns a list of all segments for one or more specific datasources enabled in the cluster.
* `/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus`
Returns a list of all segments for each datasource with the full segment metadata and an extra field `overshadowed`.
* `/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&datasources={dataSourceName1}&datasources={dataSourceName2}`
Returns a list of all segments for one or more specific datasources with the full segment metadata and an extra field `overshadowed`.
* `/druid/coordinator/v1/metadata/datasources`
Returns a list of the names of datasources with at least one used segment in the cluster.
* `/druid/coordinator/v1/metadata/datasources?includeUnused`
Returns a list of the names of datasources, regardless of whether there are used segments belonging to those datasources in the cluster or not.
* `/druid/coordinator/v1/metadata/datasources?includeDisabled`
Returns a list of the names of datasources, regardless of whether the datasource is disabled or not.
* `/druid/coordinator/v1/metadata/datasources?full`
Returns a list of all datasources with at least one used segment in the cluster. Returns all metadata about those datasources as stored in the metadata store.
* `/druid/coordinator/v1/metadata/datasources/{dataSourceName}`
Returns full metadata for a datasource as stored in the metadata store.
* `/druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments`
Returns a list of all segments for a datasource as stored in the metadata store.
* `/druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full`
Returns a list of all segments for a datasource with the full segment metadata as stored in the metadata store.
* `/druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments/{segmentId}`
Returns full segment metadata for a specific segment as stored in the metadata store, if the segment is used. If the
segment is unused, or is unknown, a 404 response is returned.
##### POST
* `/druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments`
Returns a list of all segments, overlapping with any of given intervals, for a datasource as stored in the metadata store. Request body is an array of ISO 8601 interval strings like [interval1, interval2,...], for example ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]
* `/druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full`
Returns a list of all segments, overlapping with any of given intervals, for a datasource with the full segment metadata as stored in the metadata store. Request body is array of string ISO 8601 intervals like [interval1, interval2,...] for example ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]
<a name="coordinator-datasources"></a>
#### Datasources
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
##### GET
* `/druid/coordinator/v1/datasources`
Returns a list of datasource names found in the cluster.
* `/druid/coordinator/v1/datasources?simple`
Returns a list of JSON objects containing the name and properties of datasources found in the cluster. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime.
* `/druid/coordinator/v1/datasources?full`
Returns a list of datasource names found in the cluster with all metadata about those datasources.
* `/druid/coordinator/v1/datasources/{dataSourceName}`
Returns a JSON object containing the name and properties of a datasource. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime.
* `/druid/coordinator/v1/datasources/{dataSourceName}?full`
Returns full metadata for a datasource.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals`
Returns a set of segment intervals.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals?simple`
Returns a map of an interval to a JSON object containing the total byte size of segments and number of segments for that interval.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals?full`
Returns a map of an interval to a map of segment metadata to a set of server names that contain the segment for that interval.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}`
Returns a set of segment ids for an interval.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?simple`
Returns a map of segment intervals contained within the specified interval to a JSON object containing the total byte size of segments and number of segments for an interval.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?full`
Returns a map of segment intervals contained within the specified interval to a map of segment metadata to a set of server names that contain the segment for an interval.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}/serverview`
Returns a map of segment intervals contained within the specified interval to information about the servers that contain the segment for an interval.
* `/druid/coordinator/v1/datasources/{dataSourceName}/segments`
Returns a list of all segments for a datasource in the cluster.
* `/druid/coordinator/v1/datasources/{dataSourceName}/segments?full`
Returns a list of all segments for a datasource in the cluster with the full segment metadata.
* `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
Returns full segment metadata for a specific segment in the cluster.
* `/druid/coordinator/v1/datasources/{dataSourceName}/tiers`
Return the tiers that a datasource exists in.
#### Note for the Coordinator's POST and DELETE APIs
The segments would be enabled when these APIs are called, but they can be disabled again by the coordinator if any dropRule matches. Segments enabled by these APIs might not be loaded by historical processes if no loadRule matches. If an indexing or kill task runs at the same time as these APIs are invoked, the behavior is undefined. Some segments might be killed and others might be enabled. It's also possible that all segments might be disabled, but at the same time the indexing task is able to read data from those segments and succeed.
Caution: Avoid using indexing or kill tasks and these APIs at the same time for the same datasource and time chunk. (It's fine if the time chunks or datasources don't overlap.)
##### POST
* `/druid/coordinator/v1/datasources/{dataSourceName}`
Marks as used all segments belonging to a datasource. Returns a JSON object of the form
`{"numChangedSegments": <number>}` with the number of segments in the database whose state has been changed (that is,
the segments were marked as used) as the result of this API call.
* `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
Marks as used a segment of a datasource. Returns a JSON object of the form `{"segmentStateChanged": <boolean>}` with
the boolean indicating if the state of the segment has been changed (that is, the segment was marked as used) as the
result of this API call.
* `/druid/coordinator/v1/datasources/{dataSourceName}/markUsed`
* `/druid/coordinator/v1/datasources/{dataSourceName}/markUnused`
Marks segments (un)used for a datasource by interval or set of segment Ids.
When marking used only segments that are not overshadowed will be updated.
The request payload contains the interval or set of segment Ids to be marked unused.
Either the interval or the segment IDs should be provided; if both or neither are provided in the payload, the API throws an error (400 BAD REQUEST).
The interval specifies the start and end times as ISO 8601 strings, `interval=(start/end)`, where start and end are both inclusive. Only the segments completely contained within the specified interval will be disabled; partially overlapping segments will not be affected.
JSON Request Payload:
|Key|Description|Example|
|----------|-------------|---------|
|`interval`|The interval for which to mark segments unused|"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"|
|`segmentIds`|Set of segment Ids to be marked unused|["segmentId1", "segmentId2"]|
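For example, the request payload takes one of the two forms below (interval-based or segment-id-based, not both), using the sample values from the table:

```json
{ "interval": "2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z" }
```

or

```json
{ "segmentIds": ["segmentId1", "segmentId2"] }
```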
##### DELETE
* `/druid/coordinator/v1/datasources/{dataSourceName}`
Marks as unused all segments belonging to a datasource. Returns a JSON object of the form
`{"numChangedSegments": <number>}` with the number of segments in the database whose state has been changed (that is,
the segments were marked as unused) as the result of this API call.
* `/druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}`
* `@Deprecated. /druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval={myInterval}`
Runs a [Kill task](../ingestion/tasks.md) for a given interval and datasource.
* `/druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
Marks as unused a segment of a datasource. Returns a JSON object of the form `{"segmentStateChanged": <boolean>}` with
the boolean indicating if the state of the segment has been changed (that is, the segment was marked as unused) as the
result of this API call.
#### Retention Rules
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
##### GET
* `/druid/coordinator/v1/rules`
Returns all rules as JSON objects for all datasources in the cluster including the default datasource.
* `/druid/coordinator/v1/rules/{dataSourceName}`
Returns all rules for a specified datasource.
* `/druid/coordinator/v1/rules/{dataSourceName}?full`
Returns all rules for a specified datasource and includes default datasource.
* `/druid/coordinator/v1/rules/history?interval=<interval>`
Returns audit history of rules for all datasources. The default value of the interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in the Coordinator `runtime.properties`.
* `/druid/coordinator/v1/rules/history?count=<n>`
Returns last <n> entries of audit history of rules for all datasources.
* `/druid/coordinator/v1/rules/{dataSourceName}/history?interval=<interval>`
Returns audit history of rules for a specified datasource. The default value of the interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in the Coordinator `runtime.properties`.
* `/druid/coordinator/v1/rules/{dataSourceName}/history?count=<n>`
Returns last <n> entries of audit history of rules for a specified datasource.
##### POST
* `/druid/coordinator/v1/rules/{dataSourceName}`
POST with a list of rules in JSON form to update rules.
Optional Header Parameters for auditing the config change can also be specified.
|Header Param Name| Description | Default |
|----------|-------------|---------|
|`X-Druid-Author`| author making the config change|""|
|`X-Druid-Comment`| comment describing the change being done|""|
#### Intervals
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
##### GET
* `/druid/coordinator/v1/intervals`
Returns all intervals for all datasources with total size and count.
* `/druid/coordinator/v1/intervals/{interval}`
Returns aggregated total size and count for all intervals that intersect given isointerval.
* `/druid/coordinator/v1/intervals/{interval}?simple`
Returns total size and count for each interval within given isointerval.
* `/druid/coordinator/v1/intervals/{interval}?full`
Returns total size and count for each datasource for each interval within given isointerval.
#### Dynamic configuration
See [Coordinator Dynamic Configuration](../configuration/index.md#dynamic-configuration) for details.
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
##### GET
* `/druid/coordinator/v1/config`
Retrieves current coordinator dynamic configuration.
* `/druid/coordinator/v1/config/history?interval={interval}&count={count}`
Retrieves history of changes to overlord dynamic configuration. Accepts `interval` and `count` query string parameters
to filter by interval and limit the number of results respectively.
##### POST
* `/druid/coordinator/v1/config`
Update overlord dynamic worker configuration.
#### Compaction Status
##### GET
* `/druid/coordinator/v1/compaction/progress?dataSource={dataSource}`
Returns the total size of segments awaiting compaction for the given dataSource.
This is only valid for dataSource which has compaction enabled.
##### GET
* `/druid/coordinator/v1/compaction/status`
Returns the status and statistics from the latest auto compaction run of all dataSources which have/had auto compaction enabled.
The response payload includes a list of `latestStatus` objects. Each `latestStatus` represents the status for a dataSource (which has/had auto compaction enabled).
The `latestStatus` object has the following keys:
* `dataSource`: name of the datasource for this status information
* `scheduleStatus`: auto compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING` if the dataSource has an active auto compaction config submitted; otherwise, `NOT_ENABLED`
* `bytesAwaitingCompaction`: total bytes of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
* `bytesCompacted`: total bytes of this datasource that are already compacted with the spec set in the auto compaction config.
* `bytesSkipped`: total bytes of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
* `segmentCountAwaitingCompaction`: total number of segments of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
* `segmentCountCompacted`: total number of segments of this datasource that are already compacted with the spec set in the auto compaction config.
* `segmentCountSkipped`: total number of segments of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
* `intervalCountAwaitingCompaction`: total number of intervals of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
* `intervalCountCompacted`: total number of intervals of this datasource that are already compacted with the spec set in the auto compaction config.
* `intervalCountSkipped`: total number of intervals of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
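Putting those keys together, a single `latestStatus` entry might look like the following; the datasource name and all counts are illustrative:

```json
{
  "dataSource": "wikipedia",
  "scheduleStatus": "RUNNING",
  "bytesAwaitingCompaction": 100000000,
  "bytesCompacted": 500000000,
  "bytesSkipped": 0,
  "segmentCountAwaitingCompaction": 8,
  "segmentCountCompacted": 30,
  "segmentCountSkipped": 0,
  "intervalCountAwaitingCompaction": 2,
  "intervalCountCompacted": 10,
  "intervalCountSkipped": 0
}
```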
##### GET
* `/druid/coordinator/v1/compaction/status?dataSource={dataSource}`
Similar to the API `/druid/coordinator/v1/compaction/status` above but filters response to only return information for the {dataSource} given.
Note that {dataSource} given must have/had auto compaction enabled.
#### Compaction Configuration
##### GET
* `/druid/coordinator/v1/config/compaction`
Returns all compaction configs.
* `/druid/coordinator/v1/config/compaction/{dataSource}`
Returns a compaction config of a dataSource.
##### POST
* `/druid/coordinator/v1/config/compaction/taskslots?ratio={someRatio}&max={someMaxSlots}`
Update the capacity for compaction tasks. `ratio` and `max` are used to limit the max number of compaction tasks.
They mean the ratio of the total task slots to the compaction task slots and the maximum number of task slots for compaction tasks, respectively.
The actual max number of compaction tasks is `min(max, ratio * total task slots)`.
Note that `ratio` and `max` are optional and can be omitted. If they are omitted, default values (0.1 and unbounded)
will be set for them.
* `/druid/coordinator/v1/config/compaction`
Creates or updates the compaction config for a dataSource.
See [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) for configuration details.
##### DELETE
* `/druid/coordinator/v1/config/compaction/{dataSource}`
Removes the compaction config for a dataSource.
#### Server information
##### GET
* `/druid/coordinator/v1/servers`
Returns a list of server URLs using the format `{hostname}:{port}`. Note that
processes that run with different types will appear multiple times with different
ports.
* `/druid/coordinator/v1/servers?simple`
Returns a list of server data objects in which each object has the following keys:
* `host`: host URL in the form `{hostname}:{port}`
* `type`: process type (`indexer-executor`, `historical`)
* `currSize`: storage size currently used
* `maxSize`: maximum storage size
* `priority`
* `tier`
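A single entry in that list might look like the following; the host, sizes, and tier are illustrative:

```json
{
  "host": "localhost:8083",
  "type": "historical",
  "currSize": 2935248,
  "maxSize": 300000000000,
  "priority": 0,
  "tier": "_default_tier"
}
```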
### Overlord
#### Leadership
##### GET
* `/druid/indexer/v1/leader`
Returns the current leader Overlord of the cluster. If you have multiple Overlords, just one is leading at any given time. The others are on standby.
* `/druid/indexer/v1/isLeader`
This returns a JSON object with field "leader", either true or false. In addition, this call returns HTTP 200 if the
server is the current leader and HTTP 404 if not. This is suitable for use as a load balancer status check if you
only want the active leader to be considered in-service at the load balancer.
#### Tasks
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
##### GET
* `/druid/indexer/v1/tasks`
Retrieve list of tasks. Accepts query string parameters `state`, `datasource`, `createdTimeInterval`, `max`, and `type`.
|Query Parameter |Description |
|---|---|
|`state`|filter list of tasks by task state, valid options are `running`, `complete`, `waiting`, and `pending`.|
| `datasource`| return tasks filtered by Druid datasource.|
| `createdTimeInterval`| return tasks created within the specified interval. |
| `max`| maximum number of `"complete"` tasks to return. Only applies when `state` is set to `"complete"`.|
| `type`| filter tasks by task type. See [task documentation](../ingestion/tasks.md) for more details.|
* `/druid/indexer/v1/completeTasks`
Retrieve list of complete tasks. Equivalent to `/druid/indexer/v1/tasks?state=complete`.
* `/druid/indexer/v1/runningTasks`
Retrieve list of running tasks. Equivalent to `/druid/indexer/v1/tasks?state=running`.
* `/druid/indexer/v1/waitingTasks`
Retrieve list of waiting tasks. Equivalent to `/druid/indexer/v1/tasks?state=waiting`.
* `/druid/indexer/v1/pendingTasks`
Retrieve list of pending tasks. Equivalent to `/druid/indexer/v1/tasks?state=pending`.
* `/druid/indexer/v1/task/{taskId}`
Retrieve the 'payload' of a task.
* `/druid/indexer/v1/task/{taskId}/status`
Retrieve the status of a task.
* `/druid/indexer/v1/task/{taskId}/segments`
Retrieve information about the segments of a task.
> This API is deprecated and will be removed in future releases.
* `/druid/indexer/v1/task/{taskId}/reports`
Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
##### POST
* `/druid/indexer/v1/task`
Endpoint for submitting tasks and supervisor specs to the Overlord. Returns the taskId of the submitted task.
* `/druid/indexer/v1/task/{taskId}/shutdown`
Shuts down a task.
* `/druid/indexer/v1/datasources/{dataSource}/shutdownAllTasks`
Shuts down all tasks for a dataSource.
* `/druid/indexer/v1/taskStatus`
Retrieve a list of task status objects for the list of task id strings in the request body.
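The request body for `/druid/indexer/v1/taskStatus` is a JSON array of task id strings, for example (the ids below are hypothetical):

```json
["index_parallel_wikipedia_2020-01-01", "index_kafka_wikiticker_f7011f8ffba384b_fpeclode"]
```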
##### DELETE
* `/druid/indexer/v1/pendingSegments/{dataSource}`
Manually clean up the pending segments table in metadata storage for `datasource`. Returns a JSON object response with
`numDeleted`, the count of rows deleted from the pending segments table. This API is used by the
`druid.coordinator.kill.pendingSegments.on` [coordinator setting](../configuration/index.md#coordinator-operation)
which automates this operation to run periodically.
#### Supervisors
##### GET
* `/druid/indexer/v1/supervisor`
Returns a list of strings of the currently active supervisor ids.
* `/druid/indexer/v1/supervisor?full`
Returns a list of objects of the currently active supervisors.
|Field|Type|Description|
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states:`UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.md#operations) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md).)|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
|`spec`|SupervisorSpec|json specification of supervisor (See Supervisor Configuration for details)|
* `/druid/indexer/v1/supervisor?state=true`
Returns a list of objects of the currently active supervisors and their current state.
|Field|Type|Description|
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.md#operations) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
|`suspended`|Boolean|true or false indicator of whether the supervisor is in suspended state|
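A `?state=true` response is therefore a JSON array of objects with exactly those fields, for example (the supervisor id and states are illustrative):

```json
[
  {
    "id": "wikipedia_stream",
    "state": "RUNNING",
    "detailedState": "RUNNING",
    "healthy": true,
    "suspended": false
  }
]
```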
* `/druid/indexer/v1/supervisor/<supervisorId>`
Returns the current spec for the supervisor with the provided ID.
* `/druid/indexer/v1/supervisor/<supervisorId>/status`
Returns the current status of the supervisor with the provided ID.
* `/druid/indexer/v1/supervisor/history`
Returns an audit history of specs for all supervisors (current and past).
* `/druid/indexer/v1/supervisor/<supervisorId>/history`
Returns an audit history of specs for the supervisor with the provided ID.
##### POST
* `/druid/indexer/v1/supervisor`
Create a new supervisor or update an existing one.
* `/druid/indexer/v1/supervisor/<supervisorId>/suspend`
Suspend the current running supervisor of the provided ID. Responds with updated SupervisorSpec.
* `/druid/indexer/v1/supervisor/suspendAll`
Suspend all supervisors at once.
* `/druid/indexer/v1/supervisor/<supervisorId>/resume`
Resume indexing tasks for a supervisor. Responds with updated SupervisorSpec.
* `/druid/indexer/v1/supervisor/resumeAll`
Resume all supervisors at once.
* `/druid/indexer/v1/supervisor/<supervisorId>/reset`
Reset the specified supervisor.
* `/druid/indexer/v1/supervisor/<supervisorId>/terminate`
Terminate a supervisor of the provided ID.
* `/druid/indexer/v1/supervisor/terminateAll`
Terminate all supervisors at once.
* `/druid/indexer/v1/supervisor/<supervisorId>/shutdown`
Shutdown a supervisor.
> This API is deprecated and will be removed in future releases.
> Please use the equivalent 'terminate' instead.
#### Dynamic configuration
See [Overlord Dynamic Configuration](../configuration/index.md#overlord-dynamic-configuration) for details.
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
##### GET
* `/druid/indexer/v1/worker`
Retrieves current overlord dynamic configuration.
* `/druid/indexer/v1/worker/history?interval={interval}&count={count}`
Retrieves history of changes to overlord dynamic configuration. Accepts `interval` and `count` query string parameters
to filter by interval and limit the number of results respectively.
* `/druid/indexer/v1/workers`
Retrieves a list of all the worker nodes in the cluster along with their metadata.
* `/druid/indexer/v1/scaling`
Retrieves overlord scaling events if auto-scaling runners are in use.
##### POST
* `/druid/indexer/v1/worker`
Update overlord dynamic worker configuration.
## Data Server
This section documents the API endpoints for the processes that reside on Data servers (MiddleManagers/Peons and Historicals)
in the suggested [three-server configuration](../design/processes.md#server-types).
### MiddleManager
##### GET
* `/druid/worker/v1/enabled`
Check whether a MiddleManager is in an enabled or disabled state. Returns JSON object keyed by the combined `druid.host`
and `druid.port` with the boolean state as the value.
```json
{"localhost:8091":true}
```
* `/druid/worker/v1/tasks`
Retrieve a list of active tasks being run on MiddleManager. Returns JSON list of taskid strings. Normal usage should
prefer to use the `/druid/indexer/v1/tasks` [Overlord API](#overlord) or one of its task-state-specific variants instead.
```json
["index_wikiticker_2019-02-11T02:20:15.316Z"]
```
* `/druid/worker/v1/task/{taskid}/log`
Retrieve task log output stream by task id. Normal usage should prefer to use the `/druid/indexer/v1/task/{taskId}/log`
[Overlord API](#overlord) instead.
##### POST
* `/druid/worker/v1/disable`
'Disable' a MiddleManager, causing it to stop accepting new tasks but complete all existing tasks. Returns JSON object
keyed by the combined `druid.host` and `druid.port`:
```json
{"localhost:8091":"disabled"}
```
* `/druid/worker/v1/enable`
'Enable' a MiddleManager, allowing it to accept new tasks again if it was previously disabled. Returns JSON object
keyed by the combined `druid.host` and `druid.port`:
```json
{"localhost:8091":"enabled"}
```
* `/druid/worker/v1/task/{taskid}/shutdown`
Shutdown a running task by `taskid`. Normal usage should prefer to use the `/druid/indexer/v1/task/{taskId}/shutdown`
[Overlord API](#overlord) instead. Returns JSON:
```json
{"task":"index_kafka_wikiticker_f7011f8ffba384b_fpeclode"}
```
### Peon
#### GET
* `/druid/worker/v1/chat/{taskId}/rowStats`
Retrieve a live row stats report from a Peon. See [task reports](../ingestion/tasks.md#task-reports) for more details.
* `/druid/worker/v1/chat/{taskId}/unparseableEvents`
Retrieve an unparseable events report from a Peon. See [task reports](../ingestion/tasks.md#task-reports) for more details.
### Historical
#### Segment Loading
##### GET
* `/druid/historical/v1/loadstatus`
Returns JSON of the form `{"cacheInitialized":<value>}`, where value is either `true` or `false` indicating if all
segments in the local cache have been loaded. This can be used to know when a Historical process is ready
to be queried after a restart.
* `/druid/historical/v1/readiness`
Similar to `/druid/historical/v1/loadstatus`, but instead of returning JSON with a flag, responds with 200 OK if all segments
in the local cache have been loaded, and with 503 SERVICE UNAVAILABLE if they haven't.
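For illustration, assuming a Historical on its default port 8083, you can poll either endpoint; the readiness variant is convenient when only the HTTP status code matters, for example in a startup script:
```bash
curl http://localhost:8083/druid/historical/v1/loadstatus
# e.g. {"cacheInitialized":true}

# Readiness check driven purely by the HTTP status code (200 vs. 503):
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8083/druid/historical/v1/readiness
```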
## Query Server
This section documents the API endpoints for the processes that reside on Query servers (Brokers) in the suggested [three-server configuration](../design/processes.md#server-types).
### Broker
#### Datasource Information
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
##### GET
* `/druid/v2/datasources`
Returns a list of queryable datasources.
* `/druid/v2/datasources/{dataSourceName}`
Returns the dimensions and metrics of the datasource. Optionally, you can provide the request parameter `full` to get the list of served intervals along with the dimensions and metrics served for those intervals. You can also provide the request parameter `interval` explicitly to refer to a particular interval.
If no interval is specified, a default interval spanning a configurable period before the current time is used. The default duration of this interval is specified in ISO 8601 duration format via the `druid.query.segmentMetadata.defaultHistory` property.
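For illustration, the following asks a Broker on its default port 8082 for full metadata of a hypothetical `wikipedia` datasource over an explicit interval:
```bash
curl "http://localhost:8082/druid/v2/datasources/wikipedia?full&interval=2016-06-27_2016-06-28"
```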
* `/druid/v2/datasources/{dataSourceName}/dimensions`
Returns the dimensions of the datasource.
> This API is deprecated and will be removed in future releases. Please use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead
> which provides more comprehensive information and supports all dataSource types including streaming dataSources. It's also encouraged to use [INFORMATION_SCHEMA tables](../querying/sql.md#metadata-tables)
> if you're using SQL.
* `/druid/v2/datasources/{dataSourceName}/metrics`
Returns the metrics of the datasource.
> This API is deprecated and will be removed in future releases. Please use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead
> which provides more comprehensive information and supports all dataSource types including streaming dataSources. It's also encouraged to use [INFORMATION_SCHEMA tables](../querying/sql.md#metadata-tables)
> if you're using SQL.
* `/druid/v2/datasources/{dataSourceName}/candidates?intervals={comma-separated-intervals}&numCandidates={numCandidates}`
Returns segment information lists including server locations for the given datasource and intervals. If `numCandidates` is not specified, it will return all servers for each interval.
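For illustration, the following lists up to three candidate servers per interval for a hypothetical `wikipedia` datasource, querying a Broker on its default port 8082:
```bash
curl "http://localhost:8082/druid/v2/datasources/wikipedia/candidates?intervals=2016-06-27_2016-06-28&numCandidates=3"
```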
#### Load Status
##### GET
* `/druid/broker/v1/loadstatus`
Returns a flag indicating if the Broker knows about all segments in the cluster. This can be used to know when a Broker process is ready to be queried after a restart.
* `/druid/broker/v1/readiness`
Similar to `/druid/broker/v1/loadstatus`, but instead of returning JSON, responds with 200 OK if the Broker is ready and with 503 SERVICE UNAVAILABLE otherwise.
#### Queries
##### POST
* `/druid/v2/`
The endpoint for submitting queries. Accepts an optional `?pretty` query parameter that pretty-prints the results.
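A minimal sketch of submitting a native query, assuming a Broker on its default port 8082 and a hypothetical `wikipedia` datasource:
```bash
# Simple native timeseries query; dataSource and interval are illustrative only.
curl -X POST -H "Content-Type: application/json" "http://localhost:8082/druid/v2/?pretty" -d '{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "intervals": ["2016-06-27/2016-06-28"],
  "granularity": "all",
  "aggregations": [{ "type": "count", "name": "rows" }]
}'
```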
* `/druid/v2/candidates/`
Returns segment information lists including server locations for the given query.
### Router
#### GET
* `/druid/v2/datasources`
Returns a list of queryable datasources.
* `/druid/v2/datasources/{dataSourceName}`
Returns the dimensions and metrics of the datasource.
* `/druid/v2/datasources/{dataSourceName}/dimensions`
Returns the dimensions of the datasource.
* `/druid/v2/datasources/{dataSourceName}/metrics`
Returns the metrics of the datasource.
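These Router endpoints mirror the Broker's datasource endpoints above. For illustration, against a Router on its default port 8888:
```bash
curl http://localhost:8888/druid/v2/datasources
```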

13
operations/api.md

@ -1,13 +0,0 @@
<!-- toc -->
### 通用
### Master
#### Coordinator
#### Overlord
##### Supervisor
### Data
#### MiddleManager
#### Peon
#### Historical
### Query
#### Broker
#### Router

203
operations/auth-ldap.md

@ -0,0 +1,203 @@
---
id: auth-ldap
title: "LDAP auth"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
This page describes how to set up Druid user authentication and authorization through LDAP. The first step is to enable LDAP authentication and authorization for Druid. You then map an LDAP group to roles and assign permissions to roles.
## Enable LDAP in Druid
Before starting, verify that the Active Directory is reachable from the Druid Master servers. Command-line tools such as `ldapsearch` and `ldapwhoami`, which are included with OpenLDAP, are useful for this testing.
### Check the connection
First, test that the basic connection and user credentials work. For example, given a user `uuser1@example.com`, try:
```bash
ldapwhoami -vv -H ldap://<ip_address>:389  -D"uuser1@example.com" -W
```
Enter the password associated with the user when prompted and verify that the command succeeded. If it didn't, try the following troubleshooting steps:
* Verify that you've used the correct port for your LDAP instance. By default, the LDAP port is 389, but double-check with your LDAP admin if unable to connect.
* Check that a network firewall is not blocking connections to the LDAP port.
* Check whether LDAP clients need to be specifically whitelisted at the LDAP server to be able to reach it. If so, add the Druid Coordinator server to the AD whitelist.
### Check the search criteria
After verifying basic connectivity, check your search criteria. For example, the command for searching for the user `uuser1@example.com` is as follows:
```bash
ldapsearch -x -W -H ldap://<ldap_server>  -D"uuser1@example.com" -b "dc=example,dc=com" "(sAMAccountName=uuser1)"
```
Note the `memberOf` attribute in the results; it shows the groups that the user belongs to. You will use this value later to map the LDAP group to Druid roles. This attribute may be implemented differently on different types of LDAP servers. For instance, some LDAP servers support recursive groupings and some do not, and some LDAP server implementations may not have any object classes that contain this attribute at all. If your LDAP server does not use the `memberOf` attribute, then Druid cannot determine a user's group membership using LDAP. The `sAMAccountName` attribute used in this example contains the authenticated user identity; it is an attribute of an object class specific to Microsoft Active Directory. The object classes and attributes used in your LDAP server may be different.
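For illustration, a matching entry in the `ldapsearch` output could contain lines like the following (the values are placeholders that happen to line up with the group mapping used later on this page):
```
sAMAccountName: uuser1
memberOf: CN=group1,CN=Users,DC=example,DC=com
```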
## Configure Druid user authentication with LDAP/Active Directory 
1. Enable the `druid-basic-security` extension in the `common.runtime.properties` file. See [Security Overview](security-overview.md) for details.
2. As a best practice, create a user in LDAP to be used for internal communication with Druid.
3. In `common.runtime.properties`, update LDAP-related properties, as shown in the following listing: 
```
druid.auth.authenticatorChain=["ldap"]
druid.auth.authenticator.ldap.type=basic
druid.auth.authenticator.ldap.enableCacheNotifications=true
druid.auth.authenticator.ldap.credentialsValidator.type=ldap
druid.auth.authenticator.ldap.credentialsValidator.url=ldap://<AD host>:<AD port>
druid.auth.authenticator.ldap.credentialsValidator.bindUser=<AD admin user, e.g.: Administrator@example.com>
druid.auth.authenticator.ldap.credentialsValidator.bindPassword=<AD admin password>
druid.auth.authenticator.ldap.credentialsValidator.baseDn=<base dn, e.g.: dc=example,dc=com>
druid.auth.authenticator.ldap.credentialsValidator.userSearch=<The LDAP search, e.g.: (&(sAMAccountName=%s)(objectClass=user))>
druid.auth.authenticator.ldap.credentialsValidator.userAttribute=sAMAccountName
druid.auth.authenticator.ldap.authorizerName=ldapauth
druid.escalator.type=basic
druid.escalator.internalClientUsername=<AD internal user, e.g.: internal@example.com>
druid.escalator.internalClientPassword=Welcome123
druid.escalator.authorizerName=ldapauth
druid.auth.authorizers=["ldapauth"]
druid.auth.authorizer.ldapauth.type=basic
druid.auth.authorizer.ldapauth.initialAdminUser=<AD user who acts as the initial admin user, e.g.: internal@example.com>
druid.auth.authorizer.ldapauth.initialAdminRole=admin
druid.auth.authorizer.ldapauth.roleProvider.type=ldap
```
Notice that the LDAP user created in the previous step, `internal@example.com`, serves as the internal client user and the initial admin user.
## Use LDAP groups to assign roles
You can map LDAP groups to a role in Druid. Members in the group get access to the permissions of the corresponding role. 
### Step 1: Create a role
First create the role in Druid using the Druid REST API.
Creating a role involves submitting a POST request to the Coordinator process. 
The following REST API calls create a role that grants read access to datasource, config, and state resources.
> As mentioned, the REST API calls need to address the Coordinator node. The examples below use `localhost` as the Coordinator host and 8081 as the port. Adjust these settings according to your deployment.
Call the following API to create the role `readRole`:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/roles/readRole
```
Check that the role has been created successfully by entering the following:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X GET  http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/roles
```
### Step 2: Add permissions to a role 
You can now add one or more permissions to the role. The following example adds read-only access to a `wikipedia` data source.
Given the following JSON in a file named `perm.json`:
```
[{ "resource": { "name": "wikipedia", "type": "DATASOURCE" }, "action": "READ" }
,{ "resource": { "name": ".*", "type": "STATE" }, "action": "READ" },
{ "resource": {"name": ".*", "type": "CONFIG"}, "action": "READ"}]
```
The following command associates the permissions in the JSON file with the role:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X POST -d@perm.json  http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/roles/readRole/permissions
```
Note that the STATE and CONFIG permissions in `perm.json` are needed to see the data source in the Druid console. If you only need to query the data source, the DATASOURCE READ permission alone is sufficient:
```
[{ "resource": { "name": "wikipedia", "type": "DATASOURCE" }, "action": "READ" }]
```
You can also provide the name in the form of a regular expression. For example, to give access to all data sources starting with `wiki`, specify the name as `{ "name": "wiki.*", .....`.
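For example, a hypothetical permission entry that grants read access to every data source whose name starts with `wiki` could look like this:
```
[{ "resource": { "name": "wiki.*", "type": "DATASOURCE" }, "action": "READ" }]
```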
### Step 3: Create a group mapping
The following shows an example of a group-to-role mapping. It assumes that a group named `group1` exists in the directory, and that the following mapping is saved in a file named `groupmap.json`:
```
{
    "name": "group1map",
    "groupPattern": "CN=group1,CN=Users,DC=example,DC=com",
    "roles": [
        "readRole"
    ]
}
```
You can configure the mapping as follows:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X POST -d @groupmap.json http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/groupMappings/group1map
```
To check whether the group mapping was created successfully, run the following command:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X GET http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/groupMappings
```
To check the details of a specific group mapping, use the following:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X GET http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/groupMappings/group1map
```
To add additional roles to the group mapping, use the following API:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/groupMappings/group1/roles/<newrole> 
```
In the next two steps, you create a user and assign previously created roles to it. These steps are only needed in the following cases:
- Your LDAP server does not support the `memberOf` attribute, or
- You want to configure a user with additional roles that are not mapped to the group(s) that the user is a member of
If this is not the case for your scenario, you can skip these steps.
### Step 4. Create a user
Once LDAP is enabled, only user passwords are verified with LDAP. You add the LDAP user to Druid as follows: 
```
curl -i -v  -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authentication/db/ldap/users/<AD user> 
```
### Step 5. Assign the role to the user
The following command shows how to assign a role to a user:
```
curl -i -v  -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/users/<AD user>/roles/<