Get Users With More Than One Account in the Same Source (Update)

We recently assisted a customer with a Search API aggregation that detects identities with more than one account in the same source. Based on that engagement, we’ve updated the DSL query and want to provide an example and some clarification.

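This aggregation consists of two terms aggregations, each of which returns one bucket, with a document count, for every distinct value it finds. The outer one produces the source buckets (one per source name) and the inner one produces the duplicate accounts buckets (one per identity within that source).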

The main question asked was whether the list of accounts can be paged if the number of buckets exceeds the maximum. Because this is a search aggregation rather than a query, the results cannot be paged, but you can increase the bucket size up to 65536. In the example below, the first size parameter applies to the source buckets and the second applies to the duplicate accounts buckets; the second is most likely the one you would need to increase.

There was also some confusion about what the query portion of the search is for. The query determines which search documents (in this case, Identities) are considered in the aggregation. You can page the search results, or set limit to 0 if you only want to see the aggregation buckets and not the query results. Neither option causes the aggregation results to page.
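For anyone scripting this, here is a minimal Python sketch of the two knobs described above: the two size values and limit set to 0. The function name, the MAX_BUCKETS constant and the idea of returning limit as a request parameter are our own illustrative assumptions, not part of any SDK; how you send the request (endpoint, authentication) is left out on purpose.

```python
import json

# Upper bound for bucket size quoted in this post.
MAX_BUCKETS = 65536


def build_duplicate_accounts_search(source_size=1000, identity_size=1000):
    """Return (params, body) for the duplicate-accounts aggregation.

    Hypothetical helper: source_size is the first size (source buckets),
    identity_size is the second size (duplicate accounts buckets).
    """
    for name, value in (("source_size", source_size),
                        ("identity_size", identity_size)):
        if not 1 <= value <= MAX_BUCKETS:
            raise ValueError(
                f"{name} must be between 1 and {MAX_BUCKETS}, got {value}")

    body = {
        # The query only selects which identities feed the aggregation.
        "query": {"query": "*"},
        "indices": ["identities"],
        "aggregationsDsl": {
            "accounts": {
                "nested": {"path": "accounts"},
                "aggs": {
                    "source_name": {
                        "terms": {
                            "field": "accounts.source.name.exact",
                            "min_doc_count": 2,
                            "size": source_size,
                        },
                        "aggs": {
                            "identities": {
                                "terms": {
                                    "field": "_id",
                                    "min_doc_count": 2,
                                    "size": identity_size,
                                },
                                "aggs": {"accounts": {"top_hits": {}}},
                            }
                        },
                    }
                },
            }
        },
    }
    # limit=0 hides the query results; it does not page the buckets.
    params = {"limit": 0}
    return params, body


if __name__ == "__main__":
    params, body = build_duplicate_accounts_search(identity_size=5000)
    print(params)
    print(json.dumps(body, indent=2))
```

Only identity_size is raised in the usage lines, since that is the value most likely to need increasing.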

Updated example:

{
	"query": {
		"query": "*"
	},
	"indices": [
		"identities"
	],
	"aggregationsDsl": {
		"accounts": {
			"nested": {
				"path": "accounts"
			},
			"aggs": {
				"source_name": {
					"terms": {
						"field": "accounts.source.name.exact",
						"min_doc_count": 2,
						"size": 1000
					},
					"aggs": {
						"identities": {
							"terms": {
								"field": "_id",
								"min_doc_count": 2,
								"size": 1000
							},
							"aggs": {
								"accounts": {
									"top_hits": {}
								}
							}
						}
					}
				}
			}
		}
	}
}
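
If you want to turn the buckets into a simple list, something like the Python sketch below may help. It assumes the aggregation result follows the usual Elasticsearch nesting (accounts, then source_name.buckets, then identities.buckets, each bucket carrying key and doc_count); we have not shown a real response in this post, so check the shape against your own tenant's output. The function name and the sample data are made up for illustration.

```python
def duplicate_accounts_by_source(aggregations):
    """Map each source name to {identity id: number of accounts}.

    Assumes `aggregations` is the object that contains the top-level
    "accounts" aggregation, in standard Elasticsearch bucket layout.
    """
    results = {}
    source_buckets = (aggregations.get("accounts", {})
                      .get("source_name", {})
                      .get("buckets", []))
    for source in source_buckets:
        identity_buckets = source.get("identities", {}).get("buckets", [])
        # doc_count is the number of nested account documents the
        # identity has in this source.
        counts = {b["key"]: b["doc_count"] for b in identity_buckets}
        if counts:
            results[source["key"]] = counts
    return results


if __name__ == "__main__":
    # Made-up sample in the assumed shape, not real tenant output.
    sample = {
        "accounts": {
            "doc_count": 5,
            "source_name": {
                "buckets": [
                    {
                        "key": "Example Source",
                        "doc_count": 5,
                        "identities": {
                            "buckets": [
                                {"key": "id-001", "doc_count": 3},
                                {"key": "id-002", "doc_count": 2},
                            ]
                        },
                    }
                ]
            },
        }
    }
    for source, identities in duplicate_accounts_by_source(sample).items():
        for identity_id, count in identities.items():
            print(f"{source}: {identity_id} has {count} accounts")
```

Sources whose identities bucket list comes back empty are skipped, so the result only contains sources where at least one identity has duplicates.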
