use text prefix in regex to speed up query #4776

Open
wants to merge 1 commit into
base: main

rnewson
Member

@rnewson rnewson commented Sep 26, 2023

Overview

for selector;

{"selector":{"_id":{"$regex":"doc.+"}}}

before;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

after;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [
    "doc"
  ],
  "end_key": [
    "doc�",
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}
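The narrowing shown above can be sketched roughly as follows. This is only an illustration in Python, not the code in this PR: `literal_prefix`, `key_range`, and the `HIGH_SENTINEL` value are hypothetical names and choices. It also assumes the regex is matched from the start of the string, as the example implies; with an unanchored matcher, only patterns beginning with `^` could safely be narrowed this way.

```python
# Hypothetical helpers (not CouchDB code): derive a view key range from the
# literal prefix of a regex, falling back to a full scan when there is none.
METACHARS = set(".^$*+?()[]{}|\\")
OPTIONAL_QUANTIFIERS = set("*?{")
HIGH_SENTINEL = "\ufff0"  # assumption: a high code point that sorts after the prefix


def literal_prefix(pattern: str) -> str:
    """Return the run of literal characters every match must start with."""
    if "|" in pattern:
        # An alternation may allow matches that lack the prefix; be conservative.
        return ""
    if pattern.startswith("^"):
        pattern = pattern[1:]
    prefix = []
    for ch in pattern:
        if ch in METACHARS:
            # `*`, `?` and `{` can make the preceding literal optional, so drop it.
            if ch in OPTIONAL_QUANTIFIERS and prefix:
                prefix.pop()
            break
        prefix.append(ch)
    return "".join(prefix)


def key_range(pattern: str):
    """Return (start_key, end_key) in the shape shown in the before/after output."""
    prefix = literal_prefix(pattern)
    if not prefix:
        return [], ["<MAX>"]  # no usable prefix: scan the whole index
    return [prefix], [prefix + HIGH_SENTINEL, "<MAX>"]


print(key_range("doc.+"))
print(key_range("doc*"))
```

The range only narrows the scan; the regex itself would still have to be applied to each row in the range, since the prefix is a necessary condition for a match rather than a sufficient one.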

Testing recommendations

TBD

Related Issues or Pull Requests

#4775

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@rnewson
Member Author

rnewson commented Sep 26, 2023

I've not done the text side yet, or any tests; I'm just sounding out the idea.

for text I'd prefer to pass the regex through to Lucene (clouseau or nouveau) and document the variation in regex flavour (there's huge overlap). Omitting the optimizations Lucene makes for the sake of purity was a mistake in the original implementation imo.

closes: #4775
@willholley
Member

willholley commented Sep 26, 2023

I wonder whether a $startsWith operator would be cleaner, as we could then optimize it for text indexes specifically? The $regex operator originally did differ for text indexes iirc but we had users experience weirdness when they would add an index and suddenly get different results.

A general principle in Mango over the last ~5 years is that adding an index shouldn't change the result of a query implicitly, so I'd be wary of reintroducing that behaviour.

@rnewson
Member Author

rnewson commented Sep 26, 2023

I like that. We could then convert the prefix of the regex to a startswith for views.

willholley added a commit that referenced this pull request Oct 17, 2023
Adds a `$beginsWith` operator to selectors, with json and text index
support. This is a complement / precursor to optimising `$regex`
support as proposed in #4776.

For `json` indexes, a `$beginsWith` operator translates into a key
range query, as is common practice for _view queries. For example,
to find all rows with a key beginning with "W", we can use a range
`start_key="W", end_key="W\ufff0"`. Given Mango uses compound keys,
this is slightly more complex in practice, but the idea is the same.
As with other range operators (`$gt`, `$gte`, etc), `$beginsWith`
can be used in combination with equality operators and result sorting
but must result in a contiguous key range. That is, a range of
`start_key=[10, "W"], end_key=[10, "W\ufff0", {}]` would be valid,
but `start_key=["W", 10], end_key=["W\ufff0", 10, {}]` would not,
because the second element of the key may result in a non-contiguous
range.
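
The contiguity rule above can be illustrated with a small sketch. It is a simplified model in Python, not Mango's implementation: `begins_with_range` and `is_contiguous` are hypothetical helpers, the sample documents are made up, and Python's code point ordering stands in for CouchDB's view collation.

```python
# Simplified model (not Mango code) of why the equality field must come first.
HIGH = "\ufff0"  # high code point taken from the commit message above


def begins_with_range(equality_values, prefix):
    """Build a compound key range: equality fields first, then the prefix field."""
    start_key = [*equality_values, prefix]
    end_key = [*equality_values, prefix + HIGH, {}]
    return start_key, end_key


def is_contiguous(sorted_docs, matches):
    """True when all matching rows sit next to each other in index order."""
    hits = [i for i, doc in enumerate(sorted_docs) if matches(doc)]
    return not hits or hits == list(range(hits[0], hits[-1] + 1))


# Made-up sample data for illustration only.
docs = [
    {"age": 10, "name": "Walt"},
    {"age": 9, "name": "Wanda"},
    {"age": 10, "name": "Wendy"},
    {"age": 9, "name": "Wes"},
    {"age": 11, "name": "Will"},
    {"age": 10, "name": "Xena"},
]


def wanted(doc):
    # Equivalent to {"age": 10, "name": {"$beginsWith": "W"}}
    return doc["age"] == 10 and doc["name"].startswith("W")


by_age_name = sorted(docs, key=lambda d: (d["age"], d["name"]))  # index [age, name]
by_name_age = sorted(docs, key=lambda d: (d["name"], d["age"]))  # index [name, age]

print(begins_with_range([10], "W"))
print(is_contiguous(by_age_name, wanted))
print(is_contiguous(by_name_age, wanted))
```

With the `[age, name]` ordering the matching rows form one slice of the index, so a single start/end key pair covers them. With `[name, age]` a row such as Wanda (age 9) falls between two matches, so no single key range selects exactly the wanted rows.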

For text indexes, `$beginsWith` translates to a Lucene query on
the specified field of `W*`.
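
As a rough sketch of that translation, the Python below builds a prefix query string and escapes characters that the classic Lucene query parser treats as syntax. `lucene_prefix_query` is a hypothetical helper, the field name is passed in as-is (Mango's real text index field naming is not modelled), and the exact escaping the commit performs is an assumption.

```python
import re

# Characters with special meaning in the classic Lucene query syntax, plus
# whitespace. Assumption: each is escaped by a preceding backslash.
_LUCENE_SPECIALS = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|\s])')


def lucene_prefix_query(field: str, prefix: str) -> str:
    """Return a query such as `name:W*` for a `$beginsWith` operand."""
    escaped = _LUCENE_SPECIALS.sub(r"\\\1", prefix)
    return f"{field}:{escaped}*"


print(lucene_prefix_query("name", "W"))
print(lucene_prefix_query("name", "a:b"))
```

Without escaping, an operand containing `*` or `?` would act as a wildcard rather than a literal prefix, which would make text and json indexes return different results for the same selector.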

If a non-string operand is provided to `$beginsWith`, the request will
fail with a 400 / `invalid_operator` error.
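
A client can mirror that check before sending a request. The sketch below is hypothetical Python, not CouchDB's validation code: `InvalidOperator` and `begins_with_selector` are illustrative names, and only the operand type rule and the 400 / `invalid_operator` pairing come from the text above.

```python
import json


class InvalidOperator(ValueError):
    """Client-side stand-in for the server's 400 / invalid_operator response."""

    status = 400
    error = "invalid_operator"


def begins_with_selector(field: str, prefix):
    """Build a `_find` request body using `$beginsWith`, rejecting non-strings."""
    if not isinstance(prefix, str):
        raise InvalidOperator(f"$beginsWith operand must be a string, got {prefix!r}")
    return {"selector": {field: {"$beginsWith": prefix}}}


print(json.dumps(begins_with_selector("_id", "doc")))
```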
@willholley willholley mentioned this pull request Oct 17, 2023
willholley added a commit that referenced this pull request Oct 17, 2023
willholley added a commit that referenced this pull request Oct 26, 2023
willholley added a commit that referenced this pull request Oct 30, 2023