Skip to content

Bindings for user-defined metrics#734

Open
p-senichenkov wants to merge 8 commits into
Desbordante:mainfrom
p-senichenkov:bindings/typed-custom-metrics
Open

Bindings for user-defined metrics#734
p-senichenkov wants to merge 8 commits into
Desbordante:mainfrom
p-senichenkov:bindings/typed-custom-metrics

Conversation

@p-senichenkov

@p-senichenkov p-senichenkov commented Apr 25, 2026

Copy link
Copy Markdown
Collaborator

Add classes for user-defined metrics that allow transparent usage from Python (i. e. user's function gets values directly, not strings or something), without introducing extra overhead in C++.

There are two types of metrics:

  • A single-column metric (and a separate option for a set of single-column metrics)
  • A multi-column metric, that treats its arguments as vectors

There are three classes provided by default:

  • Py*Metric -- allows the user to pass a Python function, which will be called on real values
  • Static*Metric -- C++ metric that converts all values to some fixed type (very convenient in tests!)
  • Dynamic*Metric -- C++ metric that passes real column type and raw value to the user function, allowing it to do all conversions
    Other types of metrics can be introduced by implementing the IMetric interface, if some non-standard behavior is required in some algorithm.

Also, there are default metrics for metrizable types (vector metric uses the euclidean distance)

@p-senichenkov p-senichenkov requested a review from wildsor April 25, 2026 10:36
@p-senichenkov p-senichenkov force-pushed the bindings/typed-custom-metrics branch 4 times, most recently from 04dffd9 to d6877bf Compare April 26, 2026 13:25

@wildsor wildsor left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMetric

Could you please update the description? That doesn't seem to exist anymore.

(very convenient in tests!)

If it's only needed in tests, then it belongs in tests.

some non-standard behavior

Overabstraction has had a pretty bad effect on the code of the project. Please do not implement things unless you're certain (and can explain) that they're needed.

Comment thread src/python_bindings/data/py_custom_metrics.cpp Outdated
@p-senichenkov

p-senichenkov commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator Author

Could you please update the description? That doesn't seem to exist anymore.

The description is up-to-date. I meant Py{Custom,CustomVector}Metric

If it's only needed in tests, then it belongs in tests.

No, it's needed not only in tests

Overabstraction has had a pretty bad effect on the code of the project. Please do not implement things unless you're certain (and can explain) that they're needed.

How am I supposed to implement this thing without any interfaces?😐

@p-senichenkov p-senichenkov requested a review from wildsor April 27, 2026 18:28
@wildsor

wildsor commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

How am I supposed to implement this thing without any interfaces?😐

nevermind, I misread, all ok

@p-senichenkov p-senichenkov force-pushed the bindings/typed-custom-metrics branch from 995df17 to fee92af Compare April 29, 2026 18:47
std::ranges::transform(
metric_handles, result.begin(),
[](py::object obj) -> std::shared_ptr<config::ICustomMetric> {
if (obj.is_none()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this check do? Isn't it going to cause segfaults where a Python error would have been raised otherwise?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it isn't going to. Actually, we'd get segfaults in low-level python code if we removed this check, since null is not checked there

nullptr is not an error. It says "use the default metric for this column", just like None does in most of Python standard library functions. It's being replaced with DefaultCustomMetric later in option's normalization function. Passing py::none there would cause a segfault

#include <cstddef>
#include <functional>
#include <optional>
#include <sstream>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <sstream>
#include <sstream>
#include <utility>


namespace config {
/// @brief An option that represents a custom metric on a single column in the table
class MetricOption : public Option<std::shared_ptr<ICustomMetric>> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use config::CommonOption here instead of creating a new class.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CommonOption doesn't provide ability to set name and description, and I really dislike the idea of creating a global object for each possible option name, that will be used only once (like "metric", "lhs_metric", "rhs_metric", "some_other_metric")


namespace config {
/// @brief Option for a collection of user-defined metrics
class MetricsOption : public Option<std::vector<std::shared_ptr<ICustomMetric>>> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, the normalization function must be called manually from user's code, and the only correct option for this function is NormalizeMetrics. Also this function does the job of both NormalizeFunc and ValueCheckFunc (it throws an exception if the check is failed). I think that a better interface can be made.

We can do something like in config::IndicesOption:

  • Use CommonOption<ValueType> object as a field
  • Construct it with name, description and default value in constructor
  • Add operator() that takes value_ptr and indices_count as parameters, creates Option object from CommonOption, calls SetNormalizeFunc and SetValueCheck with specified lambdas (with captured indices_count), and returns the resulting Option object
  • Add global variables as constructed MetricsOption objects

The usage would be much simpler:
RegisterOption(kLhsMetricsOpt(&lhs_metrics_, lhs_indices_.size()));.

Comment thread src/core/config/custom_metric/custom_vector_metric.h
namespace config {
/// @brief An option that represents a custom metric on a set of columns, that treats values as a
/// vector
class VectorMetricOption : public Option<std::shared_ptr<ICustomVectorMetric>> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can also use CommonOption instead of creating a class.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe util is a better place for custom_metric.h and custom_vector_metric.h because config is responsible only for setting options.

Also other options in config have a common structure: option.cpp, option.h and type.h. Maybe we should follow this structure for consistency.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe util is a better place for custom_metric.h and custom_vector_metric.h because config is responsible only for setting options.

I'm not sure. Custom metric types are related to configuration, like config::IndicesType

Also other options in config have a common structure: option.cpp, option.h and type.h. Maybe we should follow this structure for consistency.

Here we have several types and options related to a single thing. Maybe something like

custom_metirc
|- metric
|  |- type.h
|  `- option.h
|- metrics
|  `- option.h
`- vector_metric.h
   |- type.h
    `- option.h

if (obj.is_none()) {
return nullptr;
}
return std::make_shared<python_bindings::PyCustomMetric>(std::move(obj));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why didn't we use reinterpret_borrow here like in other functions?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because obj is already py::object here, there is no need in casting it again

Comment thread src/core/config/custom_metric/custom_metric.h Outdated
Comment on lines +17 to +20
pybind11::object GetPyObjectFromMixed(model::Type const* type, std::byte const* value);
pybind11::object GetPyObject(model::Type const* type, std::byte const* value);
std::vector<pybind11::object> GetPyObjects(std::vector<model::Type const*> const& types,
std::vector<std::byte const*> const& values);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, all methods for type conversions are defined in separate files, like py_to_any, get_py_type and opt_to_py. Also these conversions can be used not only for metrics: several primitives (including Domain PACs) are already using Python functions and more may use them in the future. So, here I suggest moving these functions to a separate file in py_util.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, these conversion functions are the only part of this PR that is used by Domain PACs. Maybe we could implement the conversions in #611 and leave the rest here in order to merge Domain PACs sooner.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that Domain PACs are easier to merge than this PR. Let's just move converters into a separate PR and rebase both PRs onto it

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/python_bindings/data/py_custom_metrics.cpp Outdated
@p-senichenkov p-senichenkov force-pushed the bindings/typed-custom-metrics branch from 94f4123 to 5e21d26 Compare June 13, 2026 10:34
@p-senichenkov p-senichenkov force-pushed the bindings/typed-custom-metrics branch from 5e21d26 to a788068 Compare June 13, 2026 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants