Bindings for user-defined metrics#734
Conversation
04dffd9 to
d6877bf
Compare
There was a problem hiding this comment.
PyMetric
Could you please update the description? That doesn't seem to exist anymore.
(very convenient in tests!)
If it's only needed in tests, then it belongs in tests.
some non-standard behavior
Overabstraction has had a pretty bad effect on the code of the project. Please do not implement things unless you're certain (and can explain) that they're needed.
The description is up-to-date. I meant
No, it's needed not only in tests
How am I supposed to implement this thing without any interfaces?😐 |
nevermind, I misread, all ok |
995df17 to
fee92af
Compare
| std::ranges::transform( | ||
| metric_handles, result.begin(), | ||
| [](py::object obj) -> std::shared_ptr<config::ICustomMetric> { | ||
| if (obj.is_none()) { |
There was a problem hiding this comment.
What does this check do? Isn't it going to cause segfaults where a Python error would have been raised otherwise?
There was a problem hiding this comment.
No, it isn't going to. Actually, we'd get segfaults in low-level python code if we removed this check, since null is not checked there
nullptr is not an error. It says "use the default metric for this column", just like None does in most of Python standard library functions. It's being replaced with DefaultCustomMetric later in option's normalization function. Passing py::none there would cause a segfault
| #include <cstddef> | ||
| #include <functional> | ||
| #include <optional> | ||
| #include <sstream> |
There was a problem hiding this comment.
| #include <sstream> | |
| #include <sstream> | |
| #include <utility> |
|
|
||
| namespace config { | ||
| /// @brief An option that represents a custom metric on a single column in the table | ||
| class MetricOption : public Option<std::shared_ptr<ICustomMetric>> { |
There was a problem hiding this comment.
We could use config::CommonOption here instead of creating a new class.
There was a problem hiding this comment.
CommonOption doesn't provide ability to set name and description, and I really dislike the idea of creating a global object for each possible option name, that will be used only once (like "metric", "lhs_metric", "rhs_metric", "some_other_metric")
|
|
||
| namespace config { | ||
| /// @brief Option for a collection of user-defined metrics | ||
| class MetricsOption : public Option<std::vector<std::shared_ptr<ICustomMetric>>> { |
There was a problem hiding this comment.
As far as I understand, the normalization function must be called manually from user's code, and the only correct option for this function is NormalizeMetrics. Also this function does the job of both NormalizeFunc and ValueCheckFunc (it throws an exception if the check is failed). I think that a better interface can be made.
We can do something like in config::IndicesOption:
- Use
CommonOption<ValueType>object as a field - Construct it with name, description and default value in constructor
- Add
operator()that takesvalue_ptrandindices_countas parameters, createsOptionobject fromCommonOption, callsSetNormalizeFuncandSetValueCheckwith specified lambdas (with capturedindices_count), and returns the resultingOptionobject - Add global variables as constructed
MetricsOptionobjects
The usage would be much simpler:
RegisterOption(kLhsMetricsOpt(&lhs_metrics_, lhs_indices_.size()));.
| namespace config { | ||
| /// @brief An option that represents a custom metric on a set of columns, that treats values as a | ||
| /// vector | ||
| class VectorMetricOption : public Option<std::shared_ptr<ICustomVectorMetric>> { |
There was a problem hiding this comment.
Here we can also use CommonOption instead of creating a class.
There was a problem hiding this comment.
Maybe util is a better place for custom_metric.h and custom_vector_metric.h because config is responsible only for setting options.
Also other options in config have a common structure: option.cpp, option.h and type.h. Maybe we should follow this structure for consistency.
There was a problem hiding this comment.
Maybe
utilis a better place forcustom_metric.handcustom_vector_metric.hbecauseconfigis responsible only for setting options.
I'm not sure. Custom metric types are related to configuration, like config::IndicesType
Also other options in
confighave a common structure:option.cpp,option.handtype.h. Maybe we should follow this structure for consistency.
Here we have several types and options related to a single thing. Maybe something like
custom_metirc
|- metric
| |- type.h
| `- option.h
|- metrics
| `- option.h
`- vector_metric.h
|- type.h
`- option.h
| if (obj.is_none()) { | ||
| return nullptr; | ||
| } | ||
| return std::make_shared<python_bindings::PyCustomMetric>(std::move(obj)); |
There was a problem hiding this comment.
Why didn't we use reinterpret_borrow here like in other functions?
There was a problem hiding this comment.
Because obj is already py::object here, there is no need in casting it again
| pybind11::object GetPyObjectFromMixed(model::Type const* type, std::byte const* value); | ||
| pybind11::object GetPyObject(model::Type const* type, std::byte const* value); | ||
| std::vector<pybind11::object> GetPyObjects(std::vector<model::Type const*> const& types, | ||
| std::vector<std::byte const*> const& values); |
There was a problem hiding this comment.
Currently, all methods for type conversions are defined in separate files, like py_to_any, get_py_type and opt_to_py. Also these conversions can be used not only for metrics: several primitives (including Domain PACs) are already using Python functions and more may use them in the future. So, here I suggest moving these functions to a separate file in py_util.
There was a problem hiding this comment.
As far as I understand, these conversion functions are the only part of this PR that is used by Domain PACs. Maybe we could implement the conversions in #611 and leave the rest here in order to merge Domain PACs sooner.
There was a problem hiding this comment.
I'm not sure that Domain PACs are easier to merge than this PR. Let's just move converters into a separate PR and rebase both PRs onto it
94f4123 to
5e21d26
Compare
Implement functions to convert value in table to Python value
5e21d26 to
a788068
Compare
Add classes for user-defined metrics that allow transparent usage from Python (i. e. user's function gets values directly, not strings or something), without introducing extra overhead in C++.
There are two types of metrics:
There are three classes provided by default:
Py*Metric-- allows the user to pass a Python function, which will be called on real valuesStatic*Metric-- C++ metric that converts all values to some fixed type (very convenient in tests!)Dynamic*Metric-- C++ metric that passes real column type and raw value to the user function, allowing it to do all conversionsOther types of metrics can be introduced by implementing the
IMetricinterface, if some non-standard behavior is required in some algorithm.Also, there are default metrics for metrizable types (vector metric uses the euclidean distance)