Add CM-SPADE Algorithm for sequential pattern mining #703

Open
lubimoemestechko wants to merge 3 commits into Desbordante:main from lubimoemestechko:feature-cmspade

Conversation

@lubimoemestechko

This pull request adds CM-SPADE, a new algorithm for vertical mining of sequential patterns. It includes:

  • the core CM-SPADE algorithm and the data structures it uses;
  • a parser for sequence files in which the symbol -1 closes the current itemset (so the next item starts a new one) and the symbol -2 marks the end of a sequence;
  • Python bindings for running the algorithm from Python.
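
To make the -1/-2 delimiter convention described above concrete, here is a minimal sketch of parsing one input line. It is not the PR's parser: `ParseDelimitedLine` and the `Itemset`/`Sequence` aliases are hypothetical stand-ins, and the sketch assumes every token is a well-formed integer.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the PR's Itemset and Sequence classes.
using Itemset = std::vector<int>;
using Sequence = std::vector<Itemset>;

// Parses a line such as "1 2 -1 3 -1 -2" into the itemsets {1, 2} and {3}.
// Assumption: -1 closes the current itemset, -2 closes the sequence.
Sequence ParseDelimitedLine(std::string const& line) {
    Sequence sequence;
    Itemset current;
    std::istringstream iss(line);
    int token = 0;
    while (iss >> token) {
        if (token == -1) {
            if (!current.empty()) sequence.push_back(std::move(current));
            current.clear();  // put the moved-from vector back into a known empty state
        } else if (token == -2) {
            break;  // end of sequence: ignore anything after it
        } else {
            current.push_back(token);
        }
    }
    // Flush a trailing itemset that was not closed by -1.
    if (!current.empty()) sequence.push_back(std::move(current));
    return sequence;
}
```

Keeping the current itemset as a value rather than a `std::unique_ptr` is a deliberate simplification here: a moved-from vector can be cleared and reused, so there is no null state to guard against.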

@github-actions (bot) left a comment

clang-tidy made some suggestions

Comment thread src/core/algorithms/cmspade/cmspade.cpp Outdated
EquivalenceClass& child_y = (i == j) ? child_x : eq_members[j];
Int item_y = child_y.GetClassIdentifier()->GetLastElement().GetItem().GetId();

bool do_not_explore_XY = false, do_not_explore_YX = false;

warning: invalid case style for variable 'do_not_explore_XY' [readability-identifier-naming]

Suggested change
- bool do_not_explore_XY = false, do_not_explore_YX = false;
+ bool do_not_explore_xy = false, do_not_explore_YX = false;

src/core/algorithms/cmspade/cmspade.cpp:111:

-                         do_not_explore_XY = (count1 < minsup_absolute_);
+                         do_not_explore_xy = (count1 < minsup_absolute_);

src/core/algorithms/cmspade/cmspade.cpp:113:

-                         do_not_explore_XY = true;
+                         do_not_explore_xy = true;

src/core/algorithms/cmspade/cmspade.cpp:116:

-                     do_not_explore_XY = true;
+                     do_not_explore_xy = true;

src/core/algorithms/cmspade/cmspade.cpp:133:

-                 do_not_explore_XY = true;
+                 do_not_explore_xy = true;

src/core/algorithms/cmspade/cmspade.cpp:168:

-             if (do_not_explore_XY && do_not_explore_YX && do_not_explore_X_Y && do_not_explore_Y_X){
+             if (do_not_explore_xy && do_not_explore_YX && do_not_explore_X_Y && do_not_explore_Y_X){

src/core/algorithms/cmspade/cmspade.cpp:173:

-                                                             minsup_absolute_, do_not_explore_XY, do_not_explore_YX, 
+                                                             minsup_absolute_, do_not_explore_xy, do_not_explore_YX, 

Comment thread src/core/algorithms/cmspade/cmspade.cpp Outdated
EquivalenceClass& child_y = (i == j) ? child_x : eq_members[j];
Int item_y = child_y.GetClassIdentifier()->GetLastElement().GetItem().GetId();

bool do_not_explore_XY = false, do_not_explore_YX = false;

warning: invalid case style for variable 'do_not_explore_YX' [readability-identifier-naming]

Suggested change
- bool do_not_explore_XY = false, do_not_explore_YX = false;
+ bool do_not_explore_XY = false, do_not_explore_yx = false;

src/core/algorithms/cmspade/cmspade.cpp:125:

-                         do_not_explore_YX = (count2 < minsup_absolute_);
+                         do_not_explore_yx = (count2 < minsup_absolute_);

src/core/algorithms/cmspade/cmspade.cpp:127:

-                         do_not_explore_YX = true;
+                         do_not_explore_yx = true;

src/core/algorithms/cmspade/cmspade.cpp:130:

-                     do_not_explore_YX = true;
+                     do_not_explore_yx = true;

src/core/algorithms/cmspade/cmspade.cpp:134:

-                 do_not_explore_YX = true;
+                 do_not_explore_yx = true;

src/core/algorithms/cmspade/cmspade.cpp:168:

-             if (do_not_explore_XY && do_not_explore_YX && do_not_explore_X_Y && do_not_explore_Y_X){
+             if (do_not_explore_XY && do_not_explore_yx && do_not_explore_X_Y && do_not_explore_Y_X){

src/core/algorithms/cmspade/cmspade.cpp:173:

-                                                             minsup_absolute_, do_not_explore_XY, do_not_explore_YX, 
+                                                             minsup_absolute_, do_not_explore_XY, do_not_explore_yx, 

Comment thread src/core/algorithms/cmspade/cmspade.cpp Outdated
Int item_y = child_y.GetClassIdentifier()->GetLastElement().GetItem().GetId();

bool do_not_explore_XY = false, do_not_explore_YX = false;
bool do_not_explore_X_Y = false, do_not_explore_Y_X = false;

warning: invalid case style for variable 'do_not_explore_X_Y' [readability-identifier-naming]

Suggested change
- bool do_not_explore_X_Y = false, do_not_explore_Y_X = false;
+ bool do_not_explore_x_y = false, do_not_explore_Y_X = false;

src/core/algorithms/cmspade/cmspade.cpp:142:

-                         do_not_explore_X_Y = (count1 < minsup_absolute_);
+                         do_not_explore_x_y = (count1 < minsup_absolute_);

src/core/algorithms/cmspade/cmspade.cpp:144:

-                         do_not_explore_X_Y = true;
+                         do_not_explore_x_y = true;

src/core/algorithms/cmspade/cmspade.cpp:147:

-                     do_not_explore_X_Y = true;
+                     do_not_explore_x_y = true;

src/core/algorithms/cmspade/cmspade.cpp:164:

-                 do_not_explore_X_Y = true;
+                 do_not_explore_x_y = true;

src/core/algorithms/cmspade/cmspade.cpp:168:

-             if (do_not_explore_XY && do_not_explore_YX && do_not_explore_X_Y && do_not_explore_Y_X){
+             if (do_not_explore_XY && do_not_explore_YX && do_not_explore_x_y && do_not_explore_Y_X){

src/core/algorithms/cmspade/cmspade.cpp:174:

-                                                             do_not_explore_X_Y, do_not_explore_Y_X);
+                                                             do_not_explore_x_y, do_not_explore_Y_X);

Comment thread src/core/algorithms/cmspade/cmspade.cpp Outdated
Int item_y = child_y.GetClassIdentifier()->GetLastElement().GetItem().GetId();

bool do_not_explore_XY = false, do_not_explore_YX = false;
bool do_not_explore_X_Y = false, do_not_explore_Y_X = false;

warning: invalid case style for variable 'do_not_explore_Y_X' [readability-identifier-naming]

Suggested change
- bool do_not_explore_X_Y = false, do_not_explore_Y_X = false;
+ bool do_not_explore_X_Y = false, do_not_explore_y_x = false;

src/core/algorithms/cmspade/cmspade.cpp:156:

-                         do_not_explore_Y_X = (count2 < minsup_absolute_);
+                         do_not_explore_y_x = (count2 < minsup_absolute_);

src/core/algorithms/cmspade/cmspade.cpp:158:

-                         do_not_explore_Y_X = true;
+                         do_not_explore_y_x = true;

src/core/algorithms/cmspade/cmspade.cpp:161:

-                     do_not_explore_Y_X = true;
+                     do_not_explore_y_x = true;

src/core/algorithms/cmspade/cmspade.cpp:165:

-                 do_not_explore_Y_X = true;
+                 do_not_explore_y_x = true;

src/core/algorithms/cmspade/cmspade.cpp:168:

-             if (do_not_explore_XY && do_not_explore_YX && do_not_explore_X_Y && do_not_explore_Y_X){
+             if (do_not_explore_XY && do_not_explore_YX && do_not_explore_X_Y && do_not_explore_y_x){

src/core/algorithms/cmspade/cmspade.cpp:174:

-                                                             do_not_explore_X_Y, do_not_explore_Y_X);
+                                                             do_not_explore_X_Y, do_not_explore_y_x);
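
Each of the four suggestions above renames only one flag, and all four touch line 168, so they cannot be applied one after another mechanically. As a rough sketch of the combined result, the snippet below uses the four lower_case names together. `CanSkipPair` and its count parameters are hypothetical and exist only to make the example self-contained; the PR computes the flags inline.

```cpp
#include <cassert>

// Hypothetical helper: the function name and the four separate count
// parameters are assumptions made for this sketch, not the PR's code.
bool CanSkipPair(unsigned count_xy, unsigned count_yx, unsigned count_x_y,
                 unsigned count_y_x, unsigned minsup_absolute) {
    // All four flags follow the lower_case style that
    // readability-identifier-naming asks for.
    bool do_not_explore_xy = count_xy < minsup_absolute;
    bool do_not_explore_yx = count_yx < minsup_absolute;
    bool do_not_explore_x_y = count_x_y < minsup_absolute;
    bool do_not_explore_y_x = count_y_x < minsup_absolute;

    // The pair is skipped only when every extension is below the threshold.
    return do_not_explore_xy && do_not_explore_yx && do_not_explore_x_y &&
           do_not_explore_y_x;
}
```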

const std::vector<Item> &GetItems() const { return items_; }
Item GetItem(int index) const { return items_[index]; }

std::size_t size() const { return items_.size(); }

warning: invalid case style for function 'size' [readability-identifier-naming]

Suggested change
- std::size_t size() const { return items_.size(); }
+ std::size_t Size() const { return items_.size(); }

Item GetItem(int index) const { return items_[index]; }

std::size_t size() const { return items_.size(); }
bool empty() const { return items_.empty(); }

warning: invalid case style for function 'empty' [readability-identifier-naming]

Suggested change
- bool empty() const { return items_.empty(); }
+ bool Empty() const { return items_.empty(); }


const std::vector<std::unique_ptr<Itemset>>& GetItemsets() const { return itemsets_; }

size_t size() const { return itemsets_.size(); }

warning: invalid case style for function 'size' [readability-identifier-naming]

Suggested change
- size_t size() const { return itemsets_.size(); }
+ size_t Size() const { return itemsets_.size(); }

const std::vector<std::unique_ptr<Itemset>>& GetItemsets() const { return itemsets_; }

size_t size() const { return itemsets_.size(); }
int length() const { return number_of_items_; }

warning: invalid case style for function 'length' [readability-identifier-naming]

Suggested change
- int length() const { return number_of_items_; }
+ int Length() const { return number_of_items_; }
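
There are two common ways to resolve the `size`/`empty`/`length` warnings above, sketched below on a simplified class. Option A is the rename clang-tidy suggests. Option B keeps an STL-style name and suppresses the check on that line with a `NOLINT` comment. Whether this project accepts such suppressions is an assumption, and the `Itemset` here (including `AddItem`) is a hypothetical stand-in that stores plain `int`s rather than the PR's `Item` objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified Itemset, for illustration only.
class Itemset {
public:
    void AddItem(int item) { items_.push_back(item); }

    // Option A: follow the project's CamelCase rule, as clang-tidy suggests.
    std::size_t Size() const { return items_.size(); }
    bool Empty() const { return items_.empty(); }

    // Option B: keep the STL-style name and silence the check on this line only.
    std::size_t size() const { return items_.size(); }  // NOLINT(readability-identifier-naming)

private:
    std::vector<int> items_;
};
```

Option B is mainly useful when the class needs to work with generic code that calls `size()` or `empty()` by name, such as `std::size` and `std::empty`. Otherwise the plain rename keeps the code base consistent, but every caller (for example `current_itemset->empty()` in the parser) has to be updated too.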

}
}

if (!current_itemset->empty()) {

warning: Dereference of null smart pointer 'current_itemset' of type 'std::unique_ptr' [clang-analyzer-cplusplus.Move]

    if (!current_itemset->empty()) {
         ^
Additional context

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:22: Calling 'CMSpadeParser::ParseNextSequence'

    while (ParseNextSequence(sequence_id, sequence)) {
           ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:35: Loop condition is true. Entering loop body

    while (std::getline(file_, line)) {
    ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Assuming the condition is false

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Left side of '||' is false

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Assuming the condition is false

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
                            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Left side of '||' is false

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Assuming the condition is false

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
                                              ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Left side of '||' is false

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Assuming the condition is false

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
                                                                ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:36: Taking false branch

        if (line.empty() || line[0] == '#' || line[0] == '%' || line[0] == '@') {
        ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:40: Calling 'CMSpadeParser::ParseSequenceLine'

        ParseSequenceLine(line, sequence_id, sequence);
        ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:55: Loop condition is true. Entering loop body

    while(iss >> token){
    ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:56: Assuming the condition is false

        if (token[0] == '<'){
            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:56: Taking false branch

        if (token[0] == '<'){
        ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:60: Taking false branch

        if (token == "-1"){
        ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:67: Taking true branch

        else if (token == "-2"){
             ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:68: Assuming the condition is true

            if (!current_itemset->empty()) {
                ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:68: Taking true branch

            if (!current_itemset->empty()) {
            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:69: Smart pointer 'current_itemset' of type 'std::unique_ptr' is reset to null when moved from

                sequence->AddItemset(std::move(current_itemset));
                                     ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:72: Execution continues on line 83

            break;
            ^

src/core/algorithms/cmspade/parser/cmspade_parser.cpp:82: Dereference of null smart pointer 'current_itemset' of type 'std::unique_ptr'

    if (!current_itemset->empty()) {
         ^
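
The trace above boils down to a use-after-move: `current_itemset` is moved into the sequence on line 69, the loop breaks, and line 82 dereferences the now-null pointer. One possible fix, sketched here under assumptions, is to route both call sites through a helper that checks the pointer before dereferencing it. `FlushItemset` and the simplified `Itemset`/`Sequence` types are hypothetical; only `AddItemset` and the `std::unique_ptr` ownership come from the PR.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the PR's classes.
using Itemset = std::vector<int>;

struct Sequence {
    std::vector<std::unique_ptr<Itemset>> itemsets;
    void AddItemset(std::unique_ptr<Itemset> itemset) {
        itemsets.push_back(std::move(itemset));
    }
};

// Moves the itemset into the sequence if it exists and holds items.
// Safe to call when the pointer is already null, e.g. after an earlier move.
void FlushItemset(std::unique_ptr<Itemset>& current, Sequence& sequence) {
    if (current && !current->empty()) {
        sequence.AddItemset(std::move(current));  // leaves `current` null
    }
}
```

An alternative with the same effect would be to assign a fresh `std::make_unique<Itemset>()` to the pointer right after each move, so that it is never null when the final check runs.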

Comment thread src/tests/unit/test_cmspade.cpp Outdated

class CMSpadeInternalTest : public ::testing::Test {
protected:
TestableCMSpade algo;

warning: invalid case style for protected member 'algo' [readability-identifier-naming]

Suggested change
- TestableCMSpade algo;
+ TestableCMSpade algo_;

src/tests/unit/test_cmspade.cpp:215:

-     algo.SetPath(file.GetPath());
-     algo.LoadDataInternal();
-     algo.BuildCMap();
+     algo_.SetPath(file.GetPath());
+     algo_.LoadDataInternal();
+     algo_.BuildCMap();

src/tests/unit/test_cmspade.cpp:219:

-     auto cmap_eq = algo.cmap_equal_;
+     auto cmap_eq = algo_.cmap_equal_;

src/tests/unit/test_cmspade.cpp:227:

-     algo.SetPath(file.GetPath());
-     algo.LoadDataInternal();
-     algo.BuildCMap();
+     algo_.SetPath(file.GetPath());
+     algo_.LoadDataInternal();
+     algo_.BuildCMap();

src/tests/unit/test_cmspade.cpp:231:

-     auto cmap_after = algo.cmap_after_;
+     auto cmap_after = algo_.cmap_after_;

src/tests/unit/test_cmspade.cpp:244:

-     auto candidates = algo.Generator(&p1, &p2, 1, false, false, false, false);
+     auto candidates = algo_.Generator(&p1, &p2, 1, false, false, false, false);

src/tests/unit/test_cmspade.cpp:254:

-     auto candidates = algo.Generator(&p1, &p2, 1, true, true, true, true);
+     auto candidates = algo_.Generator(&p1, &p2, 1, true, true, true, true);

src/tests/unit/test_cmspade.cpp:260:

-     algo.SetPath(file.GetPath());
-     algo.LoadDataInternal();
+     algo_.SetPath(file.GetPath());
+     algo_.LoadDataInternal();

src/tests/unit/test_cmspade.cpp:263:

-     algo.itemset_counts_ = std::make_shared<std::vector<ItemsetId>>(std::vector<ItemsetId>{1, 1});
-     algo.BuildFrequentItems(2);
-     algo.RemoveInfrequentItemsFromSequences();
+     algo_.itemset_counts_ = std::make_shared<std::vector<ItemsetId>>(std::vector<ItemsetId>{1, 1});
+     algo_.BuildFrequentItems(2);
+     algo_.RemoveInfrequentItemsFromSequences();

src/tests/unit/test_cmspade.cpp:267:

-     auto& seqs = algo.sequences_;
+     auto& seqs = algo_.sequences_;

@lubimoemestechko marked this pull request as draft on March 18, 2026, 19:38
@lubimoemestechko marked this pull request as ready for review on April 20, 2026, 18:13

@github-actions (bot) left a comment

clang-tidy made some suggestions

}

TEST(CMSpadeExecuteTest, RemoveEmptySequences) {
TestableCMSpade algo_;

warning: invalid case style for variable 'algo_' [readability-identifier-naming]

Suggested change
- TestableCMSpade algo_;
+ TestableCMSpade algo;

src/tests/unit/test_cmspade.cpp:288:

-     algo_.SetPath(file.GetPath());
-     algo_.LoadDataInternal();
-     algo_.itemset_counts_ =
+     algo.SetPath(file.GetPath());
+     algo.LoadDataInternal();
+     algo.itemset_counts_ =

src/tests/unit/test_cmspade.cpp:293:

-     algo_.minsup_absolute_ = 2;
-     algo_.BuildFrequentItems();
-     algo_.RemoveInfrequentItemsFromSequences();
-     algo_.RemoveEmptySequences();
+     algo.minsup_absolute_ = 2;
+     algo.BuildFrequentItems();
+     algo.RemoveInfrequentItemsFromSequences();
+     algo.RemoveEmptySequences();

src/tests/unit/test_cmspade.cpp:298:

-     EXPECT_EQ(algo_.sequences_.size(), 2);
+     EXPECT_EQ(algo.sequences_.size(), 2);

1 participant