This repository was archived by the owner on Feb 15, 2023. It is now read-only.

Get doi by scraping actual biorxiv page? #6


@vincerubinetti

Occasionally Disqus returns a bioRxiv link that does not contain the DOI. For example, in this run it returned https://www.biorxiv.org/content/early/2018/11/09/459529, which redirects to the expected link https://www.biorxiv.org/content/10.1101/459529v1, and that one contains the complete DOI.

To simplify the bot code, I made it read the DOI from the URL, on the assumption that the URL always contains it. If we ever want this to be more robust, we could have the bot fetch the HTML at the link and find the DOI in the document:

image

In the upcoming PR, this at least won't crash the bot; it will just skip any comment with a non-DOI link.
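
For reference, here is a rough Python sketch of the more robust approach: try the URL first, then follow the redirect, and only then look inside the page. It is not the bot's actual code. The function names are made up for illustration, the `10.1101` prefix is taken from the example link above, and the `citation_doi` meta tag is an assumption about what the bioRxiv page contains that should be checked against a real page before relying on it.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Optional, Tuple
from urllib.request import urlopen

# Matches "10.1101/459529" and date-style IDs such as "10.1101/2020.01.01.123456".
# A trailing version suffix like "v1" is deliberately left out of the match.
DOI_PATTERN = re.compile(r"10\.1101/(?:\d{4}\.\d{2}\.\d{2}\.)?\d+")


def doi_from_url(url: str) -> Optional[str]:
    """Return the DOI embedded in the URL, or None if the URL has none."""
    match = DOI_PATTERN.search(url)
    return match.group(0) if match else None


class _DoiMetaParser(HTMLParser):
    """Records the content of the first <meta name="citation_doi"> tag.

    Assumption: the page exposes its DOI through this meta tag.
    """

    def __init__(self) -> None:
        super().__init__()
        self.doi: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.doi is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "citation_doi":
            self.doi = attributes.get("content")


def doi_from_html(html: str) -> Optional[str]:
    """Return the DOI declared in the page's meta tags, or None."""
    parser = _DoiMetaParser()
    parser.feed(html)
    return parser.doi


def fetch_html(url: str) -> Tuple[str, str]:
    """Fetch the page, following redirects; return (final URL, body text)."""
    with urlopen(url, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.geturl(), body


def resolve_doi(
    url: str, fetch: Callable[[str], Tuple[str, str]] = fetch_html
) -> Optional[str]:
    """Try the URL, then the redirected URL, then the page contents."""
    doi = doi_from_url(url)
    if doi is not None:
        return doi
    final_url, html = fetch(url)
    return doi_from_url(final_url) or doi_from_html(html)
```

With this, `resolve_doi("https://www.biorxiv.org/content/early/2018/11/09/459529")` would only hit the network when the URL itself has no DOI, and in many cases the redirect target alone should be enough. The site may reject requests that lack a browser-like `User-Agent`, so a real fetch would probably need custom headers and error handling. If everything fails, the function returns `None`, and the bot could skip the comment as it does now.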
