Get DOI by scraping actual bioRxiv page? #6

[Sometimes](https://github.com/greenelab/preprint-bot/actions), seemingly at random, Disqus returns a bioRxiv link that doesn't contain the DOI. For example, in [this run](https://github.com/greenelab/preprint-bot/runs/2757323711?check_suite_focus=true), `https://www.biorxiv.org/content/early/2018/11/09/459529` is returned, which redirects to the expected link `https://www.biorxiv.org/content/10.1101/459529v1` that contains the complete DOI.

To simplify the bot code, I made it read the DOI from the URL, assuming (and hoping) the URL would always contain it. If we ever want this to be more robust, we could have the bot actually fetch the HTML at the link and find the DOI in the document:

![image](https://user-images.githubusercontent.com/8326331/121279544-b9a83d80-c8a2-11eb-9390-11bb66352678.png)
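A rough sketch of that fallback, in Python for illustration only (the bot itself may be written in something else). Every function name here is hypothetical. It also assumes the bioRxiv page exposes the DOI in a `<meta name="citation_doi">` tag, which should be verified against a real page, and that the site serves the page to a plain scripted request.

```python
import re
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Optional

# Matches e.g. /content/10.1101/459529v1 and captures "10.1101/459529".
_DOI_IN_URL = re.compile(r"/content/(10\.\d{4,9}/[0-9.]+\d)(?:v\d+)?")


def doi_from_url(url: str) -> Optional[str]:
    """Return the DOI embedded in a bioRxiv content URL, or None."""
    match = _DOI_IN_URL.search(url)
    return match.group(1) if match else None


class _DoiMetaParser(HTMLParser):
    """Collects the first <meta name="citation_doi" content="..."> value."""

    def __init__(self) -> None:
        super().__init__()
        self.doi: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.doi is None:
            attributes = dict(attrs)
            # Assumption: the page uses a "citation_doi" meta tag.
            if (attributes.get("name") or "").lower() == "citation_doi":
                self.doi = attributes.get("content")


def doi_from_html(html: str) -> Optional[str]:
    """Return the DOI declared in the page's meta tags, or None."""
    parser = _DoiMetaParser()
    parser.feed(html)
    return parser.doi


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page; urlopen follows redirects by default."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "preprint-bot-sketch"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def resolve_doi(
    url: str, fetch: Callable[[str], str] = fetch_html
) -> Optional[str]:
    """Try the cheap URL parse first, then fall back to scraping the page."""
    return doi_from_url(url) or doi_from_html(fetch(url))
```

Passing `fetch` as a parameter keeps the network call swappable, so the parsing can be tested offline. Since the DOI-less link redirects to one that contains the DOI, a lighter variant would be to follow the redirect and run `doi_from_url` on the final URL instead of parsing the HTML.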

In the upcoming PR, this at least won't crash the bot; it will just skip any comment whose link doesn't contain a DOI.
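As an illustration of that skip-instead-of-crash behavior, here is a minimal Python sketch. It is not the code from the PR: `keep_comments_with_doi` and `doi_or_none` are hypothetical names, and the regular expression is an assumed pattern for bioRxiv's `10.1101/...` DOIs.

```python
import logging
import re
from typing import Callable, Iterable, Iterator, Optional, Tuple

logger = logging.getLogger("preprint-bot-sketch")


def doi_or_none(url: str) -> Optional[str]:
    """Hypothetical extractor: return a bioRxiv DOI found in the URL, or None."""
    match = re.search(r"10\.1101/[0-9.]*[0-9]", url)
    return match.group(0) if match else None


def keep_comments_with_doi(
    links: Iterable[str],
    extract_doi: Callable[[str], Optional[str]] = doi_or_none,
) -> Iterator[Tuple[str, str]]:
    """Yield (link, doi) pairs, skipping links that carry no DOI."""
    for link in links:
        doi = extract_doi(link)
        if doi is None:
            # Skip rather than raise, so one odd link can't stop the run.
            logger.warning("Skipping comment, no DOI in link: %s", link)
            continue
        yield link, doi


links = [
    "https://www.biorxiv.org/content/early/2018/11/09/459529",
    "https://www.biorxiv.org/content/10.1101/459529v1",
]
kept = list(keep_comments_with_doi(links))
```

With the two links from the example run, only the second one survives the filter; the first is logged and skipped.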

