GitHub Actions[1] is a CI/CD workflow service officially launched by GitHub. It aims to reduce the operational burden on open source contributors and to empower the open source community with cloud-native DevOps. If you don't know what CI/CD and DevOps are, please refer to the article I wrote for the Night Team official account, “Easily Build Enterprise DevOps Workflows with Open Source Software”[2]. Open source projects such as ArtiPub[3] integrate GitHub Actions. As a contributing developer, I find GitHub Actions not only easy to use but also genuinely free, which is the main thing. Hopefully, developers who don't yet know how to use GitHub Actions in their open source projects can take some inspiration from this article.
If you are not familiar with GitHub Actions, I highly recommend reading the official GitHub Actions documentation[4], which includes a video introduction[5], a quick start[6], examples[7], concepts, principles, and more. If you study the documentation thoroughly and combine it with your existing CI/CD experience, you should find DevOps on GitHub very easy. The relevant code used in this article can also be found in the official documentation.
Let's sort out what we want to achieve: use GitHub Actions to run a crawler in a repository that fetches the daily GitHub Trending[8] list.
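The implementation follows the steps covered in the rest of this article: put the crawler code into a repository, create a workflow, check the run, and add a scheduled trigger.
Now let's get to work!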
Let's put the crawler code into the GitHub repository first. Today's topic is GitHub Actions, so I won't go into the crawler implementation in detail.
The core code is as follows:
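The crawler listing itself is not reproduced in this text, so what follows is only a minimal sketch of what such a crawler could look like, not the author's code. It uses only the Python standard library, rather than third-party packages such as the requests and bs4 listed in the references, so that it runs without extra installs. It also assumes that each entry on the Trending page is an article element whose h2 heading links to the repository, which GitHub may change at any time. The names TrendingParser, parse_trending, fetch_trending, and main are hypothetical.

```python
"""Minimal GitHub Trending crawler sketch (standard library only)."""
from html.parser import HTMLParser
from urllib.request import Request, urlopen

TRENDING_URL = "https://github.com/trending"


class TrendingParser(HTMLParser):
    """Collect "owner/repo" names from <article> ... <h2><a href="/owner/repo">.

    Assumption: this markup matches the Trending page; it is not guaranteed.
    """

    def __init__(self):
        super().__init__()
        self.repos = []
        self._in_article = False
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._in_article = True
        elif tag == "h2" and self._in_article:
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            href = dict(attrs).get("href") or ""
            # Keep only links shaped like "/owner/repo".
            parts = href.strip("/").split("/")
            if href.startswith("/") and len(parts) == 2 and all(parts):
                self.repos.append("/".join(parts))

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
        elif tag == "article":
            self._in_article = False


def parse_trending(html):
    """Return the repository names found in the given HTML, in page order."""
    parser = TrendingParser()
    parser.feed(html)
    return parser.repos


def fetch_trending(url=TRENDING_URL):
    """Download the Trending page and return it as text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def main():
    """Print the trending repositories, one per line, with their rank."""
    for rank, repo in enumerate(parse_trending(fetch_trending()), start=1):
        print(f"{rank}. {repo}")
```

Adding a call to main() at the end of the file turns the sketch into a script that a workflow step can run. Keeping the parsing in parse_trending, separate from the download, means it can be checked against a saved HTML snippet without network access.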
First, let’s find the “Actions” tab on the GitHub repository homepage and click on it.
You'll now see a welcome page, which indicates that your repository doesn't have any GitHub Actions workflows yet. The page also shows several official introductions and some popular workflow templates. You can click the Configure button of a template to create a new workflow from it.
Search for Python and find a workflow that runs a Python program. That’s it.
After clicking the Configure button, you will be taken to the following editing page.
A GitHub workflow is simply a YAML configuration file. Like today's popular PaaS, IaaS, and cloud-native tooling, it uses code to automate configuration.
As you can see, the editor already contains some default workflow code, and we only need to change it slightly.
Click Commit in the interface, and you should see the workflow configuration file github-trending-crawler.yml generated in the .github/workflows directory.
In simple terms, the workflow above does a few things:
Once we commit this workflow, it should run automatically, because the default template already sets the trigger conditions.
This means that whenever we push commits or open a pull request, the workflow is triggered automatically.
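Since a workflow is just a file under .github/workflows, it can also be created locally and pushed instead of being committed from the web editor. The sketch below writes a comparable file from Python so the YAML can be read in one place. It is not the author's workflow: the job name build, the step name Run, and the file name come from this article, while the action versions, the Python version, the main branch name, the dependency list, and the main.py entry point are assumptions. The function name write_workflow is hypothetical.

```python
from pathlib import Path

# Hypothetical workflow contents. The action versions, Python version,
# "main" branch, installed packages, and "main.py" are all assumptions.
WORKFLOW_YAML = """\
name: GitHub Trending Crawler

on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: pip install requests bs4
      - name: Run
        run: python main.py
"""


def write_workflow(repo_root="."):
    """Write the workflow file into <repo_root>/.github/workflows."""
    path = Path(repo_root) / ".github" / "workflows" / "github-trending-crawler.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(WORKFLOW_YAML, encoding="utf-8")
    return path
```

The on block is where the trigger conditions live: the push and pull_request keys are what make the workflow run on every pushed commit or pull request against the listed branch.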
Now let's look at the run. On the Actions page, you'll see a list of the workflow's executions.
Click the latest one, then click build to see the relevant logs, and expand the logs for the Run step.
As you can see, the program has printed GitHub Trending's top repositories of the day, as expected. Let's check whether the output matches the real page.
Great, the output is exactly the same as the actual page. It passed with flying colors!
What is a crawler without scheduled tasks? GitHub Actions happens to support them, so let's add one.
Open the edit page of the workflow created earlier and add a piece of code to the trigger conditions section, which will then read as follows.
Here, the cron expression 0 * * * * means that the workflow is triggered at minute 0 of every hour. Readers who are not familiar with cron expressions can refer to Cron Guru[9].
After committing the edit, we should see the crawler task run on the hourly schedule.
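To make the meaning of the five cron fields (minute, hour, day of month, month, day of week) concrete, here is a small sketch that checks whether a given moment matches an expression. It is a deliberate simplification and not how GitHub evaluates schedules: each field may only be * or a single integer, and the function name cron_matches is hypothetical. GitHub Actions evaluates schedules in UTC, so the example uses UTC timestamps.

```python
from datetime import datetime, timezone


def cron_matches(expression, moment):
    """Return True if `moment` matches a five-field cron expression.

    Simplification: each field is either "*" or one integer. Ranges,
    lists, and step values are not handled, and every field must match.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    actual = (
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # cron counts Sunday as 0
    )
    return all(
        field == "*" or int(field) == value
        for field, value in zip(fields, actual)
    )


for stamp in ("2022-10-14 08:00", "2022-10-14 08:30", "2022-10-14 09:00"):
    moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    print(stamp, cron_matches("0 * * * *", moment))
```

With 0 * * * *, only the minute field is constrained, so 08:00 and 09:00 match while 08:30 does not.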
For more information about the triggers of GitHub Actions, please refer to the official documentation[10].
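This article described how to use a GitHub Actions workflow to deploy a simple scheduled crawler task. The following technologies were used: GitHub Actions workflows with cron-based scheduled triggers, and a Python crawler built on requests[11] and bs4[12].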
The entire code sample repository is on GitHub: https://github.com/tikazyq/codao-code
If you are interested in the author's articles, you can add the author on WeChat (tikazyq1) with the note “the way of the code”, and the author will add you to the “way of the code” discussion group.
The English version of this article was published simultaneously on dev.to[13]. Technology sharing knows no borders, and feedback and corrections from experts are welcome.
[1] GitHub Actions: https://docs.github.com/cn/actions
[2] Easily Build Enterprise DevOps Workflows with Open Source Software: https://mp.weixin.qq.com/s?src=11&timestamp=1665707721&ver=4103&signature=gLUg5OqfEcWYe3ocTknKlmcbTy04ysBk5SuXrFFNPbA79eJPI4OfN8hwLZAv1jRoOolcJMg13UcWdw6tQBUyZi8gqH0zsl8t8-73Bf96uPkI3YNa0pbCnnFppxLcIYs&new=1
[3] ArtiPub: https://github.com/crawlab-team/artipub
[4] Official documentation: https://docs.github.com/cn/actions
[5] Video introduction: https://youtu.be/cP0I9w2coGU
[6] Quick start: https://docs.github.com/cn/actions/quickstart
[7] Examples: https://docs.github.com/cn/actions/examples
[8] GitHub Trending: https://github.com/trending
[9] Cron Guru: https://crontab.guru/
[10] Official documentation on triggers: https://docs.github.com/cn/actions/using-workflows/events-that-trigger-workflows
[11] requests: https://pypi.org/project/requests/
[12] bs4: https://pypi.org/project/bs4/
[13] dev.to: https://dev.to/tikazyq/cicd-in-action-how-to-use-microsofts-github-actions-in-a-right-way-4g89