Outreachy Internship – Wrapping Everything Up

My Outreachy internship has come to an end! I really appreciate this life-changing experience. Looking back, when I applied to Outreachy, I just wanted a breather from my research thesis. I had been working on the thesis for quite a long time and was getting less and less motivated. I realized I should do something. Luckily, I got this Outreachy internship, and it turned out to be so much more than I expected!

In this final post of the internship, I would like to share how Outreachy helped me grow my skills, how my mentors helped me along the way, what I achieved, what the next steps are, and what made the internship so amazing.

Let’s get started! 🙂

What I have learned and improved

More than Machine Learning

The goal of my project is to productionize a machine learning model, the Citation Need model, to make it accessible to tool developers and not only to machine learning researchers and practitioners. The ML-related part includes importing a trained TensorFlow model into the system, preparing data, and getting prediction results. On top of this, I developed an end-to-end pipeline, which involved writing Python scripts to interact with Wikipedia replicas and databases, parsing Wikitext to extract information, and writing a Python multiprocess script to improve efficiency.
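
To give a flavor of the parsing and multiprocessing part, here is a minimal sketch. It is not the actual project code: the regular expressions are deliberately simplified assumptions about Wikitext, and the names extract_sentences, process_article and process_all are hypothetical. It strips a few common markup constructs, splits the remaining text into sentences, and spreads the articles over worker processes.

```python
import re
from multiprocessing import Pool

# Simplified patterns (assumptions for illustration; real Wikitext is messier
# and a dedicated parser is usually a better choice than regular expressions).
REF_TAG = re.compile(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", re.DOTALL)
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                   # non-nested {{...}}
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")   # keep the link label
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def extract_sentences(wikitext):
    """Strip common markup and split the remaining text into sentences."""
    text = REF_TAG.sub("", wikitext)
    text = TEMPLATE.sub("", text)
    text = WIKILINK.sub(r"\1", text)
    text = text.replace("'''", "").replace("''", "")  # bold / italic quotes
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def process_article(article):
    """Worker function: takes a (title, wikitext) pair."""
    title, wikitext = article
    return title, extract_sentences(wikitext)


def process_all(articles, workers=2):
    """Parse many articles in parallel, one article per task."""
    with Pool(processes=workers) as pool:
        return dict(pool.map(process_article, articles))


if __name__ == "__main__":
    # Hypothetical sample data standing in for articles read from a database.
    sample = [
        ("Paris", "'''Paris''' is the capital of [[France]].<ref>Some source</ref>"),
        ("Seine", "The [[Seine|Seine river]] flows through Paris. {{citation needed}}"),
    ]
    for title, sentences in process_all(sample).items():
        print(title, sentences)
```

In a real pipeline, each extracted sentence would then be passed to the model for prediction. Parsing is a natural step to parallelize because every article can be handled independently of the others.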

Learn from Citation Hunt codebase

Getting acquainted with the Citation Hunt codebase gave me a lot of ideas on how to build a system from 0 to 1. My project and Citation Hunt are similar in that they both parse Wikipedia articles, generate data, and store that data in databases. They differ in their outputs and core algorithms. In the course of understanding the codebase, I therefore gradually formed a clear picture of the workflow for my own project. In addition, the well-organized, concise codebase inspired me to write code that is readable and maintainable.

Learn from Toolforge documentation and community

I also learned about Toolforge, a Wikimedia hosting environment where tool developers create and run tools. The Toolforge documentation is a great resource from which I learned almost everything I needed for my project, including the bastion host, the grid engine, and more. At the same time, I improved my Unix / Linux command line skills by using Toolforge frequently.

Although the documentation is useful most of the time, I sometimes got stuck on problems in Toolforge. Asking the community for help is a great idea. I once filed an issue on Phabricator and tagged Toolforge, and the problem was resolved quickly. People in the community love to help others, so don’t worry; just make sure you explain the problem and provide enough information.

The best way to learn Git — Contribute to FOSS

Besides the Python language and the Toolforge environment, I got more familiar with Git, a version control system, during the internship. I would say the best way to learn Git is by contributing to free and open source software (FOSS). From the middle of the internship, I started to upload my code to the GitHub repository and create pull requests (PRs). I learned to inspect the repository status, work with branches, make changes, synchronize, and so on, and finally figured out how Git works.

Learn collaboration with Git

In addition, Git is a good way to learn collaboration. One means of collaboration between my mentor and me was PR comments on GitHub. That is to say, my mentor reviewed my code and left comments in the PR, and we kept the conversation going until we both agreed the changes could be merged into the master branch.

The following are PRs I made during the internship:

73 comments in one PR!

I can’t thank my mentors enough. We had video calls once a week, which really helped me keep track of the goals, make steady progress, and notice when I was drifting from our aims. Every time I asked questions in our project group chat, I got responses quickly. Regarding communication skills, I was relatively passive in the past. During the internship, I gradually changed my mindset and became more proactive in sharing my progress and ideas, which made the project run more smoothly.

Project outcomes and future work

Here are the results I achieved:

  • Developed a tool and public dataset named Citation Detective, which contains sentences that a machine learning-based classifier has identified as needing a citation;
  • Created a proof of concept for integrating Citation Detective and Citation Hunt.

In the future, we hope to expand Citation Detective

  • to Wikipedia projects in more languages;
  • to a greater number of articles;
  • to have more data fields, such as the particular words that the model paid attention to;
  • to be more accessible, such as through the Quarry database querying service.

On the other hand, before the prototype can be merged into the real Citation Hunt, it

  • requires more UX work;
  • requires a more mature Citation Detective (see the future work listed above).

A new journey begins

After the game is before the game.

Outreachy has made me more confident not only in contributing to free and open source software but also in working remotely with a diverse team. I have become more courageous about asking questions and conveying ideas in a team. I have come to know more about what my interests are, what my strengths are, and how to prepare for my career and find opportunities.

All this would not have been possible without the help of my mentors. Guilherme Gonçalves, Miriam Redi, Sam Walton, thank you for all the time you spent selflessly guiding, leading, encouraging, and affirming me throughout the journey. All of this makes me more confident to face the challenges ahead.

Getting to know Wikimedia and being part of the community was one of the amazing things about the internship. I still remember how the vision of Wikimedia, to help everyone share in the sum of all knowledge, inspired me at the very beginning. It is meaningful and fulfilling to have contributed a little to the Wikimedia movement. Although the internship has ended, it marks a new beginning for me as a FOSS contributor.

Cheers! 😀