Google Summer of Code – Community Bonding

Keeping a weekly blog is one of the requirements of my organisation and of GSoC. The blog helps your mentors follow your progress and make decisions at the different evaluation stages. The Community Bonding phase is critical: it is the time to build a relationship with your mentors and to set up your project plan for the entire coding period. You can also use this time to set up your infrastructure and start making contributions to the project.

Below are my week-by-week posts describing what I did in GSoC’22.

Week 1 (May 21 – May 28)

I am thrilled to be a part of the awesome Red Hen Lab community! Thank you for selecting me and giving me a chance to contribute to the Red Hen codebase.

This post describes my journey after being selected as a Google Summer of Code (GSoC) student associated with Red Hen Lab. I plan to summarize my progress at the end of every week until the end of the summer.

Here’s the abstract of my project:

“The objective is to develop a machine learning model to tag sound effects in streams (like police sirens in a news-stream) of Red Hen’s data. A single stream of data can contain multiple sound effects, so the model should be able to label them from a group of known sound effects like a Multi-label classification problem. The first step would be to develop a baseline model using existing pre-trained deep learning models and add to the Red Hen’s pipeline. Then the performance can be improved using transfer learning and fine tuning the existing model to achieve better accuracy. In this process, the models can be trained on sound effects from noisy or human labeled data sets after they are pre-processed to avoid acoustic domain mismatch problems.”

My mentors are Austin Bennett, Ahmed Ismail Zahran & Mark Turner.

Week 2 (May 29 – June 4)

This is the first week of GSoC. Red Hen is collecting the details of mentors and contributors, which are being recorded here. They are also creating accounts for us at Case Western Reserve University so that we can access the HPC (high-performance computing) clusters. Most of this year’s contributors will do their work on these clusters so that it can be demonstrated on the Red Hen platform. After that, they plan to hold a group Zoom meeting to kick off this year’s program.

The GSoC team at Google has also planned a virtual summit that aims to inspire and inform contributors. There will be talks from Googlers, GSoC mentors and former contributors, who will share their personal and professional GSoC and open source journeys.

This week I also did the following:

Studied YAMNet
Looked into other audio classification use cases
Wrote a blog post on YAMNet classification summarizing what I learned

Week 3 (June 5 – June 12)

This week I am setting up the Singularity container by following this link. Another great reference on Singularity is the blog by Yunfei, who was a Red Hen contributor in GSoC 2021.

Ambitions

Discuss the use cases and goals
Discuss scope of project with mentors
Set up a communication channel (async)
Discuss timelines and expectations

Ideas

Use Slack (or something similar?) for communication with students, past students and mentors?

Challenges

Faced small issues setting up Docker with GitHub Actions. Fixed them by setting the submodules to recursive.
The Singularity container does not place me in the gallina home; it drops me in /tmp instead. I need to look into this.

Achievements

Successfully ran Docker through GitHub Actions.
Was also able to access the Singularity file on the Case HPC.
Updated the Techne Public Site with the fixes I made to the GitHub workflow.
This week we had a Meet and Greet session with all the Red Hens, where we were introduced to all the contributors and mentors.
A Slack channel has been set up where all the Red Hens can discuss issues such as setting up infrastructure.
Prof Urgig shared different tips for working with Gallina (covered in the Tips section).
Prof Turner said that the end product of the project should be a working pipeline delivered in a Singularity container.

Tips

Working with Singularity

Singularity containers should not be kept inside the ‘safe’ directory; keep them in the gallina home but outside the safe folder. This is because we don’t want to back up the Singularity container.
All repository code should be committed to GitHub, and whatever is on GitHub does not need to be in the “safe” directory.
The mount points are not available on the nodes, so you should always get a node with a GPU using the “srun” command (as mentioned in the Techne Public Site). The following command requests a V100 GPU for 1 hour with 5 GB of RAM: srun -p gpu -C gpu2v100 --gres=gpu:1 -n 1 -c 2 --time=1:00:00 --mem=5gb --pty /bin/bash
When a GPU is allocated, you can see the GPU node details using the ‘nvidia-smi’ command.
Also, the session starts in /tmp, where we do not have access to gallina-home. So, at the start of the script, the data needs to be copied to the scratch folder (using scp). Scratch is a temporary folder that is deleted automatically later on. You can refer to the Techne Public Site for details. For example: scp hpc4:/mnt/rds/redhen/gallina/home/sxg1263/somefile.txt /scratch/sxg1263
After your code has run, copy the results to your home directory. You can also clean up the copied files from “scratch” at the end of the script.
Use a standard Docker container as the base for the Singularity container and install the packages in the Docker image. We should not install packages on the HPC or ask CSU to install new packages.
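
The copy-in, run, copy-out and clean-up steps from the tips above could be wrapped in a few small shell helpers. This is only a rough sketch, not the script I actually use: the function names (stage_in, stage_out, cleanup_scratch) and the COPY_CMD variable are made up for illustration, and the paths in the usage comments are placeholders.

```shell
# Sketch of the stage-in / stage-out steps for a job script.
# Function names and COPY_CMD are illustrative assumptions; every path in
# the usage comments is a placeholder, not a real Red Hen location.

# Copy tool: scp as in the tips; override with COPY_CMD=cp to try this locally.
COPY_CMD="${COPY_CMD:-scp}"

# stage_in SOURCE SCRATCH_DIR: copy one input file into the scratch folder.
stage_in() {
    mkdir -p "$2" && $COPY_CMD "$1" "$2/"
}

# stage_out RESULT DEST_DIR: copy one result file from scratch back home.
stage_out() {
    $COPY_CMD "$1" "$2/"
}

# cleanup_scratch SCRATCH_DIR: delete the staged copies once results are saved.
cleanup_scratch() {
    if [ -n "$1" ] && [ -d "$1" ]; then
        rm -rf -- "$1"
    fi
}

# Example use inside a job (placeholders in angle brackets):
#   stage_in  "hpc4:<gallina-home>/<user>/input.wav" "/scratch/<user>"
#   ... run the model on /scratch/<user>/input.wav ...
#   stage_out "/scratch/<user>/results.csv" "hpc4:<gallina-home>/<user>"
#   cleanup_scratch "/scratch/<user>"
```

Keeping the copy tool in a variable makes it possible to dry-run the same steps on a laptop with cp before submitting anything to the cluster.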

Setback

I could not start on the project yet because my mentors and I could not find a suitable time to meet.