Florent Viard edited this page Feb 7, 2024 · 9 revisions

Welcome to the S3cmd Google Summer of Code 2024 projects page.


Contributor's guide

We are quite open and don't require a lot of formalities for you to apply for a GSoC project with us.
In the following, you will find more info to help you determine if we could be a good fit for each other.

What is required

  • A good knowledge of Python
  • A basic knowledge of Git and Github
  • An understanding of what an API is and how to interact with a server
  • Being comfortable with the usage of command line tools
  • Be curious

Having some experience dealing with various versions of Python running on multiple OS (Linux, Mac, Windows) would be a great plus.
Previous experience with S3cmd, S3, Object Storage, or cloud services is NOT REQUIRED to apply but would be appreciated.
The subject is usually fun and easy to pick up when you are new to it.

Note:

Although S3cmd is a client for "Object Storage" services, you can expect to develop and test it at little or no cost:

  • s3cmd is entirely based on Python with a very limited number of basic dependencies and doesn't need compilation, so very little "computer resources" are needed.
  • Small Open Source, S3-compatible servers can easily run locally (e.g. Minio).
  • Cloud object storage services usually offer very comfortable "free tiers".

Apply for a project

You can find a list of suggested project ideas further down this page (see "Idea List"), but we also encourage candidates to come up with their own project idea.

How to apply:

  1. Try to understand the project and, if you can, give s3cmd a try
  2. Read the GSoC timeline and contributor responsibilities to confirm your eligibility
  3. (Recommended) Open a new issue here with the "[GSoC2024]" tag in the title to introduce yourself:
      • Who are you?
      • What is your background?
      • In which country are you located? Which Timezone?
      • What is your motivation to become a contributor for the S3cmd organization?
      • Which projects are you interested in and why?
      • What is your projected availability during the program to complete the project?
  4. Submit your application to the Google system before the deadline on April 2 (18:00 UTC). All applications must go through Google's application system; we can't accept any application unless it is submitted there.

Feel free to send an email to florent AT sodria.com if you want to talk privately, ask questions, or discuss a possible application.


Idea List

Project 1 - Create a new cache feature backed by an embedded database

  • Desirable skills: Python, DB, SQLite, S3 API
  • Estimated duration: 350 hours
  • Difficulty: hard
  • Mentor: @fviard

To be able to synchronize local and remote files, we have to compare the file "hash" from both sides.
This requires us to do an expensive "recalculation" of local files "hashes" at each run.
Performance can be improved a lot by using a cache of local files "hashes" to avoid this recalculation.

s3cmd already has a "cache" feature, but its implementation is very inefficient.
It is a single raw file based on a "pickle" marshaling of the in-memory file list.
We could considerably improve performance, reliability and memory usage by developing a brand new cache logic that would use an embedded database (Sqlite3, MDB, LMDB, ...) to store the information.
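
As a rough illustration of the idea, here is a minimal sketch of a hash cache backed by SQLite, using only the Python standard library. The table layout and the function names (`open_cache`, `file_md5`, `cached_md5`) are assumptions made for this example and are not part of s3cmd. An entry is treated as valid as long as the file size and modification time are unchanged.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path):
    """Open (or create) the cache database. The table layout is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS local_hashes ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " mtime_ns INTEGER NOT NULL,"
        " md5 TEXT NOT NULL)"
    )
    return conn


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 of a file by reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_md5(conn, path):
    """Return (md5, cache_hit), rehashing only if size or mtime changed."""
    stat = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime_ns, md5 FROM local_hashes WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == stat.st_size and row[1] == stat.st_mtime_ns:
        return row[2], True
    md5 = file_md5(path)
    conn.execute(
        "INSERT OR REPLACE INTO local_hashes (path, size, mtime_ns, md5)"
        " VALUES (?, ?, ?, ?)",
        (path, stat.st_size, stat.st_mtime_ns, md5),
    )
    conn.commit()
    return md5, False
```

Because each lookup is a single indexed query, the whole file list never has to be loaded into memory at once, which is the main difference from a pickled file list.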

In addition, there is a limitation of the s3 protocol regarding big files (i.e. multipart files) that prevents us from retrieving the "hash" of the remote side for such a file.
If the new cache system could record some info about the remote side, the performance could be boosted even more.
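
To illustrate the limitation, the sketch below reproduces the ETag convention that is commonly observed for multipart uploads: the MD5 of the concatenated per-part MD5 digests, followed by the number of parts. This convention is an assumption here; it is not guaranteed by every S3-compatible server, and the function names are hypothetical.

```python
import hashlib


def multipart_etag(data, part_size):
    """Commonly observed multipart ETag: md5(concatenated part md5s) + "-N".

    Assumption: this convention is not guaranteed by all servers.
    """
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return "%s-%d" % (hashlib.md5(digests).hexdigest(), len(parts))


def is_multipart_etag(etag):
    """Heuristic: a "-" in the ETag means it is not a plain MD5 of the content."""
    return "-" in etag.strip('"')
```

Since the result depends on the part size used at upload time, it cannot be compared directly with a plain local MD5. Recording the upload part size or the original MD5 in the cache is one possible way around this.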

Project 2 - Add a server "profile" option to tweak the logic behavior to specifics of a given target server/service type

  • Desirable skills: Python, API, Cli, S3 API
  • Estimated duration: 350 hours
  • Difficulty: medium
  • Mentor: @fviard

Originally, s3cmd was developed to only interact with the Amazon AWS S3 service.
Little by little, many other cloud services appeared that offered an S3-Compatible interface.
At the same time, many OSS and proprietary software stacks were also created to self-host S3-Compatible servers.

Sadly, so far, s3cmd has remained a "one size fits all" client for S3 services, using the lowest common denominator in terms of API usage for all servers and services.
For example, we expect all services to use "MD5" for "file hash" calculation.
We may also miss out on more interesting APIs or API versions that some services provide, because they are not widely available.

The purpose of this project is to offer a way for users to select a preset "profile" for the service that they are using.
Each profile will have a predetermined set of dynamic behavior configurations, like "feature flags".

The main goal of this project is to create the profile option and the general logic.
Secondary goals are to create some dynamic behaviors using these profiles, and to create the profiles for the most common server/service types (e.g. "aws", "gcs", "digitalocean", "scaleway", "ibmcos", "minio", "radosgw", ...)
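
One possible shape for such profiles is sketched below. Everything in it is hypothetical: the `ServerProfile` class, the flag names (`hash_algorithm`, `use_list_objects_v2`) and the values given to each profile are placeholders for illustration, not a description of how s3cmd or any service behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerProfile:
    """A named set of feature flags (all field names are hypothetical)."""
    name: str
    hash_algorithm: str = "md5"        # lowest common denominator default
    use_list_objects_v2: bool = False  # opt in only where known to work


# Illustrative values only; real profiles would need to be verified per service.
PROFILES = {
    "generic": ServerProfile(name="generic"),
    "aws": ServerProfile(name="aws", use_list_objects_v2=True),
}


def get_profile(name=None):
    """Return the selected profile, falling back to "generic" if none is given."""
    key = (name or "generic").lower()
    try:
        return PROFILES[key]
    except KeyError:
        known = ", ".join(sorted(PROFILES))
        raise ValueError("Unknown server profile: %r (known: %s)" % (name, known))
```

The rest of the code could then branch on a flag such as `get_profile("aws").use_list_objects_v2` instead of testing for a specific service name.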

Project 3 - Add shell scripts for command line auto-completion

  • Desirable skills: Shell, Bash, ZSH, Python
  • Estimated duration: 175 hours
  • Difficulty: medium
  • Mentor: @fviard

It would be nice to provide shell scripts that enable command line auto-completion for s3cmd.
It should auto-complete commands but also be able to retrieve "remote" path suggestions when possible. This project would probably require more "shell" skills than "Python" skills.

We would like to have an auto-completion script at least for Bash and ZSH,
but it would be ok if a contributor wants to do a smaller project by supporting only a single shell type (Bash or ZSH).
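
The candidate selection logic itself can be sketched independently of the shell. The Python helper below is a hypothetical illustration: `COMMANDS` lists only a subset of s3cmd commands, and `list_remote` is an assumed callback that returns remote paths for a given prefix. An actual Bash or ZSH script would implement or call something equivalent.

```python
# Subset of s3cmd commands, for illustration only.
COMMANDS = ("cp", "del", "get", "ls", "mb", "mv", "put", "rb", "sync")


def complete(current_word, list_remote=None):
    """Return completion candidates for the word being typed.

    list_remote is a hypothetical callback: prefix -> iterable of remote paths.
    """
    if current_word.startswith("s3://"):
        if list_remote is None:
            return []  # No remote access available: suggest nothing.
        return sorted(
            path for path in list_remote(current_word)
            if path.startswith(current_word)
        )
    return [cmd for cmd in COMMANDS if cmd.startswith(current_word)]
```

For example, `complete("m")` yields the `mb` and `mv` commands, while a word starting with `s3://` is delegated to the remote listing callback.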

Related: https://github.com/s3tools/s3cmd/issues/985 , https://github.com/s3tools/s3cmd/issues/1092

Project 4 - Add command and options to retrieve and manipulate versioned files

  • Desirable skills: Python, S3 API
  • Estimated duration: 175 hours
  • Difficulty: medium
  • Mentor: @fviard

An S3 bucket can have a "versioning" option enabled to preserve previous versions of files after each modification.
But, currently, we don't provide any way to access or manipulate previous versions of files.
The purpose of this project would be to add support for versioning for most of the commands where it would make sense.

The implementation should not be that difficult; the hardest part of this project will be working out how to expose this to the user in a simple and obvious way, without risking any regression.
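
At the API level, a specific version of an object is addressed with the `versionId` query parameter of the S3 API. The helper below is a minimal sketch of that idea; the function name `object_request_path` is hypothetical and does not exist in s3cmd.

```python
from urllib.parse import quote, urlencode


def object_request_path(key, version_id=None):
    """Build the request path for an object, optionally pinned to one version."""
    path = "/" + quote(key, safe="/")
    if version_id:
        # "versionId" is the S3 API query parameter selecting a version.
        path += "?" + urlencode({"versionId": version_id})
    return path
```

When no version is given, the path is unchanged, which is one way to keep existing commands behaving exactly as before.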

Related: https://github.com/s3tools/s3cmd/issues/341

Project 5 - Rework the command line parser to be more user friendly

  • Desirable skills: Python, Shell
  • Estimated duration: 175 hours
  • Difficulty: medium
  • Mentor: @fviard

s3cmd supports a huge number of commands and flags from the command line.
After adding so many features, the help is really crowded and it might be hard for a new user to understand how to use a command, or which flag might be relevant.
The purpose of this project would be to rework the parser, re-organise commands, and maybe group them, in order to be able to provide a proper "help" per command that will not be crowded with useless flags.

Related: https://github.com/s3tools/s3cmd/issues/1035
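
As one possible direction, the sketch below uses subparsers from the standard library `argparse` module so that each command only declares, and only documents, its own flags. This is an assumption made for illustration, not the current s3cmd parser; only two commands and two flags are shown.

```python
import argparse


def build_parser():
    """Build a parser where each command has its own flags and help text."""
    parser = argparse.ArgumentParser(prog="s3cmd")
    subparsers = parser.add_subparsers(
        dest="command", metavar="COMMAND", required=True
    )

    # "ls" only knows about flags that are relevant to listing.
    ls_parser = subparsers.add_parser("ls", help="List objects or buckets")
    ls_parser.add_argument("uri", nargs="?", help="s3://BUCKET[/PREFIX]")
    ls_parser.add_argument("--recursive", action="store_true",
                           help="List recursively")

    # "put" only knows about flags that are relevant to uploads.
    put_parser = subparsers.add_parser("put", help="Upload files to a bucket")
    put_parser.add_argument("files", nargs="+", help="Local files to upload")
    put_parser.add_argument("uri", help="s3://BUCKET[/PREFIX]")
    put_parser.add_argument("--acl-public", action="store_true",
                            help="Make uploaded objects publicly readable")
    return parser
```

With this layout, `s3cmd put --help` would list only the upload flags, and `s3cmd ls --help` only the listing flags.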

Other ideas

To be completed ... more project ideas to be added

Additionally, you can have a look at the open issues with the "feature-request" label to find alternative project ideas: https://github.com/s3tools/s3cmd/issues?q=is%3Aopen+is%3Aissue+label%3Afeature-request