Re: [Outreachy][Proposal]: Refactor in order to reduce Git’s global state
From: Bello Olamide <hidden>
Date: 2025-10-30 10:59:56
On Wed, 29 Oct 2025 at 16:51, Christian Couder [off-list ref] wrote:
Hi, On Wed, Oct 29, 2025 at 2:18 AM Bello Olamide [off-list ref] wrote:quoted
Hello, This is my proposal for the project "Refactor in order to reduce Git’s global state" for the 2025 Outreachy Internship program.Thanks for this proposal. [...]quoted
From a high level overview, environment.[ch] exposes some global variables that reflect a per-repository state and examples of such include git_work_tree_cfg, is_bare_repository_cfg, and core.* settings and functions which also depend on `the_repository` such as have_git_dir(), is_bare_repository(). After a brief study of some related work done on the project, it is important to understand the purpose of the identified global variable and how it is used across the code base, observing how it relates with other subsystems and moving it to the `struct repository` or `struct repo-settings` if its use is repository specific, or specify an appropriate context based on its scope and use this context in the accessor functions. For example in [1], Patrick Steinhardt observes that `core.hooksPath` is repository specific and is stored in the global variable `git_hooks_path`. The variable is then moved into local scope in the repo-settings struct and a new accessor function `repo_settings_get_hooks_path()` is written and used to set the `hooks_path` of the repo specific struct which the path subsystem reads from. Similarly in [2], `core.sharedRepository` is tracked via the global variables `the_shared_repository ` and `need_shared_repository`. These are then moved into the repo-settings struct, with new accessors functions written to modify them, and calls to the accessors in the path subsystem are then modified to replace the old accessors which modify the global variables.Nit: the above paragraph looks very big. Maybe it could be split a bit.
Okay I will do that, thank you
quoted
I also studied [3], [4] by Ayush Chandeker,] and [5] by John Cai to broaden my understanding of the project.Are there some cases where strategies other than writing new accessors functions were used?
Yes there were cases where the functions were adapted to use exactly what it needs down the call chain rather than writing new accessor functions. An example is https://public-inbox.org/git/20250306-b4-pks-objects-without-the-repository-v2-1-f3465327be69@pks.im/#Z31csum-file.h where the global variable `the_hash_algo` is replaced with an explicit parameter `const struct git_hash_algo *algo` in low-level functions such as `static struct hashfile *hashfd_internal()` and the call sites adapted to use r->hash_algo or the_repository->hash_algo in places where the subsystem has not gotten rid of `the-repository`. This is also a strategy that can be used to replace global variables.
Are there pieces of work on this that were started but not finished? Are you planning to finish them? What are the roadblocks that were faced when working on this?
Yes. There were pieces of work that were started but not finished which I plan to finish. As an example, the patch https://lore.kernel.org/git/20250309153321.254844-1-ayu.chandekar@gmail.com/ (local) attempts to move the `git_attributes_file` global variable to the `struct repository`. However because the global variable is used by the attributes subsystem and a single repository can have more than one set of attributes, that is the work-tree attributes and the index attributes, placing the variable into a repository instance and passing it around in the call chain will not be appropriate. Also most of the functions in the attributes subsystem pass the `index_state` as a parameter and not the repository. This is because an index knows its repository but a repository only knows its primary index. Therefore each repository for an index will need to be known from the index. As Junio pointed out in the discussion on the thread: "As the attribute system is all about giving extra information on the paths that appear in the index and in the working tree, it may make sense for the API to go from the index state which is about the index and the working tree to access the attributes, rather than from the repository structure, which controls a lot wider concept and moving anything and everything there will easily and quickly make it a messy kitchen sink." So Given that the `index_state` struct has a repo member, we can move 'git_attributes_file' into the repo struct but access it through the `index_state`. By doing that we know the index truly owns the attributes. There is also `is_bare_repository_cfg` as seen in https://lore.kernel.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/ (local) I have only skimmed through the discussions and patches to understand why it was not finished. But I will do an in depth study to understand why it was not completed and what it takes to finish it.
quoted
3. Review Existing Patch and Define Criteria (December 16 - January 9, 2026): ------------------------------------------------------------- - Thoroughly examine the existing patch series submitted to the mailing list to understand; * What criteria makes a global variable a suitable candidate to be moved to the `struct repository` or `struct repo-settings` * What appropriate context it should be moved into based on its interactions with other subsystems. * If remaining a global variable is the best approach in its case. - This information can be gotten by paying attention to the discussions in the patches and also engaging with my mentors and the Git community.Are you sure that it will be possible to define clear criteria?
Yes it will be possible to define clear criteria per global variable.
For example, from my brief study of previous work, if the variable value is:
1. meant to be different for different repositories, it is a candidate
to move, if not then it is left
as is, like the case of `local_repo_env[]`.
2. used during early startup, it cannot be moved blindly but will need
a closer inspection
and refactoring of the startup code as is the case with
`have_git_dir()` noted by Patrick and
Shejialuo in
https://lore.kernel.org/git/20250305104650.238392-1-ayu.chandekar@gmail.com/ (local).
Its relationship with other subsystems is also a criteria to define
such as the case of
`git_attributes_file mentioned` above
Thanks