git-rest-cache

Version: v0.0.0-...-0d5c79f
Published: Feb 23, 2025 License: MIT

Git REST Cache

Git REST Cache is an open-source Go project that acts as a caching proxy for online Git repositories. It provides fast, read-only access to repository file content without consuming the provider's API rate limits. The project supports on-demand shallow cloning, periodic background updates, token validation, and TTL-based pruning of stale caches, keeping cached repositories current while minimizing API calls and network overhead.

This makes it an ideal solution for developer tools, AI-powered code analysis, CI/CD systems, and automation scripts that require efficient, low-latency access to source code files. By caching repositories locally, Git REST Cache avoids provider rate limits and accelerates file retrieval, making it particularly useful for large-scale projects and high-frequency operations.

Features

  • On-Demand Cloning: Clones a repository using --depth=1 when a file is first requested.
  • Branch-Level Caching: Each repository is cached under a unique hash (computed from provider, owner, repo, and, when one is supplied, the token), with each branch stored in its own subfolder.
  • Token-Based Access: Supports PAT/OAuth token validation to access private repositories.
  • Background Updates: Periodically fetches updates for cached repositories.
  • TTL & Pruning: Automatically removes caches that have not been accessed for a configurable time.
  • Extensible Provider Support: Easily add support for GitHub, GitLab, Bitbucket, Azure DevOps, etc.
  • Pluggable Architecture: Uses a GitCacheManager interface to abstract Git operations (cloning, fetching, reading files, deletion) for easier testing and extension.
  • Concurrency: Employs per-repo locking to ensure thread-safe operations without blocking unrelated repositories.

Architecture

The core idea is to maintain a local cache of Git repositories. Each repository is identified by a unique hash (derived from its provider, owner, repository name, and optionally the token used for cloning). Within each repository cache folder, branches are stored as separate subfolders. For example, the folder structure might look like:

/cached-repos/
  └── <repo-hash>/  
       ├── main/         # Cache for branch "main"
       ├── dev/          # Cache for branch "dev"
       └── feature-x/    # Cache for branch "feature-x"

A central GitCacheManager interface handles Git operations such as:

  • CloneRepo: Clones the repository for a specific branch.
  • FetchUpdates: Fetches new commits from the remote for a branch.
  • ReadFile: Returns file content from a cached branch.
  • DeleteRepo: Removes a stale repository cache.

This separation makes it easy to test the folder-scanning logic and Git operations without directly coupling to internal cache structures.

Authentication & Security

For private repositories, authentication tokens are validated against the provider’s API before access is granted. If valid, the token is cached to minimize repeated API calls and improve response time.

Token Caching & Expiry
  • Tokens are temporarily stored to avoid excessive API requests.
  • The cache TTL (time to live) is configurable via token-ttl, which bounds how long a validated token is trusted before it must be checked again.
  • When a token expires, it is revalidated automatically upon the next request.

This system ensures secure and efficient authentication, reducing latency while maintaining repository access control.

Installation

  1. Clone the Repository:

    git clone https://github.com/costinul/git-rest-cache.git
    cd git-rest-cache
    
  2. Build the Project:

    go build -o git-rest-cache ./cmd
    
  3. Run the Binary:

    ./git-rest-cache
    

Configuration

The project uses Viper and Cobra to support configuration via a YAML file, environment variables, and CLI flags.

Example config/config.yaml:

port: 8080
log-level: "info"
storage-folder: "./cached-repos"
repo-ttl: "24h"
token-ttl: "24h"
repo-check-interval: "5m"

Environment variables are prefixed with GIT_REST_CACHE_ (e.g., GIT_REST_CACHE_PORT=9090).

Usage

Git REST Cache exposes a REST API to retrieve file content from cached Git repositories. When a request is made, the service:

  1. Validates the token (if provided) via the appropriate Git provider.
  2. Clones the repository for the specified branch into a cache folder (if not already cached).
  3. Returns the requested file content from the cached branch.
Example Request

To fetch the README.md file from the GitHub repository https://github.com/costinul/git-rest-cache on the main branch, call:

http://localhost:8080/github/costinul/git-rest-cache/main/blob/README.md
  • If the repository is public, the request can be made without an X-Token header.
  • For private repositories, include the token in the X-Token header.

API Endpoints

Each Git provider has its own specific URL pattern for accessing repositories. Below is the current and planned support for various providers.

Implemented
GitHub
  • Blob (File Content):
    • URL Pattern:
      /github/:owner/:repo/:branch/blob/*filepath
    • Example Request:
      GET http://localhost:8080/github/costinul/git-rest-cache/main/blob/README.md
    • Description:
      Retrieves the content of a file (blob) from the specified branch. The API returns the requested file content with Content-Type: application/octet-stream.
  • List (Directory Listing):
    • URL Pattern:
      /github/:owner/:repo/:branch/list/*path
    • Example Request:
      GET http://localhost:8080/github/costinul/git-rest-cache/main/list/gitcache/
    • Description:
      Retrieves a directory listing for the specified path within the repository.
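
To illustrate how the two URL patterns decompose, here is a standard-library sketch that splits a request path into its parts. Route and parseGitHubRoute are hypothetical names, and the project's actual router is not shown in this README. Note that in this sketch the branch occupies exactly one path segment, while the trailing path may contain slashes.

```go
package main

import (
	"fmt"
	"strings"
)

// Route is the parsed form of /github/:owner/:repo/:branch/(blob|list)/*path.
type Route struct {
	Owner, Repo, Branch, Kind, Path string
}

// parseGitHubRoute splits a URL path into its route parameters.
func parseGitHubRoute(urlPath string) (Route, error) {
	// A limit of 6 keeps any slashes inside the trailing path segment.
	parts := strings.SplitN(strings.TrimPrefix(urlPath, "/"), "/", 6)
	if len(parts) < 5 || parts[0] != "github" {
		return Route{}, fmt.Errorf("not a GitHub route: %q", urlPath)
	}
	if parts[4] != "blob" && parts[4] != "list" {
		return Route{}, fmt.Errorf("unknown operation %q", parts[4])
	}
	r := Route{Owner: parts[1], Repo: parts[2], Branch: parts[3], Kind: parts[4]}
	if r.Owner == "" || r.Repo == "" || r.Branch == "" {
		return Route{}, fmt.Errorf("missing owner, repo, or branch in %q", urlPath)
	}
	if len(parts) == 6 {
		r.Path = parts[5]
	}
	return r, nil
}

func main() {
	for _, p := range []string{
		"/github/costinul/git-rest-cache/main/blob/README.md",
		"/github/costinul/git-rest-cache/main/list/gitcache/",
	} {
		r, err := parseGitHubRoute(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", r)
	}
}
```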
Planned Support

Additional providers will be supported in future releases, following similar patterns:

  • GitLab

    • Expected URL Pattern:
      /gitlab/:namespace/:repo/:branch/blob/*filepath
    • Note:
      GitLab supports nested groups, so :namespace may contain multiple levels (e.g., gitlab.com/acme/widgets/frontend-repo).
  • Bitbucket

    • Expected URL Pattern:
      /bitbucket/:workspace/:repo/:branch/blob/*filepath
    • Note:
      Newer Bitbucket URLs use workspaces instead of a simple owner/repo structure.
  • Azure DevOps

    • Expected URL Pattern:
      /devops/:organization/:project/:repo/:branch/blob/*filepath
    • Note:
      Azure DevOps URLs include both an organization and a project before the _git/:repo segment.
Request Headers
  • X-Token (optional):
    Optional for public repositories; required for private ones. The token is validated against the provider’s API and, if valid, is cached to minimize repeated external validations.
Notes
  • If the requested repository or branch is not yet cached, it is automatically cloned on demand.
  • Subsequent requests are served directly from the cache, which the periodic background updates keep in sync with the remote.

Testing

Run the following commands to execute tests:

  • Git Cache Module Tests (with race detection):

    go test -race -timeout 10m -count=5 ./gitcache -v
    
  • API Module Tests:

    go test -timeout 10s ./api -v
    

These commands run the unit and integration tests, which cover caching logic, folder scanning, Git operations, and API endpoints.

License

This project is licensed under the MIT License.
