The Real Problem with Large Binary Assets in Git

Ask any technical director at a game studio what their biggest version control headache is, and you will almost always get the same answer: binary assets. Specifically, big ones. A 4K diffuse texture at 50MB. A high-poly FBX at 200MB. An audio bank at 400MB. And a repo that has accumulated years of these files until it weighs in at 80GB and takes 45 minutes to clone from scratch.

Git's object model was not designed for this. When Git tracks a file, it stores a complete snapshot of every version in its object database. For source code this is cheap: text compresses well, and packfile delta compression stores each revision as a small diff against its neighbors. For most binary game assets (compressed textures, audio banks, baked meshes) delta compression finds almost nothing to reuse, so every new version is stored essentially in full. A texture that gets re-exported three times is stored three times, forever.
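
You can watch this happen on any repository; a minimal sketch, with hypothetical paths:

    # commit the first export of a ~50MB texture
    git add textures/hero_diffuse.png
    git commit -m "Add hero diffuse texture"
    git count-objects -vH      # note the object store size

    # re-export and commit again: the store grows by roughly the full
    # file size, because Git keeps the new version as a whole new blob
    git add textures/hero_diffuse.png
    git commit -m "Re-export hero diffuse"
    git count-objects -vH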

What Git LFS Actually Does

Git Large File Storage (LFS) is the standard answer. Instead of storing the binary content in Git's object database, LFS replaces it with a text pointer file. The pointer is tiny. The actual binary content lives on a separate LFS server. When you check out a branch, Git fetches the binaries you need from the LFS server on demand.
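
Concretely, opting a file type into LFS and then inspecting what Git actually stores looks like this; the paths are hypothetical and the oid and size in the output are illustrative:

    git lfs install              # one-time client setup
    git lfs track "*.png"        # writes a filter rule to .gitattributes
    git add .gitattributes textures/hero_diffuse.png
    git commit -m "Track textures with LFS"

    # what lives in Git history is a ~130-byte pointer, not the texture:
    git show HEAD:textures/hero_diffuse.png
    # version https://git-lfs.github.com/spec/v1
    # oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
    # size 52428800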

This works. The repo itself stays manageable. Clones are faster because you are not pulling every binary that ever existed — just the ones on your current branch and working tree. It is a real improvement over naive Git-with-binaries.
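
Standard LFS knobs let you push this further and skip binary downloads at clone time entirely, then pull only the paths you need; the repository URL and paths here are hypothetical:

    # clone pointers only, no binary content
    GIT_LFS_SKIP_SMUDGE=1 git clone git@example.com:studio/game.git
    cd game

    # fetch binaries only for the area you are actually working in
    git lfs pull --include="assets/characters/**"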

But it introduces problems that most guides do not mention up front.

Where LFS Breaks Down

Pointers without content

LFS pointers and actual binary content can get out of sync. This happens when LFS pushes fail silently, when someone pushes without the LFS client installed, or when the LFS server experiences an interruption during a large upload. The result: other team members clone or pull, get a pointer file instead of the asset, try to open a texture in Maya, and hit an import error, because the file on disk is a 130-byte text pointer rather than a texture. Diagnosing this requires understanding both Git internals and LFS server state, which is not something you want your artists spending time on.
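
For reference, a quick triage with standard LFS commands usually narrows it down; the asset path is hypothetical:

    # is the file on disk a pointer instead of real content?
    head -c 200 assets/characters/hero.fbx
    # a stranded pointer starts with: version https://git-lfs.github.com/spec/v1

    git lfs fsck        # check pointer/content integrity in the checkout
    git lfs ls-files    # list what LFS tracks on this branch
    git lfs pull        # re-download any missing content from the server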

Bandwidth with no smart deltas

LFS transfers full binary content; there is no delta sync. If you update a 100MB texture, every team member who pulls that branch downloads the full 100MB, even if only a small part of the file actually changed. At a studio with 30 people all pulling a weekly build, this adds up fast. It also means your CI pipeline downloads gigabytes of assets on every run unless you implement aggressive caching, which adds its own complexity.
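
The usual mitigation is to narrow what gets fetched and cache the LFS object directory (.git/lfs) between CI runs. A sketch using LFS's own include/exclude configuration, with hypothetical paths:

    # persist fetch rules so every pull honors them
    git config lfs.fetchinclude "assets/textures/**"
    git config lfs.fetchexclude "assets/raw/**"

    # in CI, additionally restore .git/lfs from a cache before pulling, so
    # unchanged objects are not re-downloaded (mechanism depends on the CI system)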

History rewriting is destructive

Accidentally committed a 500MB file without LFS? The standard fix is a history rewrite with git filter-repo, the older git filter-branch, or BFG Repo-Cleaner. All three rewrite history, which invalidates every clone on the team. Everyone has to re-clone. In a studio mid-sprint, this is a half-day event. It happens more often than it should.
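
For the record, git lfs migrate can do the rewrite in one step, but the fallout is the same; a sketch:

    # rewrite all history so every *.psd is stored through LFS
    git lfs migrate import --everything --include="*.psd"

    # force-push the rewritten refs; every existing clone is now stale
    git push --force-with-lease --all
    git push --force-with-lease --tags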

LFS locking is bolted on, not native

LFS added a file locking feature. It works, but it is not enforced by default, requires server-side configuration, and is not integrated into any engine's native UI. Artists who do not know the command line are not going to type git lfs lock assets/characters/hero.fbx before they start working. So locks either do not get used, or a workflow enforcement script breaks and nobody notices until there is a conflict.
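
The mechanics, for completeness; every step below is a manual CLI action, which is exactly the problem:

    # mark the file type as lockable (adds an attribute to .gitattributes)
    git lfs track "*.fbx" --lockable

    # take, inspect, and release a lock on one asset
    git lfs lock assets/characters/hero.fbx
    git lfs locks
    git lfs unlock assets/characters/hero.fbx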

The Scale Problem

All of these issues get worse as the project grows. A 5GB repo with LFS is manageable. A 200GB repo with LFS and 50 active contributors is a different situation. At that scale:

  • Initial clone times stretch to hours even with LFS
  • The LFS server becomes a performance bottleneck that needs its own capacity planning
  • Tracking which binaries are actually needed for a given branch becomes its own problem
  • Running git gc or other maintenance operations blocks everyone and can take hours (a partial mitigation is sketched below)
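
On the maintenance point, recent versions of Git can at least spread the cost out with scheduled background maintenance; a partial mitigation, not a fix:

    # replace stop-the-world gc with incremental background tasks
    git maintenance start
    git config maintenance.incremental-repack.enabled true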

Studios at this scale either hire someone to maintain the Git/LFS infrastructure full-time, or they migrate to Perforce and accept that tradeoff.

What Proper Binary Handling Looks Like

The requirements are not complicated. You need:

  • Delta syncs for binaries — only transfer the parts of a file that changed, not the whole thing
  • Content-addressed storage that deduplicates identical assets across branches
  • Lazy fetching — only download the assets you actually need for your current task
  • First-class locking that works inside the engine editor without requiring CLI knowledge
  • Metadata indexing so you can answer "what branch has the latest version of hero.fbx" without cloning everything

LFS gives you the third point, partially. The rest require a storage architecture designed from scratch for game assets, not adapted from a code versioning tool.
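
To make the second requirement concrete: content addressing means naming a stored blob by the hash of its bytes, so byte-identical assets on different branches collapse to a single object. A toy bash sketch, not any particular tool's implementation:

    # store a file under its own content hash; identical files dedupe for free
    hash=$(sha256sum assets/characters/hero.fbx | cut -d' ' -f1)
    mkdir -p "store/${hash:0:2}"
    cp -n assets/characters/hero.fbx "store/${hash:0:2}/${hash}"
    # a second branch with a byte-identical hero.fbx maps to the same stored object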

This is one of the core design decisions behind Diversion. The binary asset pipeline is not a plugin to standard Git — it is a separate storage layer with its own delta encoding, deduplication, and metadata index. When you pull a branch, you get only what you need. When you push a texture update, only the changed blocks transfer. When an artist locks a file, the lock is visible in Unreal or Unity immediately.

Git with LFS will get you through a small-to-medium project. At studio scale, with 200GB repos and 40 concurrent contributors, the workarounds stop working. That is the point where you need tooling designed for the actual problem, not adapted around it.