Remove sensitive files and their commits from Git history


I would like to put a Git project on GitHub but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for capistrano).

I know I can add these filenames to .gitignore, but this would not remove their history within Git.

I also don't want to start over again by deleting the /.git directory.

Is there a way to remove all traces of a particular file in your Git history?

10 solutions

#1


For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your git repository is entirely local or whether you have a remote repository elsewhere yet; if it is remote and not secured from others you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with it gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.


With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:

Note for Windows users: use double quotes (") instead of single quotes in this command.

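git filter-branch --index-filter \
'git update-index --remove filename' <introduction-revision-sha1>~1..HEAD
git push --force --verbose --dry-run
git push --force

Here <introduction-revision-sha1> is the commit that first added the file. The ~1 starts the range at that commit's parent, because a range written as A..HEAD does not include A itself; if the file was added in the very first commit there is no parent, so pass HEAD alone instead of a range.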

Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.

To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.
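
As a minimal sketch of the filter-branch approach above, the following can be run in a throwaway directory before touching a real repository. The file name config/deploy.rb, the commit messages and the hunter2 password are made up for illustration. The sketch uses git rm --cached --ignore-unmatch in the index filter (a common alternative to git update-index --remove) and rewrites every ref rather than a range:

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause newer Git versions print
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete afterwards
git init -q
git config user.name "Example" && git config user.email "example@example.com"

# Two commits: the first one contains the (invented) secret file.
mkdir config
echo "password=hunter2" > config/deploy.rb
echo "hello" > README
git add . && git commit -q -m "initial commit with secret"
echo "more" >> README
git commit -q -am "unrelated change"

# Drop the file from the index of every commit; --ignore-unmatch keeps
# the filter from failing on commits that do not contain the file.
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch config/deploy.rb' -- --all >/dev/null 2>&1

# Count commits on the rewritten branch that still touch the file.
remaining=$(git log --oneline HEAD -- config/deploy.rb | wc -l | tr -d ' ')
echo "commits still touching config/deploy.rb: $remaining"
```

If the count is 0, no commit on the rewritten branch touches the file any more. The old commits are still kept under refs/original/ and in the reflog until those are cleaned up.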


In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:

git commit -a --amend

That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase:

git rebase -i origin/master

That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, and save and quit. Git will walk through the changes, and leave you at a spot where you can:

$EDITOR file-to-fix
git commit -a --amend
git rebase --continue

Repeat these steps for each commit that contains sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
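
The interactive rebase above can be rehearsed non-interactively in a scratch repository. In this sketch the tag base stands in for origin/master, settings.txt and the passwords are invented, and GIT_SEQUENCE_EDITOR with sed replaces the manual step of changing pick to edit in the editor:

```shell
export GIT_EDITOR=true                   # never open an interactive editor in this sketch
repo=$(mktemp -d) && cd "$repo"          # scratch repository
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "v1" > app.txt
git add . && git commit -q -m "base"
git tag base                             # stands in for origin/master
echo "password=hunter2" > settings.txt
git add . && git commit -q -m "add settings (with secret)"
echo "v2" > app.txt
git commit -q -am "later work"

# Change "pick" to "edit" on the first line of the todo list without an editor.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i base >/dev/null 2>&1

# Git is now stopped at the commit with the secret: fix it, amend, continue.
echo "password=CHANGEME" > settings.txt
git commit -q -a --amend --no-edit
git rebase --continue >/dev/null 2>&1

# Count lines in the rewritten commits that still mention the old secret.
leaks=$(git log -p base..HEAD | grep -c "hunter2" || true)
echo "lines mentioning the old secret: $leaks"
```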

#2


Changing your passwords is a good idea, but for the process of removing passwords from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch explicitly designed for removing private data from Git repos.

Create a private.txt file listing the passwords, etc, that you want to remove (one entry per line) and then run this command:

$ java -jar bfg.jar  --replace-text private.txt  my-repo.git

All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive
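
Whichever tool does the rewriting, it helps to check whether a string still occurs anywhere in history, both before and after. The helper secret_in_history below is a hypothetical name, not part of Git or the BFG; it simply greps every commit reachable from any ref. The repository contents are made up:

```shell
repo=$(mktemp -d) && cd "$repo"          # scratch repository
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "db_password=hunter2" > settings.txt
git add . && git commit -q -m "add settings"
echo "db_password=from-env" > settings.txt
git commit -q -am "stop hard-coding the password"

# Hypothetical helper: succeeds if the literal string $1 occurs in any
# file of any commit reachable from any ref.
secret_in_history() {
  git grep -q -F -e "$1" $(git rev-list --all) 2>/dev/null
}

if secret_in_history "hunter2"; then found=yes; else found=no; fi
echo "secret still present in history: $found"
```

Here the password was only edited out in a later commit, so the check still finds it in history. After a successful rewrite the same check should report no.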

The BFG is typically 10-50x faster than running git-filter-branch and the options are simplified and tailored around these two common use-cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.

#3


I recommend this script by David Underhill; it worked like a charm for me.

In addition to natacado's filter-branch command, it adds these commands to clean up the mess that filter-branch leaves behind:

rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune

Full script (all credit to David Underhill):

#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use 
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch
# otherwise leaves behind for a long time
rm -rf .git/refs/original/ && \
git reflog expire --all && \
git gc --aggressive --prune

The last two commands may work better if changed to the following:

git reflog expire --expire=now --all && \
git gc --aggressive --prune=now
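
To see why the reflog and gc steps matter, this sketch (scratch repository, invented file names) records the object ID of the secret file's contents and checks whether Git can still find that object before and after the cleanup commands:

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause newer Git versions print
repo=$(mktemp -d) && cd "$repo"          # scratch repository
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "password=hunter2" > secret.txt
echo "hello" > README
git add . && git commit -q -m "initial"
blob=$(git rev-parse HEAD:secret.txt)    # object ID of the secret file's contents

git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch secret.txt' HEAD >/dev/null 2>&1

# The rewrite alone does not delete the old objects.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the backup refs and reflog entries, then prune unreachable objects.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --quiet --aggressive --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before cleanup: $before, after cleanup: $after"
```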

#4


If you have already pushed to GitHub, the data is compromised even if you force push it away one second later because:

  • GitHub keeps dangling commits for a long time.

    GitHub staff does have the power to delete such dangling commits if you contact them however, which is what you should do: How to remove a dangling commit from GitHub?

    Dangling commits can be seen either through:

    One convenient way to get the source at that commit then is to use the download zip method, which can accept any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip

  • It is possible to fetch the missing SHAs either by:

    • listing API events with "type": "PushEvent". E.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback machine)

    • more conveniently sometimes, by looking at the SHAs of pull requests that attempted to remove the content

  • There are scrapers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly pool GitHub data and store it elsewhere.

    I could not find if they scrape the actual commit diff, but it is technically possible.

To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and done:

git init
git remote add origin git@github.com:cirosantilli/test-dangling.git

touch a
git add .
git commit -m 0
git push

touch b
git add .
git commit -m 1
git push

touch c
git rm b
git add .
git commit --amend --no-edit
git push -f
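
GitHub's server-side behaviour cannot be reproduced offline, but the local analogue of the test above is easy to see. In this sketch (scratch repository, no remote) the commit replaced by --amend is no longer on any branch, yet it can still be read by its SHA, which is the same situation a dangling commit on a server is in:

```shell
repo=$(mktemp -d) && cd "$repo"          # scratch repository, no remote involved
git init -q
git config user.name "Example" && git config user.email "example@example.com"

touch a && git add . && git commit -q -m 0
touch b && git add . && git commit -q -m 1
old=$(git rev-parse HEAD)                # the commit that is about to be replaced

touch c
git rm -q b
git add .
git commit -q --amend --no-edit

# No branch points at $old any more, but the object is still readable by SHA.
type=$(git cat-file -t "$old")
files=$(git ls-tree --name-only "$old" | tr '\n' ' ')
echo "old commit is still a '$type' containing: $files"
```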

If you delete the repository however, commits do disappear even from the API immediately and give 404, e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824. This works even if you recreate another repository with the same name.

So my recommended course of action is:

  • change your credentials

  • if that is not enough (e.g. naked pics):

    • delete the repository

    • contact support

#5


To be clear: The accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or really don't care about the history of your repo.

An alternative would be:

  1. cd to the project's root directory

  2. Remove the sensitive code / file

  3. rm -rf .git/ # Remove all git info from your code

  4. Go to GitHub and delete your repository

  5. Follow this guide to push your code to a new repository as you normally would - https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/

This will of course remove all commit history, branches, and issues from both your GitHub repo and your local Git repo. If this is unacceptable you will have to use an alternate approach.

Call this the nuclear option.
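
The local half of the nuclear option (removing the file, deleting .git and starting a fresh repository) might look like the sketch below. The directory, file names and commit messages are invented; deleting and recreating the GitHub repository still has to be done on GitHub itself:

```shell
project=$(mktemp -d) && cd "$project"    # stands in for the project's root directory
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo "password=hunter2" > secret.txt
echo "hello" > README
git add . && git commit -q -m "old history with secret"

rm secret.txt                            # remove the sensitive file
rm -rf .git                              # remove all Git info, including every old commit

# Start over with a brand-new repository containing only the current files.
git init -q
git config user.name "Example" && git config user.email "example@example.com"
git add . && git commit -q -m "Initial commit"

count=$(git rev-list --count HEAD)
echo "commits in the new history: $count"
```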

#6


Here is my solution on Windows:

git filter-branch --tree-filter "rm -f 'filedir/filename'" HEAD

git push --force

Make sure that the path is correct, otherwise it won't work.

I hope it helps

#7


Use filter-branch:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch <file_path_relative_to_git_repo>' --prune-empty --tag-name-filter cat -- --all

git push origin <branch_name> -f

#8


You can use git forget-blob.

The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

The file will disappear from all the commits in your history, reflog, tags and so on.

I run into the same problem every now and then, and every time I have to come back to this post and others. That's why I automated the process.

Credit to the contributors from Stack Overflow who allowed me to put this together.

#9


I've had to do this a few times to date. Note that this only works on 1 file at a time.

  1. Get a list of all commits that modified a file. The one at the bottom will be the first commit:

    git log --pretty=oneline --branches -- pathToFile

  2. To remove the file from history, take the sha1 of the first commit and the path to the file from the previous command and fill them into the command below. The ~1 starts the range at the parent of that commit, because a range written as A.. does not include A itself; if the file was added in the repository's very first commit there is no parent, so pass HEAD instead of a range:

    git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <sha1-where-the-file-was-first-added>~1..
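
As a sketch of the two steps above on a scratch repository (file names, passwords and commit messages are invented): the first command finds the commit that introduced the file, and the rewrite then starts from that commit's parent (~1), since a range written as A.. does not include A itself:

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause newer Git versions print
repo=$(mktemp -d) && cd "$repo"          # scratch repository
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "one" > keep.txt
git add . && git commit -q -m "first"
echo "password=hunter2" > secret.txt
git add . && git commit -q -m "add secret"
echo "password=hunter3" > secret.txt
git commit -q -am "change secret"

# Step 1: the last line of the log is the oldest commit touching the file.
first=$(git log --format=%H --branches -- secret.txt | tail -n 1)

# Step 2: rewrite from the parent of that commit up to HEAD.
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch secret.txt' -- "$first~1..HEAD" >/dev/null 2>&1

remaining=$(git log --oneline HEAD -- secret.txt | wc -l | tr -d ' ')
total=$(git rev-list --count HEAD)
echo "commits still touching secret.txt: $remaining (of $total commits)"
```

Without --prune-empty the two commits that only touched the file stay in history as empty commits, which is why the total is unchanged.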

#10


So, it looks something like this:

git rm --cached config/deploy.rb
echo /config/deploy.rb >> .gitignore

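This removes the file from Git's index, so it is no longer tracked, and adds it to the .gitignore list. Note that, as the question points out, this does not remove the file from earlier commits.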
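
For what it is worth, a quick sketch on a scratch repository (invented file contents) shows what these two commands do and do not achieve: the file stops being tracked and stays on disk, but the earlier commit still contains it, so this alone does not answer the original question:

```shell
repo=$(mktemp -d) && cd "$repo"          # scratch repository
git init -q
git config user.name "Example" && git config user.email "example@example.com"

mkdir config
echo "password=hunter2" > config/deploy.rb
git add . && git commit -q -m "add deploy config"

# Stop tracking the file (path relative to the repo root) and ignore it from now on.
git rm -q --cached config/deploy.rb
echo /config/deploy.rb >> .gitignore
git add .gitignore && git commit -q -m "stop tracking deploy config"

tracked=$(git ls-files config/deploy.rb | wc -l | tr -d ' ')
old=$(git show HEAD~1:config/deploy.rb)  # contents as stored in the earlier commit
echo "tracked now: $tracked, still in previous commit: $old"
```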
