This is a piece of code which frequently attracts pull requests which
are summarily rejected. As there is no "git blame" for rejected pull
requests, try to avoid misguided work by adding a comment at the
relevant place.
When an import is restarted the first new note commit must use
refs/notes/hg^0 as the parent. As refs/notes/hg is only updated at the
end of a session we cannot have it present in all note commits. Neither
can we generate new marks for note commits as that would require a new
mapping scheme from hg versions numbers to git marks. A new mapping
scheme would break existing incremental import setups.
We therefore restructure the code to do the notes at the end of an
import session, thus only requiring a refs/notes/hg^0 reference in the
first commit.
Branch and tag names can now be renamed using a mechanism similar to the
-A option for author names.
-B specifies a mapping file for branch names, and -T a mapping file for
tags.
Apparently a bug (http://bz.selenic.com/show_bug.cgi?id=3511) in
multiple released versions of Mercurial could produce commits where
files had absolute paths.
As a "healthy" repo should not contain any absolute paths, it should be
safe to always strip a leading '/' from the path and let the conversion
continue.
When a mercurial repository does not use utf-8 for encoding author
strings and commit messages the "-e <encoding>" command line option
can be used to force fast-export to convert incoming meta data from
<encoding> to utf-8.
When "-e <encoding>" is given, we use Python's string
decoding/encoding API to convert meta data on the fly when processing
commits.
If the --hg-hash argument is given, the converted commits are
annotated with the original hg hash as a git note in the "hg"
namespace.
The notes can be shown by git log using the "--notes=hg" argument.
In a merge commit, the first parent is always the same parent that
would be recorded if the commit were not a merge and the other
parent(s) record the commit(s) being merged in.
Preserving this order is important so that log --first-parent works
properly and also so that the merge history is not distorted by an
incorrect permutation of the DAG.
Remove the code that sorts the merge parents based on node id so
that the correct DAG order is preserved.
The authors file format accepted by git-svnimport and git-cvsimport
actually allows blank lines and comment lines that start with '#'.
Ignore blank lines and lines starting with '#' as the first
non-whitespace character to be compatible with the authors file
format accepted by the referenced tools.
Add support for a new --hgtags option. When given, any .hgtags
files that may be present are exported.
Normally this is not desirable. However, when attempting to mimic
the actions of other hg exporters that always export any .hgtags
files this option can help produce matching export data.
If the file mode changes (for example from 10644 to 10755), but the
actual text of the file itself does not, then the change could be
missed since the hashes would remain the same.
If the hashes match, also compare the gitmode values before deciding
the file is unchanged.
Since hg runs and supports older versions of python, hg-fast-export.py
should too. Replace dictionary comprehension with equivalent code that
supports versions of python older than 2.7.
Because on Windows sys.stdout is initially in text mode, any LF
characters written to it will be transformed to CRLF, which causes git
to blow up. This change uses Windows platform-specific code to change
sys.stdout to binary mode.
After an update to Mercurial 2.3 the module 'repo' was removed and the
program crashed when trying to convert a repository. I checked the
imports with 'pyflakes' and removed all unused ones, repo (among
others) was never used.
http://www.selenic.com/repo/hg/rev/1ac628cd7113#l9.1
The previous code did an awful lot of work to infer the parents of an
exported commit, incorporating information from many sources. But
there were multiple bugs in this scheme, sometimes resulting in merge
commits with two parents pointing to the same commit object.
Instead, use a much more straightforward process of mapping the
parents stored in hg.
hg-fast-export uses hg's branch order (from the log) when merging,
this is a problem. Consider the case:
HG repo A has revisions 1-10. Repository B is cloned from that.
Subsequently, A adds revision 11, and B adds a different change which
also has revision 11. If B now pulls from A, A's rev11 will have the
number 12; if A then pulls from B, the reverse also holds. So the logs
are different even though they contain the exact same changes.
hg-fast-export will thus create different git repositories for A and B,
even though the contents are identical for all practical purposes.
In particular, the repos would be identical if A and B had used git from
the beginning.
To fix that, compare HG revisions instead of log positions.
Previously we fed the full revision only for the first one and deltas
for all following including branches being forked off. This doesn't work
with branches that are forked from revision 0. In case such a branch is
found, we now also feed the full revision.
Signed-off-by: Rocco Rutte <pdmef@gmx.net>
HG tag movement is now supported with this patch.
This patch creates a .git/hg2git-mapping file, which maps
HG revision numbers to HG hashes. Combined with the
.git/hg2git-marks file, which maps HG revisions to GIT hashes,
we can now reprocess all tags at the end of each hg export
operation.
Add -M, --default-branch <branch_name> to allow user to set
the default branch where to pull into
Signed-off-by: Fabrizio Chiarello <ponch@autistici.org>
In git-check-ref-format (1), there is the following rule for refnames:
3. It cannot have ASCII control character (i.e. bytes
whose values are lower than \040, or \177 DEL), space,
tilde ~, caret ^, colon :, question-mark ?, asterisk *,
or open bracket [ anywhere;
and indeed, this rule is enforced by "git fast-import". hg-fast-export
already checked for all of the visible characters listed except for ~
and converted them to underscores. For some reason the tilde was
forgotten. This patch makes good on the omission.
Note that control characters are still left alone.
Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Rocco Rutte <pdmef@gmx.net>
At least the opensolaris hg repo has tag names containing '*',
so sanitize all branch and tag names roughly according to the
specs for git-check-ref-format(1).
Signed-off-by: Rocco Rutte <pdmef@gmx.net>
Merges were completely broken as they ended up with twice the same
parent in git. Still everything besides gitk seemed to work.
This because of 1) the incorrect assumption that a commit's parent is
the commit exported right before it and 2) some confusion with markes
being saved/loaded/used since git-fast-import doesn't allow for a ":0"
mark to map hg revision 1:1 to marks.
The merge "algorithm" now works like this:
1) If the commit's higher parent (highest hg rev), is not the last
commit exported for a particular branch, we issue a "from" command to
place it on top of it.
2) If the commit's lower parent exists, we issue a "merge" for it.
This is much simpler and seems to produce correct merges in git. And
while I'm at it, make output less confusing by prepending the target
branch name to each line.
The "twice the same parent" bug was discovered by Michele Ballabio on
the git list.
Signed-off-by: Rocco Rutte <pdmef@gmx.net>
Unfortunately, I can't do 'import hg-fast-export' from python itself, so
we need to move some common methods into 'hg2git.py' which is to be used
as a library for common hg->git routines.
Signed-off-by: Rocco Rutte <pdmef@gmx.net>