#!/bin/sh
# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>
READLINK="readlink"
if command -v greadlink > /dev/null; then
  READLINK="greadlink" # Prefer greadlink over readlink
fi
if ! $READLINK -f "$(which "$0")" > /dev/null 2>&1 ; then
  ROOT="$(dirname "$(which "$0")")"
  if [ ! -f "$ROOT/hg-fast-export.py" ] ; then
    echo "hg-fast-export requires a readlink implementation which knows" \
      "how to canonicalize paths in order to be called via a symlink."
    exit 1
  fi
else
  ROOT="$(dirname "$($READLINK -f "$(which "$0")")")"
fi
REPO=""
PFX="hg2git"
SFX_MAPPING="mapping"
SFX_MARKS="marks"
SFX_HEADS="heads"
SFX_STATE="state"
GFI_OPTS=""
if [ -z "${PYTHON}" ]; then
  # $PYTHON is not set, so we try to find a working python with mercurial:
  for python_cmd in python2 python python3; do
    if command -v $python_cmd > /dev/null; then
      $python_cmd -c 'from mercurial.scmutil import revsymbol' 2> /dev/null
      if [ $? -eq 0 ]; then
        PYTHON=$python_cmd
        break
      fi
    fi
  done
fi
if [ -z "${PYTHON}" ]; then
  echo "Could not find a python interpreter with the mercurial module >= 4.6 available. " \
    "Please use the 'PYTHON' environment variable to specify the interpreter to use."
  exit 1
fi
USAGE="[--quiet] [-r <repo>] [--force] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>]"
LONG_USAGE="Import hg repository <repo> up to either tip or <max>
If <repo> is omitted, use last hg repository as obtained from state file,
GIT_DIR/$PFX-$SFX_STATE by default.
Note: The argument order matters.
Options:
    --quiet   Passed to git-fast-import(1)
    -r <repo> Mercurial repository to import
    --force   Ignore validation errors when converting, and pass --force
              to git-fast-import(1)
    -m <max>  Maximum revision to import
    -s        Enable parsing Signed-off-by lines
    --hgtags  Enable exporting .hgtags files
    -A <file> Read author map from file
              (Same as in git-svnimport(1) and git-cvsimport(1))
    -B <file> Read branch map from file
    -T <file> Read tags map from file
    -M <name> Set the default branch name (defaults to 'master')
    -n        Do not perform built-in (broken in many cases) sanitizing
              of branch/tag names.
    -o <name> Use <name> as branch namespace to track upstream (e.g. 'origin')
    --hg-hash Annotate commits with the hg hash as git notes in the
              hg namespace.
    -e <encoding> Assume commit and author strings retrieved from
                  Mercurial are encoded in <encoding>
    --fe <filename_encoding> Assume filenames from Mercurial are encoded
                             in <filename_encoding>
    --mappings-are-raw Assume mappings are raw <key>=<value> lines
    --filter-contents <cmd> Pipe contents of each exported file through <cmd>
                            with <file-path> <hg-hash> <is-binary> as arguments
    --plugin <plugin=init> Add a plugin with the given init string (repeatable)
    --plugin-path <plugin-path> Add an additional plugin lookup path
"
case "$1" in
  -h|--help)
    echo "usage: $(basename "$0") $USAGE"
    echo ""
    echo "$LONG_USAGE"
    exit 0
esac
# Note: use { ...; } rather than ( ... ) for the error handlers, so that
# "exit 1" terminates the script itself and not just a subshell.
IS_BARE=$(git rev-parse --is-bare-repository) \
  || { echo "Could not find git repo" ; exit 1 ; }
if test "z$IS_BARE" != ztrue; then
  # This is not a bare repo, cd to the toplevel
  TOPLEVEL=$(git rev-parse --show-toplevel) \
    || { echo "Could not find git repo toplevel" ; exit 1 ; }
  cd "$TOPLEVEL" || exit 1
fi
GIT_DIR=$(git rev-parse --git-dir) || { echo "Could not find git repo" ; exit 1 ; }
IGNORECASEWARN=""
IGNORECASE=$(git config core.ignoreCase)
if [ "true" = "$IGNORECASE" ]; then
  IGNORECASEWARN="true"
fi
while case "$#" in 0) break ;; esac
do
  case "$1" in
    -r|--r|--re|--rep|--repo)
      shift
      REPO="$1"
      ;;
    --q|--qu|--qui|--quie|--quiet)
      GFI_OPTS="$GFI_OPTS --quiet"
      ;;
    --force)
      # pass --force to git-fast-import and hg-fast-export.py;
      # break without shifting so that --force stays in "$@"
      GFI_OPTS="$GFI_OPTS --force"
      IGNORECASEWARN=""
      break
      ;;
    -*)
      # pass any other options down to hg-fast-export.py
      break
      ;;
    *)
      break
      ;;
  esac
  shift
done
if [ -n "$IGNORECASEWARN" ]; then
  echo "Error: The option core.ignoreCase is set to true in the git"
  echo "repository. This will produce empty changesets for renames that just"
  echo "change the case of the file name."
  echo "Use --force to skip this check or change the option with"
  echo "git config core.ignoreCase false"
  exit 1
fi
# Make a backup copy of each state file
for i in $SFX_STATE $SFX_MARKS $SFX_MAPPING $SFX_HEADS ; do
  if [ -f "$GIT_DIR/$PFX-$i" ] ; then
    cp "$GIT_DIR/$PFX-$i" "$GIT_DIR/$PFX-$i~"
  fi
done
# for convenience: get default repo from state file
if [ -z "$REPO" ] && [ -f "$GIT_DIR/$PFX-$SFX_STATE" ] ; then
  # cut -f 2- keeps repository paths that contain spaces intact
  REPO="$(grep '^:repo ' "$GIT_DIR/$PFX-$SFX_STATE" | cut -d ' ' -f 2-)"
  echo "Using last hg repository \"$REPO\""
fi
if [ -z "$REPO" ]; then
  echo "no repo given, use -r flag"
  exit 1
fi
# make sure we have a marks cache
if [ ! -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
  touch "$GIT_DIR/$PFX-$SFX_MARKS"
fi
# cleanup on exit
trap 'rm -f "$GIT_DIR/$PFX-$SFX_MARKS.old" "$GIT_DIR/$PFX-$SFX_MARKS.tmp"' 0
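# Run hg-fast-export.py | git fast-import and collect the exit status of
# both sides of the pipe. fd 3 keeps the script's real stdout; inside the
# command substitution fds 1 and 3 are swapped, so git fast-import's output
# reaches the real stdout while the two statuses echoed to fd 3 are captured
# and handed, one per line, to the two reads through the here-document.
_err1=
_err2=
exec 3>&1
{ read -r _err1 || :; read -r _err2 || :; } <<-EOT
$(
  exec 4>&3 3>&1 1>&4 4>&-
  {
    _e1=0
    GIT_DIR="$GIT_DIR" "$PYTHON" "$ROOT/hg-fast-export.py" \
      --repo "$REPO" \
      --marks "$GIT_DIR/$PFX-$SFX_MARKS" \
      --mapping "$GIT_DIR/$PFX-$SFX_MAPPING" \
      --heads "$GIT_DIR/$PFX-$SFX_HEADS" \
      --status "$GIT_DIR/$PFX-$SFX_STATE" \
      "$@" 3>&- || _e1=$?
    echo $_e1 >&3
  } | \
  {
    _e2=0
    git fast-import $GFI_OPTS --export-marks="$GIT_DIR/$PFX-$SFX_MARKS.tmp" 3>&- || _e2=$?
    echo $_e2 >&3
  }
)
EOT
exec 3>&-
# Abort unless both the exporter and git fast-import succeeded
[ "$_err1" = 0 -a "$_err2" = 0 ] || exit 1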
# move recent marks cache out of the way...
if [ -f "$GIT_DIR/$PFX-$SFX_MARKS" ] ; then
  mv "$GIT_DIR/$PFX-$SFX_MARKS" "$GIT_DIR/$PFX-$SFX_MARKS.old"
else
  touch "$GIT_DIR/$PFX-$SFX_MARKS.old"
fi
# ...to create a new merged one
cat "$GIT_DIR/$PFX-$SFX_MARKS.old" "$GIT_DIR/$PFX-$SFX_MARKS.tmp" \
  | uniq > "$GIT_DIR/$PFX-$SFX_MARKS"
# save SHA1s of current heads for incremental imports
# and connectivity (plus sanity checking)
for head in $(git branch | sed 's#^..##') ; do
  id="$(git rev-parse "refs/heads/$head")"
  echo ":$head $id"
done > "$GIT_DIR/$PFX-$SFX_HEADS"
# check diff with color:
# ( for i in `find . -type f | grep -v '\.git'` ; do diff -u $i $REPO/$i ; done | cdiff ) | less -r