Fast-Export

mirror of https://github.com/frej/fast-export.git synced 2025-10-31 16:35:48 +01:00

Author	SHA1	Message	Date
chrisjbillington	13c273f10c	Resolve unicode escape sequences not being processed correctly. In `process_unicode_escape_sequences()`, any backslash escape sequences in the original string are themselves escaped by the first `.encode('unicode-escape')`, and therefore survive the `.encode('unicode-escape').decode('unicode-escape')` round trip unchanged. That is not what we want: these sequences should pass through `.encode()` unchanged, so that `.decode()` converts them to the characters they represent. This patch changes the `.encode()` step to pass through any ascii characters unchanged and escape only non-ascii characters. This ensures that any existing backslash escape sequences are interpreted as the characters they represent upon `.decode()`.	2022-10-23 11:51:33 +11:00
Frej Drejhammar	f179afce65	Fix FutureWarning about nested sets in re. Since Python 3.7 the re module warns about syntax which could, in the future, be misparsed as a nested set. Avoid this by escaping the literal `[` we search for in the regexp. Reported by Monte Davidoff @mndavidoff. Closes #269.	2022-02-09 15:37:29 +01:00
Frej Drejhammar	5b7ca5aaec	Give proper error message when refusing to overwrite existing branch. If fast-export was asked to export a Mercurial branch to Git and a branch of the same name already existed in the Git repo but had not been created by fast-export, fast-export would crash while trying to format an error message claiming that the destination branch had been modified behind its back. This patch extends fast-export to detect this situation and give a proper error message, which is hopefully less confusing to the user. Credit for discovering the original crash goes to Shun-ichi Goto <gotoh@taiyo.co.jp>. Closes: #269.	2021-08-27 16:04:40 +02:00
Frej Drejhammar	bdfc0c08c7	Merge branch 'frej/issue-258'. Closes #258.	2021-02-26 16:44:31 +01:00
SirIntellegence	20c22a3110	Add plugin support for the 'extra' field. Permits plugins to import other information, such as svn conversion revisions.	2021-02-22 13:09:48 -07:00
Frej Drejhammar	f741bf39f2	bugfix: Avoid starting incremental conversions from scratch. Keys and values in the state cache are byte strings, so a lookup of the str key 'tip' always fails on Python 3. The failure makes the conversion start over from the beginning; as fast-export is deterministic the results are the same, but the conversion is very inefficient. The bug has existed since the port to Python 3. This patch switches the 'tip' lookup to use a byte string, which should make incremental conversions restart at the last converted commit. As 'x' == b'x' in Python 2, this should be a backwards-compatible change. Bug reported and fix suggested by Tomas Kolda. Fixes #258.	2021-02-19 16:47:53 +01:00
Frej Drejhammar	7057ce2c2b	Allow plugins to modify the committer. Since they were introduced, plugins have been able to modify the author of a commit, but not the committer. This patch adds the support needed to let them modify the committer as well.	2020-09-30 17:47:33 +02:00
Ondrej Stanek	9c6dea9fd4	Pass original hg commit hash to plugins	2020-07-31 10:50:51 +02:00
Ethan Furman	5c1cbf82b0	Add revision to commit_data for commit plugins. Co-Authored-By: ostan89@gmail.com	2020-07-31 10:48:33 +02:00
Ondrej Stanek	50631c4b34	Add option --ignore-unnamed-heads. This option allows the user to ignore only unnamed heads (as opposed to --force, which ignores all non-fatal issues). The intended use is for a future plugin converting unnamed heads to named branches.	2020-07-31 10:30:53 +02:00
Ethan Furman	2a9dd53d14	Show all unnamed heads at once. Co-Authored-By: ostan89@gmail.com	2020-07-31 10:27:07 +02:00
chrisjbillington	d29d30363b	Fix backward-incompatible change for hg < 5.1. The port to Python 3 in `b961f146` changed `repo.branchmap().iteritems()` to use `.items()` instead. However, the object returned by mercurial isn't a dictionary, and its `.items()` method was only introduced (as an alias for `iteritems`) in hg 5.1. `iteritems()` still exists, so let's keep using it for now to retain compatibility with hg < 5.1.	2020-05-06 11:59:49 -04:00
Frej Drejhammar	f102d2a69f	Merge branch 'PR/223'. Closes #223	2020-05-06 16:31:13 +02:00
Ondrej Stanek	cf0e5837b6	Allow converting a repository with git and hg subrepos. In the verification phase, fast-export wrongly expects both hg and git subrepositories to have a corresponding line in the subrepo-map file. In fact, only hg subrepos need a line in the subrepo-map that references a converted subrepo; git subrepositories do not.	2020-05-06 16:30:05 +02:00
chrisjbillington	3b3f86b71e	Allow utf8 in mappings. We were previously processing entries in mapping files (when `--mappings-are-raw` is not given) with `.decode('unicode_escape').encode('utf8')` to replace backslash escape sequences in bytestrings with the utf-8 encoded characters they represent. However, it turns out that `.decode('unicode_escape')` assumes latin-1 encoding if it encounters non-ascii bytes: https://bugs.python.org/issue21331. So this gave incorrect results if non-ascii utf8 data was present in the mapping. To fix this, we now add an extra layer of `.decode('utf8').encode('unicode-escape')` in order to convert any non-ascii characters into their backslash escape sequences. Then the subsequent `.decode('unicode_escape')` only encounters ascii characters and gives correct results.	2020-03-25 12:33:42 -04:00
chrisjbillington	6361b44c33	Fix bug in ignoring .git files/folders on Windows. Mercurial internally stores (most) filepaths using forward slashes, and returns them as such from its Python API, even on Windows. So splitting filepaths with `os.path.sep` was incorrect, and `.git` files (at least those within a subdirectory) were not ignored on Windows as intended. Splitting on `b'/'` regardless of OS fixes this.	2020-03-08 19:40:50 +01:00
chrisjbillington	48508ee299	Fix failure to print error message in verify_heads. On Python 3, `b'%s' % None` fails with a TypeError. In verify_heads, an error message prints the sha1 of a git commit, but that sha1 can be None. This commit instead prints `b'<None>'` if sha1 is None.	2020-03-06 11:02:38 -05:00
Max Fuqua	750fe6d3e1	Resolve type error resulting from passing an int to b'%s' in python3	2020-02-29 14:55:15 -05:00
chrisjbillington	4071f720b0	Fix issue #203: Resolve stderr encoding issues. In Python 3, `sys.stderr.write()` requires unicode strings, and all output on standard streams is UTF8 encoded. Therefore in the port to Python 3, we `.decode()`d all strings that are used in `%` formatting of strings to be printed to stderr. However, in Python 2, `sys.stderr` accepts either bytestrings or unicode strings, and: - `%s` formatting of a bytestring with a unicode string, i.e. `"%s" % u"foo"`, results in a unicode string. - Writing a unicode string to stderr/stdout uses that stream's encoding. - When the output of the process is being piped somewhere other than a terminal (as it is when called with pipes and shell redirection from hg-fast-export.sh), that encoding is None, which implies ASCII. - This raises UnicodeEncodeError if the unicode strings passed to `stderr.write()` have non-ascii characters. We cannot fix this problem simply by encoding UTF8 again before writing to stderr on Python 2. This is because the decoding of filenames with the UTF8 codec may fail: filenames may not even be valid UTF8 despite this being the declared filesystem encoding. We could `fsdecode()` filenames on Python 3, which would use the `surrogateescape` error handler, but stderr does not use this error handler for output, meaning we would just have to encode again (with the same error handler) anyway. And Python 2 lacks the `surrogateescape` error handler in any case; we would need to reimplement it just to do a round-trip decode and encode for no reason. This commit leaves filenames and other repository data as bytestrings, and simply writes them to `sys.stderr.buffer` on Python 3 or `sys.stderr` on Python 2 as-is, after `%` formatting with bytestring literals. This avoids encoding issues of filenames altogether. Other writing to stderr that does not involve repository data has been left with "native" strings, i.e. `sys.stderr.write("a string literal %s" % a_command_line_arg)`. These will still fail on Python 3 if the user passes a non-UTF-8 filename as a command line argument or similar. This is acceptable IMHO: although `hg-fast-export` may encounter invalid UTF8 in mercurial repositories, it is not too much to require that the user name their branch mapping files etc. with valid UTF8!	2020-02-19 12:18:00 -05:00
chrisjbillington	b961f146df	Support Python 3. Port hg-fast-import to Python 2/3 polyglot code. Since mercurial accepts and returns bytestrings for all repository data, the approach I've taken here is to use bytestrings throughout the hg-fast-import code. All strings pertaining to repository data are bytestrings. This means the code is using the same string datatype for this data on Python 3 as it did (and still does) on Python 2. Repository data coming from subprocess calls to git, or read from files, is also left as the bytestrings either returned from subprocess.check_output or as read from the file in 'rb' mode. Regexes and string literals that are used with repository data have all had a b'' prefix added. When repository data is used in error/warning messages, it is decoded with the UTF8 codec for printing. With this patch, hg-fast-export.py writes binary output to sys.stdout.buffer on Python 3; on Python 2 this doesn't exist and it still uses sys.stdout. The only strings that are left as "native" strings and not coerced to bytestrings are filepaths passed in on the command line, and dictionary keys for internal data structures used by hg-fast-import.py that do not originate in repository data. Mapping files are read in 'rb' mode, and thus bytestrings are read from them. When an encoding is given, their contents are decoded with that encoding, but then immediately encoded again with UTF8, and they are returned as the resulting bytestrings. Other necessary changes were: - indexing bytestrings with a single index returns an integer on Python 3. These indexing operations have been replaced with a one-element slice: x[0] -> x[0:1] or x[-1] -> x[-1:] so as to return a bytestring. - raw_hash.encode('hex_codec') replaced with binascii.hexlify(raw_hash) - str(integer) -> b'%d' % integer - 'string_escape' codec replaced with 'unicode_escape' (which was backported to python 2.7). Strings decoded with this codec were then immediately re-encoded with UTF8. - Calls to map() intended to execute their contents immediately were unwrapped or converted to list comprehensions, since map() is an iterator and does not execute until iterated over. hg-fast-export.sh has been modified to not require Python 2. Instead, if PYTHON has not been defined, it checks python2, python, then python3, and uses the first one that exists and can import the mercurial module.	2020-02-13 14:35:19 -05:00
Frej Drejhammar	595587b245	Merge branch 'PR/197'. Closes #197, #185, #196	2020-02-09 19:39:21 +01:00
Matthijs van der Burgh	0b6b83c3de	Adapt to status becoming an object in Mercurial 5.3. Status has always been a tuple, but since 5.3 (commit https://www.mercurial-scm.org/repo/hg/rev/c5548b0b6847) it is an object, so the tuple's __getitem__ is no longer available. This fix is compatible with mercurial>=4.6, as the old status tuple still has the same properties.	2020-02-08 17:23:30 +01:00
chrisjbillington	8d135fe700	Ignore files and directories called .git. Git cannot track these files. Print a warning when one is encountered. Fixes #166	2020-02-07 17:52:57 -05:00
MokhamedDakhraui	9c9669d361	Check .hgsub and .hgsubstate files to detect subrepo changes	2020-01-26 00:36:34 +03:00
Dave Townsend	ab31fdcbaa	Add support for git submodules. Mercurial supports not only submodules which are Mercurial repositories, but also Git and Subversion repositories. This patch adds support to hg-fast-export for submodules which are Git repositories. As submodules which are Git repositories don't need a mapping file, we trigger the submodule update only on the occurrence of the `.hgsubstate` file and push the check for a valid `submodule_mappings` to `refresh_gitmodules(ctx)`.	2019-12-07 10:22:23 -08:00
Dave Townsend	acf93a80a9	Only export submodules that exist in the submodule mapping.	2019-12-07 10:21:26 -08:00
Dave Townsend	0f49bfe0db	Move hg sub-module updating to its own function, NFC. This refactoring is in preparation for supporting Mercurial submodules which are git repositories.	2019-12-07 09:39:43 -08:00
Dave Townsend	ff1c885305	Ignore obsolete changesets in the source repository. Obsolete changesets are, for example, created by the Evolve extension. This patch switches to an unfiltered repository (the filtered one throws on an attempt to access obsolete revisions) and then filters out the obsolete revisions when it comes across them. Fixes #173	2019-10-20 19:45:42 +02:00
Frej Drejhammar	0096085b6f	Tag maps should use the same syntax as branch and author maps. When version v171002 introduced a new mapping file format for branches and authors, that change never made it to the remapping of tags, although the README documents it. Fixes #172.	2019-10-12 21:09:14 +02:00
Frej Drejhammar	1181a0af47	Allow name sanitizer to be disabled with --no-auto-sanitize. Make it possible to completely disable the name sanitizer with the --no-auto-sanitize flag. Previously the sanitizer was run on user-remapped names. As the sanitizer rewrites perfectly legal git names (such as __.*), this is probably not what the user wants. Closes #155.	2019-09-13 14:56:32 +02:00
MokhamedDakhraui	581b1b3d17	Remove git submodules if .hgsubstate file was removed or emptied	2019-08-18 05:46:46 +03:00
MokhamedDakhraui	7df01ac323	Refactor refresh_gitmodules(). Use the change context substate field instead of manually parsing the `.hgsubstate` file.	2019-08-16 02:42:03 +03:00
MokhamedDakhraui	914f5a0dbe	Replaced several lambdas by one loop	2019-08-16 02:41:54 +03:00
MokhamedDakhraui	8779cb5e95	Extract operations with submodules into separate methods	2019-08-16 02:40:44 +03:00
Johannes Carlsson	47d330de83	Add support for mercurial subrepos. This adds a new command line option (--subrepo-map) that maps mercurial subrepos to git submodules. The --subrepo-map option takes a mapping file as an argument, which is used to map a subrepo folder to a git submodule. For more information see README-SUBMODULES.md. This commit is inspired by the changes made by daolis in PR#38, which was never merged. Closes: #51 Closes: #147	2019-01-07 18:41:19 +01:00
Johan Henkens	cadcfcbe90	Move filter_contents to plugin system	2018-12-05 13:25:48 -08:00
Johan Henkens	e895ce087f	Add plugin system	2018-12-05 13:25:47 -08:00
Frej Drejhammar	ac60034ba3	Adhere to PEP 394. From PEP 394 [1]: * python2 will refer to some version of Python 2.x. * end users should be aware that python refers to python3 on at least Arch Linux (that change is what prompted the creation of this PEP), so python should be used in the shebang line only for scripts that are source compatible with both Python 2 and 3. So, to make sure that we run correctly on a system where python refers to python3, and to avoid problems like issue #11, we change the shebangs. [1] https://www.python.org/dev/peps/pep-0394/	2018-08-11 15:07:19 +02:00
Anton Tykhyy	89db1d93cf	Add --filter-contents	2018-06-17 21:09:59 +03:00
Frej Drejhammar	e200cec39f	Adapt to changes in Mercurial 4.6. Starting with Mercurial 4.6, repo.lookup() no longer accepts raw hashes for lookups.	2018-06-10 15:51:09 +02:00
Frej Drejhammar	50dc10770b	Warn contributors against doing work that will not be merged. From time to time contributors spend time doing work that will not be accepted, as it duplicates functionality already provided by the mapping files. Try to dissuade them from doing that by explaining the reasons in the comment.	2018-02-01 07:03:03 +01:00
Frej Drejhammar	cc8fefe008	Change syntax of mapping files. This is done to allow escape sequences in the key and value strings.	2017-10-02 13:05:14 +02:00
Frej Drejhammar	e174c2a0b7	Refactor load_mapping() to move line parsing to inner function. This is done in preparation for allowing mappings to contain quoted characters.	2017-09-29 18:50:41 +02:00
Frej Drejhammar	4bb50bb3fb	Fix crash when a branch name starts with '/'. If a branch name starts with '/', it will be split into ['', ...] and then mapped over with dot(), but dot() does not handle the empty string. Teach dot() to handle the empty string. This fixes the underlying problem in issue #91.	2017-05-14 14:32:59 +02:00
Frej Drejhammar	c614ae776b	Fix "Branch ... modified outside hg-fast-export..." for sanitized branch names. The heads cache contains sanitized names, but we try to look up unsanitized names, which is wrong. Switch to looking up the sanitized name.	2016-04-15 15:46:47 +02:00
Frej Drejhammar	7224e420a7	Warn if one of the marks, mapping, or heads files are empty	2016-04-03 15:48:03 +02:00
Frej Drejhammar	b7cc6ab3bf	verify_heads() needs to be aware of the branch renaming map. As all branches created on the git side are transformed by sanitize_name(), this should be a safe, backwards-compatible change. If a user is doing incremental imports and sanitize_name() now suddenly modifies the branch name, verify_heads() would already have complained on the first incremental run. Thanks go to Steve Tousignant <s.tousignant@gmail.com> for discovering the problem.	2016-04-02 15:01:45 +02:00
Frej Drejhammar	6d8b4dbb11	Warn if opening a mapping file fails	2016-04-02 14:59:47 +02:00
Frej Drejhammar	832ee29bfa	Refactor sanitize_name() to know about renaming map. Handle the lookup table for branch and tag renaming inside sanitize_name().	2016-04-02 14:57:19 +02:00
Frej Drejhammar	46bf316a3c	Explain why it is a bad idea to change sanitize_name(). This is a piece of code which frequently attracts pull requests which are summarily rejected. As there is no "git blame" for rejected pull requests, try to avoid misguided work by adding a comment at the relevant place.	2016-04-02 12:28:03 +02:00

95 Commits
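Commit 13c273f10c explains why escaping every character with `unicode-escape` defeats backslash escapes that are already present in a mapping entry. The sketch below, assuming Python 3, compares that approach with one that escapes only non-ASCII characters. It uses the `backslashreplace` error handler as one possible way to get ASCII-only escaping. That choice and both function names are assumptions for illustration, not taken from hg-fast-export.

```python
def unescape_escape_all(entry: bytes) -> bytes:
    # Hypothetical: 'unicode-escape' doubles the backslash of an existing "\t",
    # so the decode step only undoes that doubling and "\t" stays literal.
    return (entry.decode('utf8')
            .encode('unicode-escape')
            .decode('unicode_escape')
            .encode('utf8'))


def unescape_non_ascii_only(entry: bytes) -> bytes:
    # Hypothetical: encode to ASCII, replacing only non-ASCII characters with
    # backslash escapes; ASCII text (including existing escape sequences)
    # passes through untouched and is then interpreted by 'unicode_escape'.
    return (entry.decode('utf8')
            .encode('ascii', 'backslashreplace')
            .decode('unicode_escape')
            .encode('utf8'))


entry = 'caf\u00e9\\tbar'.encode('utf8')  # UTF-8 "café" followed by a literal backslash-t
print(unescape_escape_all(entry))      # b'caf\xc3\xa9\\tbar'  (escape left unprocessed)
print(unescape_non_ascii_only(entry))  # b'caf\xc3\xa9\tbar'   (real tab character)
```

Both variants still handle non-ASCII UTF-8 text correctly. The difference lies only in what happens to backslashes already present in the entry.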
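Commit f179afce65 avoids the FutureWarning that Python 3.7 and later emit when a character set starts with an unescaped `[`. The sketch below reproduces the warning and shows that escaping the bracket silences it. The pattern is a made-up example, not the one used in hg-fast-export. `re.purge()` clears the module's pattern cache, because a cached pattern is not re-parsed and so would not warn a second time.

```python
import re
import warnings


def warnings_on_compile(pattern: bytes) -> list:
    """Compile `pattern` and return the categories of any warnings raised."""
    re.purge()  # drop cached patterns so the parser really runs again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        re.compile(pattern)
    return [w.category for w in caught]


# Hypothetical pattern: a '[' directly after the opening '[' of a set triggers
# "Possible nested set" on Python 3.7 and later.
print(warnings_on_compile(b'[[ ~^:?*]'))    # [<class 'FutureWarning'>]
# Escaping the literal '[' keeps the same meaning without the warning.
print(warnings_on_compile(b'[\\[ ~^:?*]'))  # []
print(re.sub(b'[\\[ ]', b'_', b'a[b c'))    # b'a_b_c'
```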
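Commit f741bf39f2 explains why a `'tip'` lookup always missed on Python 3: the state cache holds byte strings. The sketch below shows the buggy lookup and the fixed one side by side. The `key value` line format and both helper names are hypothetical; they are not hg-fast-export's actual cache code.

```python
def load_state_cache(lines):
    """Parse hypothetical 'key value' lines, keeping keys and values as bytes
    (as happens when the cache file is opened in 'rb' mode)."""
    cache = {}
    for line in lines:
        key, value = line.rstrip(b'\n').split(b' ', 1)
        cache[key] = value
    return cache


def first_revision(cache, key):
    # A missing key falls back to 0, i.e. "convert from the beginning".
    return int(cache.get(key, 0))


cache = load_state_cache([b'tip 1234\n', b'repo /srv/hg/project\n'])
print(first_revision(cache, 'tip'))   # 0: 'tip' != b'tip' on Python 3
print(first_revision(cache, b'tip'))  # 1234: restarts at the last converted commit
```

On Python 2 both lookups would succeed, because `'tip' == b'tip'` there. This is why the commit describes the fix as backwards compatible.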
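Commit 3b3f86b71e notes that `bytes.decode('unicode_escape')` reads non-ASCII bytes as Latin-1, which garbles UTF-8 input. The sketch below, assuming Python 3 and using hypothetical function names, shows the resulting mojibake and the extra decode/encode layer that the commit adds.

```python
def unescape_before(entry: bytes) -> bytes:
    # Each non-ASCII byte is read as one Latin-1 character, so the two bytes
    # of a UTF-8 "ü" become two separate characters.
    return entry.decode('unicode_escape').encode('utf8')


def unescape_after(entry: bytes) -> bytes:
    # Decode UTF-8 first and turn non-ASCII characters into backslash escapes,
    # so 'unicode_escape' only ever sees ASCII input.
    return (entry.decode('utf8')
            .encode('unicode-escape')
            .decode('unicode_escape')
            .encode('utf8'))


name = 'J\u00fcrgen'.encode('utf8')  # b'J\xc3\xbcrgen'
print(unescape_before(name))  # b'J\xc3\x83\xc2\xbcrgen', i.e. "JÃ¼rgen"
print(unescape_after(name))   # b'J\xc3\xbcrgen', unchanged as intended
```

This version also escapes backslashes that are already in the entry, so existing escape sequences pass through unprocessed. Commit 13c273f10c later addresses that limitation.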
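Commit 6361b44c33 switches from `os.path.sep` to `b'/'` because Mercurial reports paths with forward slashes even on Windows. The sketch below uses the standard-library `ntpath` module to reproduce the Windows separator on any OS. `has_git_component()` is a hypothetical helper, not the project's function.

```python
import ntpath


def has_git_component(path: bytes) -> bool:
    # Mercurial paths use '/' on every platform, so split on b'/' directly.
    return b'.git' in path.split(b'/')


path = b'vendor/lib/.git'
windows_sep = ntpath.sep.encode()               # b'\\', i.e. os.path.sep on Windows
print(b'.git' in path.split(windows_sep))       # False: the path is never split
print(has_git_component(path))                  # True
print(has_git_component(b'vendor/.gitignore'))  # False: only exact '.git' components match
```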
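Commits 48508ee299 and 750fe6d3e1 both fix `TypeError`s raised by `%` formatting of bytes on Python 3. There, `b'%s'` accepts only bytes-like objects (or objects implementing `__bytes__`), so neither `None` nor an `int` can be formatted with it. The sketch below shows the failure and one way around it. `format_head_error()` and its message text are hypothetical.

```python
def format_head_error(branch: bytes, sha1, count: int) -> bytes:
    # None cannot be formatted with b'%s'; substitute a placeholder instead.
    sha = sha1 if sha1 is not None else b'<None>'
    # Integers need b'%d' rather than b'%s'.
    return b'branch %s: git sha1 %s (%d heads)' % (branch, sha, count)


for value in (None, 42):
    try:
        b'%s' % value
    except TypeError:
        print(type(value).__name__, '-> TypeError')  # NoneType, then int

print(format_head_error(b'default', None, 2))
# b'branch default: git sha1 <None> (2 heads)'
```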
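Commit 4071f720b0 writes repository data to `sys.stderr.buffer` on Python 3 and to `sys.stderr` on Python 2, so bytes are never re-encoded. A minimal polyglot sketch of that idea follows. `write_raw()` is a hypothetical helper. Flushing the text layer first is an added precaution, so that earlier text output is not overtaken by bytes written straight to the buffer.

```python
import io
import sys


def write_raw(msg: bytes, stream=None) -> None:
    """Write bytes to stderr (or `stream`) without any re-encoding."""
    stream = sys.stderr if stream is None else stream
    stream.flush()  # emit pending text output first to keep messages in order
    # Python 3 text streams expose their binary layer as .buffer;
    # Python 2's sys.stderr has no .buffer and accepts bytestrings itself.
    raw = getattr(stream, 'buffer', stream)
    raw.write(msg)
    raw.flush()


# Demonstration with an in-memory ASCII text stream: the non-UTF-8 filename
# bytes reach the underlying buffer untouched, with no UnicodeEncodeError.
backing = io.BytesIO()
text_stream = io.TextIOWrapper(backing, encoding='ascii')
write_raw(b'warning: skipping file \xff\xfe.txt\n', text_stream)
print(backing.getvalue())  # b'warning: skipping file \xff\xfe.txt\n'
```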
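Commit b961f146df lists several Python 2/3 differences in how bytestrings behave. The sketch below, run on Python 3, checks each one:

- single-index access versus one-element slices,
- `binascii.hexlify()` in place of the `hex_codec` encode,
- `b'%d'` in place of `str()`,
- the laziness of `map()`.

The sample data is made up.

```python
import binascii

data = b'hg-fast-export'

# On Python 3 a single index yields an int, while a one-element slice yields bytes
# (on Python 2 both yield a one-character str).
print(data[0], data[0:1], data[-1:])  # 104 b'h' b't'

# binascii.hexlify() works on both versions; bytes has no .encode() on Python 3.
raw_hash = b'\x01\xab\xff'  # stand-in for a binary node id
print(binascii.hexlify(raw_hash))  # b'01abff'

# b'%d' % n builds bytes directly; str(n) would give a text str on Python 3.
print(b'%d' % 42)  # b'42'

# map() is lazy on Python 3: nothing runs until the iterator is consumed.
calls = []
map(calls.append, [1, 2, 3])
print(calls)  # []
for item in [1, 2, 3]:  # an explicit loop runs immediately
    calls.append(item)
print(calls)  # [1, 2, 3]
```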
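Commit 4bb50bb3fb traces a crash to `dot()` receiving the empty first component that results from splitting a branch name which starts with `'/'`. The sketch below is a hypothetical re-creation, assuming that `dot()` replaces a leading `.` with `_` (git ref components may not start with `.`). It is not the project's code, but it shows where the empty-string guard belongs.

```python
def dot(component: bytes) -> bytes:
    # Hypothetical version of dot(): replace a leading '.' with '_'.
    if not component:
        # Guard: b'' has no first byte, so component[0] would raise IndexError.
        return component
    if component[0] == ord('.'):  # indexing bytes gives an int on Python 3
        return b'_' + component[1:]
    return component


def map_components(branch: bytes) -> bytes:
    # Hypothetical helper: apply dot() to each '/'-separated component.
    return b'/'.join(dot(part) for part in branch.split(b'/'))


print(b'/feature/.wip'.split(b'/'))      # [b'', b'feature', b'.wip']
print(map_components(b'/feature/.wip'))  # b'/feature/_wip'
```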