Fast-Export

mirror of https://github.com/frej/fast-export.git synced 2025-11-01 17:05:46 +01:00

Author	SHA1	Message	Date
Günther Nußmüller	d77765a23e	Fix UnboundLocalError with plugins and largefiles When Plugins are used in a repository that contains largefiles, the following exception is thrown as soon as the first largefile is converted: ``` Traceback (most recent call last): File "fast-export/hg-fast-export.py", line 728, in <module> sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile, File "fast-export/hg-fast-export.py", line 581, in hg2git c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap, File "fast-export/hg-fast-export.py", line 366, in export_commit export_file_contents(ctx,man,modified,hgtags,fn_encoding,plugins) File "fast-export/hg-fast-export.py", line 222, in export_file_contents file_data = {'filename':filename,'file_ctx':file_ctx,'data':d} UnboundLocalError: local variable 'file_ctx' referenced before assignment ``` This commit fixes the error by: * initializing the file_ctx before the largefile handling takes place * Providing a new `is_largefile` value for plugins so they can detect if largefile handling was applied (and therefore the file_ctx object may no longer be in sync with the git version of the file)	2025-08-11 08:30:17 +02:00
Frej Drejhammar	f71385ec14	Fix "Warn if one of the marks, mapping, or heads files are empty" The commit "Warn if one of the marks, mapping, or heads files are empty" (`7224e420a7`) mixed up the state and heads caches and reported that the heads cache was empty if the state cache was. Error found by Shun-ichi Goto. Closes #338	2025-06-05 16:50:56 +02:00
Frank Zingsheim	bd707b5d6e	Fix: Largefiles ignored #141 Import mercurial large files as ordinary files into git The basic idea to this fix is based on https://github.com/planestraveler/fast-export/tree/add-lfs-support-v2 from PR #65 Closes #141	2025-03-29 18:39:27 +01:00
Thalia Archibald	f947189dcc	Consistently terminate commit messages with LF When the length logic for fast-import 'data' commands was updated in `4c10270` (Fix data handling, 2023-03-02), one branch was missed, so commit messages now do not have a final LF appended in most cases. This changed the longtime behavior, which had been consistent since the first commit of hg2git, `9832035` (Initial import, 2007-03-06), and is expected by some applications which compare against old conversions from Mercurial.	2024-07-05 05:20:35 -07:00
Frej Drejhammar	fb225c4700	Merge branch 'gh/321'	2024-02-23 17:07:02 +01:00
Stephan Hohe	e63feee1b9	Don't add file if plugin sets content to `None`	2024-02-20 17:07:23 +01:00
Stephan Hohe	7b4bb7ff1d	Fix escape in regular expression	2024-02-19 23:40:05 +01:00
Frej Drejhammar	ddfc3a8300	Run file_data_filter on deleted files The `file_data_filter` method should be called when files are deleted. In this case the `data` and `file_ctx` keys map to None. This is so that a filter which modifies file names can apply the same name transformations before files are deleted.	2024-02-16 17:12:49 +01:00
Ekin Dursun	c49dd0cf60	Remove Python 2 compatibility code Python 2 support was removed recently, so we don't need the compatibility code anymore.	2023-11-18 20:22:18 +03:00
Felipe Contreras	9754a9f3f6	Trivial simplification Just return the values directly, no need to store them into variables. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	d2f11bd619	Remove multiple parent logic for file changes This is already what repo.status does. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	3582221efd	Compare changes only with the first parent It's not necessary to check both parents. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	0ae0d20496	Remove no-op check This code is only executed when there's two parents. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	e09a14a266	Move parents logic inside get_filechanges This way export_commit is much simpler (already quite complex), and it's easier to modify the logic. No functional changes. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	9df2f97f6c	Rename variables in get_filechanges It's easier to understand this way. No functional changes. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	531fa9b3a2	Simplify split_dict There's no need to keep track of the left side: if it's modified it's modified. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	a229b39d66	Coalesce modified files Git doesn't care if they are added or changed: they are modified. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	c666fd9c95	Trivial style cleanup Checking the array directly is more idiomatic. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	21fa443b4a	Simplify list of files for the first commit We already have the files. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-14 22:12:50 -06:00
Felipe Contreras	6fbe4d0ad0	Skip earlier Now that we have ctx easily available, skip early. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-10 12:38:42 -06:00
Felipe Contreras	fa73d8dec9	Share the changectx more It's used everywhere, might as well pass it along. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-10 12:38:30 -06:00
Felipe Contreras	e1e15b2091	Avoid revsymbol() We can just do repo[rev]. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-09 19:48:44 -06:00
Felipe Contreras	534d2bdd92	Don't deal with the node in get_changeset() It's not necessary. It could be fetched with repo[rev].node(), but why bother? Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-09 19:48:44 -06:00
Felipe Contreras	23f41c0ff1	Use revision directly instead of revnode We don't need the revnode. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-09 19:48:44 -06:00
Felipe Contreras	8b1fd408ca	Use changectx directly There's no need to call repo[revnode] when repo[rev] works perfectly fine. And since we have the context already we can just do ctx.hex() instead of hexlifying ourselves. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-09 19:48:44 -06:00
Felipe Contreras	4a4d242e98	Fetch node directly No need to call get_changeset() for that. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-09 19:48:44 -06:00
Felipe Contreras	432254100b	Fetch branch names directly No need to use get_changeset() for just one thing. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-09 19:48:44 -06:00
Felipe Contreras	5e4bc6eb03	Remove cruft Nothing uses that variable. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-09 19:48:44 -06:00
Felipe Contreras	bbab981130	Trivial simplification of wr No need to issue two write commands. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-04 16:08:45 +01:00
Felipe Contreras	c3cbf1e04d	Add wr_data helper No functional changes. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-03 19:34:29 -06:00
Felipe Contreras	4c10270302	Fix data handling The length should be exactly the same as the data, for example if the data is "hello" only 5 characters should be written on the stream. Thus it should always be `len(data)`, not `len(data)+1` as it currently is in some places. Since the first commit of hg2git.py there was a wtf comment, presumably Rocco was confused about this common discrepancy. We can shuffle the logic around by adding '\n' to the data, and removing +1 to the length. Also, the data should be written without a newline (wr_no_nl). Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>	2023-03-03 19:33:45 -06:00
chrisjbillington	13c273f10c	Resolve unicode escape sequences not being processed correctly In `process_unicode_escape_sequences()`, any backslash escape sequences in the original string are escaped upon the first `.encode('unicode-escape')` and therefore round-trip the sequence of `.encode('unicode-escape').decode('unicode-escape')`. That is not what we want - we want these sequences to be passed-through the `.encode` unchanged, so that they will be converted to the character they represent upon `.decode()`. This patch changes the `.encode()` step to pass through any ascii characters unchanged, only escaping non-ascii characters. This ensures any existing backslash escape sequences will be interpreted as the character they represent upon `.decode()`.	2022-10-23 11:51:33 +11:00
Frej Drejhammar	f179afce65	Fix FutureWarning about nested sets in re Since Python 3.7 the re module warns for syntax which could, in the future, be misparsed as a nested set. Avoid this by escaping the literal `[` we search for in the regexp. Reported by Monte Davidoff @mndavidoff Closes #269.	2022-02-09 15:37:29 +01:00
Frej Drejhammar	5b7ca5aaec	Give proper error message when refusing to overwrite existing branch If fast-export was asked to export a Mercurial branch to Git and a branch of the same name already existed in the Git repo but it was not created by fast-export, fast-export would crash while trying to format an error message claiming that the destination branch was modified behind its back. This patch extends fast-export to detect the situation above and give a proper error message which hopefully is less confusing to the user. Credit for discovering the original crash goes to Shun-ichi Goto <gotoh@taiyo.co.jp>. Closes: #269.	2021-08-27 16:04:40 +02:00
Frej Drejhammar	bdfc0c08c7	Merge branch 'frej/issue-258' Closes 258	2021-02-26 16:44:31 +01:00
SirIntellegence	20c22a3110	Add plugin support for the 'extra' field Permits plugins to import other information such as svn conversion revisions	2021-02-22 13:09:48 -07:00
Frej Drejhammar	f741bf39f2	bugfix: Avoid starting incremental conversions from scratch Keys and values in the state cache are byte strings, therefore a lookup of 'tip' will always fail. The failure makes the conversion start over from the beginning, but as fast-export is deterministic the results are the same, just very inefficient. The bug has existed since the port to Python 3. This patch switches the 'tip' lookup to use a byte string which should make incremental conversions restart at the last converted commit. As 'x' == b'x' in Python 2, this should be a backwards compatible change. Bug reported and fix suggested by Tomas Kolda. Fixes #258.	2021-02-19 16:47:53 +01:00
Frej Drejhammar	7057ce2c2b	Allow plugins to modify the committer Since they were introduced, plugins have been able to modify the author of a commit, but not the committer. This patch adds the necessary support for allowing them to also modify the committer.	2020-09-30 17:47:33 +02:00
Ondrej Stanek	9c6dea9fd4	Pass original hg commit hash to plugins	2020-07-31 10:50:51 +02:00
Ethan Furman	5c1cbf82b0	Add revision to commit_data for commit plugins Co-Authored-By: ostan89@gmail.com	2020-07-31 10:48:33 +02:00
Ondrej Stanek	50631c4b34	Add option --ignore-unnamed-heads This option allows the user to ignore only unnamed heads (compared to --force which ignores all non-fatal issues). The intended use is for a future plugin converting unnamed heads to named branches.	2020-07-31 10:30:53 +02:00
Ethan Furman	2a9dd53d14	Show all unnamed heads at once Co-Authored-By: ostan89@gmail.com	2020-07-31 10:27:07 +02:00
chrisjbillington	d29d30363b	Fix backward incompatible change for hg < 5.1 The port to Python 3 in `b961f146` changed `repo.branchmap().iteritems()` to use `.items()` instead. However, the object returned by mercurial isn't a dictionary and its `.items()` method was only introduced (as an alias for `iteritems`) in hg 5.1. `iteritems()` still exists, so let's keep using it for now to retain compatibility with hg < 5.1.	2020-05-06 11:59:49 -04:00
Frej Drejhammar	f102d2a69f	Merge branch 'PR/223' Closes #223	2020-05-06 16:31:13 +02:00
Ondrej Stanek	cf0e5837b6	Allow converting a repository with git and hg subrepos In the verification phase, fast-export falsely expects that both hg and git subrepositories should have the appropriate line in the subrepo-map file. The case is, that only hg subrepos need a line in subrepo-map that references a converted subrepo, while git subrepositories do not.	2020-05-06 16:30:05 +02:00
chrisjbillington	3b3f86b71e	Allow utf8 in mappings We were previously processing entries in mapping files (when `--mappings-are-raw` is not given) with `.decode('unicode_escape').encode('utf8')` to replace backslash escape sequences in bytestrings with the utf-8 encoded characters they represent. However, it turns out that `.decode('unicode_escape')` assumes latin-1 encoding if it encounters non-ascii bytes: https://bugs.python.org/issue21331. So this gave incorrect results if non-ascii utf8 data was present in the mapping. To fix this, we now add an extra layer of `.decode('utf8').encode('unicode-escape')` in order to convert any non-ascii characters into their backslash escape sequences. Then the subsequent `.decode('unicode_escape')` only encounters ascii characters and gives correct results.	2020-03-25 12:33:42 -04:00
chrisjbillington	6361b44c33	Fix bug in ignoring .git files/folders on Windows Mercurial internally stores (most) filepaths using forward slashes, and returns them as such from its Python API, even on Windows. So the splitting up of filepaths with `os.path.sep` was incorrect, resulting in `.git` files (those within a subdirectory, anyway) not being ignored on Windows as intended. Splitting on `b'/'` regardless of OS fixes this.	2020-03-08 19:40:50 +01:00
chrisjbillington	48508ee299	Fix failure to print error message in verify_heads On Python 3, `b'%s' % None` fails with a TypeError. In verify_heads, an error message prints the sha1 of a git commit, but that sha1 can be None. This commit instead prints `b'<None>'` if sha1 is None.	2020-03-06 11:02:38 -05:00
Max Fuqua	750fe6d3e1	Resolve type error resulting from passing an int to b'%s' in python3	2020-02-29 14:55:15 -05:00
chrisjbillington	4071f720b0	Fix issue #203 : Resolve stderr encoding issues In Python 3, `sys.stderr.write()` requires unicode strings, and all output on standard streams is UTF8 encoded. Therefore in the port to Python 3, we `.decode()`d all strings that are used in `%` formatting of strings to be printed to stderr. However, in Python 2, `sys.stderr` accepts either bytestrings or unicode strings, and: - `%s` formatting of a bytestring with a unicode string, i.e. `"%s" % u"foo"` results in a unicode string. - Writing a unicode string to stderr/stdout uses that stream's encoding - When the output of the process is being piped somewhere other than a terminal (as it is when called with pipes and shell redirection from hg-fast-export.sh), that encoding is None, which implies ASCII. - This raises UnicodeEncodeError if the unicode strings passed to `stderr.write()` have non-ascii characters. We cannot fix this problem simply by encoding UTF8 again before writing to stderr on Python 2. This is because the decoding of filenames with the UTF8 codec may fail - filenames may not even be valid UTF8 despite this being the declared filesystem encoding. We could `fsdecode()` filenames on Python 3, which would use the `surrogateescape` error handler, but stderr does not use this error handler for output, meaning we would just have to encode again (with the same error handler) anyway. And Python 2 lacks the `surrogateescape` error handler in any case - we would need to reimplement it just to do a round-trip decode and encode for no reason. This commit leaves filenames and other repository data as bytestrings, and simply writes them to `sys.stderr.buffer` on Python 3 or `sys.stderr` on Python 2 as-is, after `%` formatting with bytestring literals. This avoids encoding issues of filenames altogether. Other writing to stderr that does not involve repository data has been left with "native" strings, i.e. `sys.stderr.write("a string literal %s" % a_command_line_arg)`. These will still fail on Python 3 if the user passes a non-UTF filename as a command line argument or similar. This is acceptable IMHO - although `hg-fast-export` may encounter invalid UTF8 in mercurial repositories, it is not too much to impose that the user name their branch mapping files etc with valid UTF8!	2020-02-19 12:18:00 -05:00
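
The escape-sequence handling described in `3b3f86b71e` and `13c273f10c` can be illustrated with a minimal sketch. The function names `expand_escapes_naive` and `expand_escapes` are hypothetical and are not taken from hg-fast-export.py; the sketch only assumes what the commit messages state, namely that `.decode('unicode_escape')` treats non-ASCII bytes as latin-1, and that escaping only the non-ASCII characters first lets existing backslash sequences pass through unchanged.

```python
def expand_escapes_naive(raw: bytes) -> bytes:
    """Old approach per 3b3f86b71e: breaks on non-ASCII UTF-8 input.

    unicode_escape decodes bytes >= 0x80 as latin-1, so each byte of a
    multi-byte UTF-8 character becomes its own (wrong) character.
    """
    return raw.decode('unicode_escape').encode('utf8')


def expand_escapes(raw: bytes) -> bytes:
    """Illustrative fix (hypothetical helper, not the project's code).

    1. Decode the bytes as UTF-8 to get real characters.
    2. Re-encode as ASCII with 'backslashreplace': non-ASCII characters
       become backslash escapes, while ASCII (including backslash
       sequences already present) passes through untouched.
    3. Decode with unicode_escape, which now only sees ASCII bytes.
    4. Encode the result as UTF-8 again.
    """
    text = raw.decode('utf8')
    ascii_only = text.encode('ascii', 'backslashreplace')
    return ascii_only.decode('unicode_escape').encode('utf8')


# A mapping entry containing raw UTF-8 ("é") and an escape sequence.
entry = b'caf\xc3\xa9 \\u00e9'
print(expand_escapes_naive(entry))  # b'caf\xc3\x83\xc2\xa9 \xc3\xa9' (mojibake)
print(expand_escapes(entry))        # b'caf\xc3\xa9 \xc3\xa9'
```

The `backslashreplace` error handler is one way to get the "pass ASCII through, escape everything else" behaviour that `13c273f10c` describes; whether the project uses exactly this call is an assumption here.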

126 Commits