Before this change the search uses a single index which distinguishes types (repositories, users, etc.) with a field (_type).
But it has turned out that this could lead to problems, in particular if different types have the same field and uses different analyzers for those fields. The following links show even more problems of a combined index:
https://www.elastic.co/blog/index-vs-typehttps://www.elastic.co/guide/en/elasticsearch/reference/6.0/removal-of-types.html
With this change every type becomes its own index and the SearchEngine gets an api to modify multiple indices at once to remove all documents from all indices, which are related to a specific repository, for example.
The search uses another new api to coordinate the indexing, the central work queue.
The central work queue is able to coordinate long-running or resource intensive tasks. It is able to run tasks in parallel, but can also run tasks which targets the same resources in sequence. The queue is also persistent and can restore queued tasks after restart.
Co-authored-by: Konstantin Schaper <konstantin.schaper@cloudogu.com>
The original proxy configuration implementation only used the global configuration if the local proxy configuration was not provided (i.e. null). This PR adds the corner case where a local configuration is provided, but disabled. In this case, the global proxy configuration will be used as a fallback as well.
Reduce log level of `could not create token from authentication header from warn to debug, because it is normal that these message is logged if multiple SchemeBasedWebTokenGenerator are registered.
Fixes#1772
Apply proxy support for jGit by extracting the required functionality from the DefaultAdvancedHttpClient into its own class HttpURLConnectionFactory. This new class is now used by the DefaultAdvancedHttpClient and jGit.
The HttpURLConnection also fixes proxy server authentication, which was non functional in DefaultAdvancedHttpClient.
The proxy support for SVNKit is implemented by using the provided method of the BasicAuthenticationManager.
For mercurial the support is configured by writing the required settings to a temporary hgrc file.
* Introduce RepositoryCoordinates
RepositoryCoordinates will be used for the enrichment of the embedded repositories of search result hits. This is required, because if we used the normal repository for the enrichment, we would get a lot of unrelated enrichers would be applied.
* Add builder method to HalEnricherContext
With the new builder method it is possible to add an object to the context with an interface as key.
* Add enricher support for embedded repository by applying enricher for RepositoryCoordinates
* Use embedded repository for avatars
The Search api is now simpler, because it provides useful defaults. Only if you want to deviate from the defaults, you can set these values. This is mostly reached by using the builder pattern. Furthermore it is now possible to configure an analyzer per field. The default analyzer is still the one which is derived from the index options, but it is possible to configure a new indexer with the analyzer attribute of the indexed annotation. The attribute allows the configuration for code, identifiers and path. The current implementation uses the same analyzer code, identifiers and path. The new implemented splits tokens on more delimiters as the default analyzer e.g.: dots, underscores etc.
Co-authored-by: René Pfeuffer <rene.pfeuffer@cloudogu.com>
Expose an api which makes it easy to detect the content type of files. The api is based on the spotter api, but does not expose spotter classes.
Co-authored-by: René Pfeuffer <rene.pfeuffer@cloudogu.com>
Add a dedicated search page with more results and different types.
Users and groups are now indexed along with repositories.
Co-authored-by: René Pfeuffer <rene.pfeuffer@cloudogu.com>
Introduces a maximum size for the simple workdir cache. On cache overflow workdirs are evicted using an LRU strategy.
Furthermore parallel requests for the same repository will now block until the workdir is released.
The search link of the index resource is now an array of links instead of single templated link.
The array contains one link for each searchable type.
Co-authored-by: René Pfeuffer <rene.pfeuffer@cloudogu.com>
We introduced a new annotation '@IndexedType' which gets collected by the scm-annotation-processor. All classes which are annotated are index and searchable. This opens the search api for plugins.
Add a powerful search engine based on lucene to the scm-manager api.
The api can be used to index objects, simply by annotating them and add them to an index.
The first indexed object is the repository which could queried by quick search in the header.
Using a default user with a default password has the implicit risk, that this user is not changed and therefore this system can be compromised. With this change, SCM-Manager does not create the default user with the default password on startup any more, but it shows an initial form where the initial values for the administration user have to be entered by the user. To secure this form, a random token is created on startup and printed in the log.
To implement this form, the concept of an InitializationStep is introduced. This extension point can be implemented to offer different setup tasks. The creation of the administration user is the first implementation, others might be things like first plugin selections or the like.
Frontend components are selected by the name of these initialization steps, whose names will be added to the index resource
(whichever is active at the moment) and will be show accordingly.
Co-authored-by: Eduard Heimbuch <eduard.heimbuch@cloudogu.com>
Clear more caches if GPG key was added or deleted. It seems quite difficult to find the right way to invalidate partial caches so for now we keep purging everything.
Maybe we could add an API to efficiently find out what parts of the cache can be removed.
Fixes#1668
Add list of emergency contacts to global configuration. This user will receive e-mails and notification if some serious system error occurs like repository health check failed.
In the release of version 2.0.0 of SCM-Manager, the health checks had been neglected. This makes them visible again in the frontend and adds the ability to trigger them. In addition there are two types of health checks: The "normal" ones, now called "light checks", that are run on startup, and more intense checks run only on request.
As a change to version 1.x, health checks will no longer be persisted for repositories.
Co-authored-by: Eduard Heimbuch <eduard.heimbuch@cloudogu.com>
Validate filepath and filename to prevent path traversal in modification
command and provide validations for editor plugin.
Co-authored-by: René Pfeuffer <rene.pfeuffer@cloudogu.com>
Allow all UTF-8 characters except URL identifiers as user and group names and for namespaces.
Fixes#1513
Co-authored-by: René Pfeuffer <rene.pfeuffer@cloudogu.com>
Capture metrics about the lifetime of working copies used, for example, by the merge and modify commands. Working copies are internal repository clones that can place a large load on the server. Therefore, these metrics can be helpful in identifying sources of large server load.
Co-authored-by: Sebastian Sdorra <sebastian.sdorra@cloudogu.com>
Expose metrics about:
- User login attempts
- Failed user logins
- User logouts
- General successful accesses to SCM-Manager via any authentication realm
- General failed accesses to SCM-Manager
Co-authored-by: Sebastian Sdorra <sebastian.sdorra@cloudogu.com>
Collect guava cache statistics as metrics using micrometer. We replaced the own counter implementation of guava statistics with the guava internal caching statistics.
With this pull request, diffs for Git are loaded in chunks. This means, that for diffs with a lot of files only a part of them are loaded. In the UI a button will be displayed to load more. In the REST API, the number of files can be specified. This only works for diffs, that are delivered as "parsed" diffs. Currently, this is only available for Git.
Co-authored-by: Sebastian Sdorra <sebastian.sdorra@cloudogu.com>
Add search for files to the sources view. The search is only for finding file paths. It does not search any file metadata nor the content. Results get a rating, where file names are rated higher than file paths. The results are sorted by the score and the first 50 results are displayed.
Co-authored-by: Eduard Heimbuch <eduard.heimbuch@cloudogu.com>
Adds a protocol for repository imports (either from an URL, a dump file or a SCM-Manager repository archive).
This protocol documents single steps of an import, the time and the user and is accessible via a dedicated REST
endpoint or a simple ui.
The id of the log is added to the repository imported event, so that plugins like the landingpage or mail can link to these logs.
We will fire an RepositoryImportHookEvent instead of PostReceiveRepositoryHookEvent for repository imports with metadata. The event is only fired if all parts of the repository could be successfully imported. The extra event is required to avoid heavy recalculations which can be triggered by the PostReceiveRepositoryHookEvent for example the scm-statistic-plugin uses the PostReceiveRepositoryHookEvent to calculate its statistics.
Co-authored-by: Eduard Heimbuch <eduard.heimbuch@cloudogu.com>
Add option to encrypt repository exports with a password and add possibility to decrypt them on repository import. Also make the repository export asynchronous. This implies that the repository export will be created on the server and can be downloaded multiple times. The repository export will be deleted automatically 10 days after creation.
This change modifies the behaviour of the DefaultGroupCollector.
The collector does not longer resolve external groups for the anonymous user and it does not resolve internal nor external groups for the account which is used by the AdministrationContext.
This should reduce the requests which are send to external systems like ldap servers.
This adds a new migration mechanism for repository data. Instead of using UpdateSteps for all data migrations, repository data shall from now on be implemented with RepositoryUpdateSteps. The general logic stays the same. Executed updates are stored with the repository. Doing this, we can now execute updates on imported repositories without touching other data. This way we can import repositories even though they were exported with older versions of SCM-Manager or a plugin.