Sidra Data Platform | Version 2022.R2: Kind Kanzi
- 1 What’s new in Sidra 2022.R2
- 1.1 .NET migration
- 1.2 Windows-2022 release pipelines
- 1.3 Schema evolution for added columns and tables in source
- 1.4 Updated Databricks cluster configuration to 10.4 LTS
- 1.5 Support multiple Client Application pipelines per Entity
- 1.6 Client Application improvements
- 1.7 Inclusion of Entity information in Client Apps
- 1.8 Plugin release management improvements
- 1.9 File indexing artifacts deletion and re-indexing automated process
- 1.10 Allow SMTP settings different from Sendgrid in Sidra installation
- 1.11 Knowledge Store ingestion performance improvements
- 1.12 Documentation improvements
- 2 Issues fixed in Sidra 2022.R2
- 3 Breaking Changes in Sidra 2022.R2
- 4 Coming soon…
- 5 Feedback
The main focus of this release has been the full migration of Sidra from .NET Core 3.1 to .NET 6.
However, this release also includes important functional improvements. The most significant is support for schema evolution in database source systems. Previous versions of Sidra already detected changes to the data source table schemas and notified the user of these changes; this release adds fully automated schema amendment on the DSU table, adding new Attributes and Entities when they appear in the data source.
Knowledge Store ingestion performance and operational improvements have also been included to cover different scenarios derived from binary file ingestion, making the Knowledge Store a fully supported scenario.
Beyond these changes, this release includes other technical updates: the Databricks cluster runtime has been updated to the latest long-term support (LTS) version, 10.4, and all the CI/CD Azure DevOps release pipelines have been migrated to the current windows-2022 agent.
Sidra also continues to consolidate the operational improvements around Client Applications deployment and plugin management.
What’s new in Sidra 2022.R2
.NET migration
One of the biggest changes in this release is the full migration from .NET Core 3.1 to .NET 6. This was a critical and much-needed effort, since Sidra was using .NET Core 3.1, which reaches end of life (EOL) at the end of 2022. Moving to .NET 6 brings the benefits of the newer platform and, because this version is LTS, ensures supportability until November 2024. This migration work has affected all Core, DSU and Client Apps modules, including:
- Sidra plugins
- Sidra API
- Sidra Client Applications
- Identity Server
- Balea authorization framework
- Other Sidra backend modules, e.g. backend jobs, backoffice, and deployment tools and modules
Windows-2022 release pipelines
The Agent Pool in Sidra Azure DevOps Release pipeline has been upgraded to windows-2022, affecting both Core deployment and Client Apps pipelines. You can check more information about this technology evolution in this link.
Schema evolution for added columns and tables in source
In this release, Sidra incorporates a new feature that automatically evolves the schema of the source tables: whenever new columns are added in the source, a new Attribute is created in the Sidra Core metadata for each new column.
Following the Sidra process, the new columns are then created in the respective Databricks tables for that Entity. From then on, each execution of the data extraction pipeline creates Assets that include these new columns in Databricks. Also, whenever new tables are added in the source system, and these tables are configured to be included in the data extraction (in the metadata extraction options of the Data Intake Process configuration), new Entities are created for them and associated with the data extraction pipeline, so they can be included in the data extraction set.
For more details, you can see the related schema evolution documentation.
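The behaviour described above can be pictured with a small Python sketch. It is only an illustration under assumptions: the `Entity` class, the `evolve_schema` function and the `included_tables` argument are hypothetical names, not part of the Sidra API, and real metadata lives in the Sidra Core database rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a Sidra Entity and its Attributes."""
    name: str
    attributes: list[str] = field(default_factory=list)


def evolve_schema(entities: dict[str, Entity],
                  source_tables: dict[str, list[str]],
                  included_tables: set[str]) -> dict[str, list[str]]:
    """Add Attributes for new source columns and Entities for new tables.

    Returns a mapping of Entity name -> Attributes added in this run.
    """
    added: dict[str, list[str]] = {}
    for table, columns in source_tables.items():
        if table not in entities:
            # New tables become Entities only if configured for extraction.
            if table not in included_tables:
                continue
            entities[table] = Entity(name=table)
        entity = entities[table]
        new_columns = [c for c in columns if c not in entity.attributes]
        if new_columns:
            entity.attributes.extend(new_columns)
            added[table] = new_columns
    return added


entities = {"Customer": Entity("Customer", ["Id", "Name"])}
source = {
    "Customer": ["Id", "Name", "Email"],  # one new column
    "Order": ["Id", "Total"],             # new table, included in extraction
    "Audit": ["Id"],                      # new table, not included
}
print(evolve_schema(entities, source, included_tables={"Customer", "Order"}))
```

In this sketch, running the function a second time against the same source adds nothing, which mirrors the idea that only newly appearing columns and tables trigger metadata changes.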
Updated Databricks cluster configuration to 10.4 LTS
The runtime version of the Databricks cluster for DSU resource groups has been upgraded to 10.4, the latest long-term support (LTS) version that Databricks has released. You can check this information in the Azure Databricks documentation.
Support multiple Client Application pipelines per Entity
A more robust mechanism to manage multiple Client Application pipelines per Entity has been included.
This has been done through a couple of improvements on the Client Applications side:
- Sidra now supports a new PipelineSyncBehavior (the behaviour used by the Sync webjob, which synchronizes metadata between the DSU and the Client Application). This behaviour loads any Asset up to the most recent day for which Assets are available. The Asset status is tracked in the ExtractPipelineExecution table, which makes it possible to track the status of multiple pipelines per Entity. This is the mandatory Sync Behavior (PipelineSyncBehavior) for loading an Entity with several pipelines, and it is also the recommended SyncBehavior for all new load pipelines.
- All the Client Application pipeline templates have been updated to track and update this Client Application pipeline execution status. You can check this information in the Sidra Client Application concepts and Sidra Client Application pipelines documentation pages.
Client Application improvements
This feature consists of a few enhancements on the Client Application side:
- A condition has been added to the orchestrator activity in the Client Application load pipelines, to handle the case where the stored procedure is empty, or to create a new template without the orchestrator.
- All Sidra-owned Client Application pipeline templates have been reviewed from a security standpoint, to ensure all sensitive settings are handled by the KeyVault resource inside the Client Application.
- A new consolidation mode, overwrite, has been added for overwriting data in Client Applications. The mode is signalled through a new Client Application pipeline parameter called PipelineExecutionProperties. In the default consolidation mode, merge, the data is merged if there is a Primary Key; otherwise the data is appended. In the overwrite consolidation mode, the entire table is always overwritten.
More information about this feature is available on the documentation site.
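The difference between the two consolidation modes can be sketched in a few lines of Python. This is an illustrative model only: the `consolidate` function and its arguments are hypothetical, tables are represented as plain lists of dictionaries, and the way the mode is actually passed through PipelineExecutionProperties is not modelled here.

```python
from typing import Optional

Row = dict[str, object]


def consolidate(target: list[Row], incoming: list[Row],
                mode: str = "merge",
                primary_key: Optional[str] = None) -> list[Row]:
    """Return the table contents after applying the consolidation mode."""
    if mode == "overwrite":
        # The entire table is always replaced by the incoming data.
        return list(incoming)
    if mode != "merge":
        raise ValueError(f"Unknown consolidation mode: {mode}")
    if primary_key is None:
        # No Primary Key: incoming rows are appended.
        return target + incoming
    # Primary Key present: update matching rows and insert the rest.
    merged = {row[primary_key]: row for row in target}
    for row in incoming:
        merged[row[primary_key]] = row
    return list(merged.values())


current = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
new = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

print(consolidate(current, new, "merge", primary_key="id"))  # 3 rows, id 2 updated
print(consolidate(current, new, "merge"))                    # 4 rows, appended
print(consolidate(current, new, "overwrite"))                # only the 2 new rows
```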
Inclusion of Entity information in Client Apps
In this new version of Sidra, information from the Asset table in Sidra Core has been included in Client Applications, such as row count (Entities), errors (Validation Errors) and byte sizes (ByteSize), in order to keep detailed track of the whole process from the Client Application side. These additional fields allow advanced logic to be configured for use cases on the Client Application side.
Plugin release management improvements
We have refactored the plugin code to move each plugin to its own release branch, independent of the Sidra release and of other plugin releases. Additionally, we have added a new registration endpoint for Sidra plugins, to ease the compatibility setup between each new plugin version and the Sidra version.
File indexing artifacts deletion and re-indexing automated process
This feature complements the current Knowledge Store ingestion capability in Sidra. It supports scenarios that need a semi-automated process to remove artifacts from the index, or to prepare the environment for re-indexing the documents, preventing previously ingested documents from affecting the index. Such a scenario can arise whenever there is a change in the index structure, an update of the skill model, etc.
A new Core API endpoint has been developed that orchestrates the actions required across all the moving pieces of Sidra binary file ingestion to remove old documents from the documents index, together with the intermediate and final data ingestion artifacts (raw storage, index, Knowledge Store, Databricks tables).
This endpoint supports two basic modes. In both modes, the intermediate structures and Asset state for the Assets ingested by a specific pipeline are removed from Core. The artifacts that are always removed are the following:
- Azure Search datasources
- Knowledge Store files
- Databricks tables and ingested data
With regard to the raw documents in the intermediate raw storage container, two different modes have been defined for this endpoint:
- Mode Delete: Moves all Asset binary files from all Entities associated with the Azure Search pipeline to the backup container, and sets the Asset indexing state to Archived.
- Mode PrepareToReindex: Moves all Asset binary files from the raw container of each of the Entities associated with the pipeline to the /indexlanding container. In this case, the next execution of the indexer job re-registers these Assets and reindexes them in batch.
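The two modes can be contrasted with a small in-memory Python sketch. Everything here is an assumption made for illustration: the `Mode` enum, the `prepare_pipeline` function and the dictionaries standing in for storage containers are hypothetical and do not reflect the actual Core API endpoint or its request format.

```python
from enum import Enum


class Mode(Enum):
    DELETE = "Delete"
    PREPARE_TO_REINDEX = "PrepareToReindex"


def prepare_pipeline(mode: Mode,
                     raw: dict[str, bytes],
                     backup: dict[str, bytes],
                     index_landing: dict[str, bytes],
                     asset_state: dict[str, str],
                     artifacts: dict[str, list[str]]) -> None:
    """Apply the cleanup for one pipeline, mutating the given stores in place."""
    # Removed in both modes: Azure Search datasources, Knowledge Store files,
    # Databricks tables and ingested data.
    for items in artifacts.values():
        items.clear()

    destination = backup if mode is Mode.DELETE else index_landing
    for name in list(raw):
        destination[name] = raw.pop(name)
        if mode is Mode.DELETE:
            asset_state[name] = "Archived"
        else:
            # Assumption: the Asset state is cleared so that the next
            # indexer run can re-register the Asset.
            asset_state.pop(name, None)


raw = {"contract.pdf": b"..."}
backup, landing = {}, {}
state = {"contract.pdf": "Indexed"}
artifacts = {"datasources": ["ds1"], "knowledge_store": ["k1"], "databricks": ["t1"]}

prepare_pipeline(Mode.PREPARE_TO_REINDEX, raw, backup, landing, state, artifacts)
print(sorted(landing), state)
```

The only difference between the modes in this sketch is where the raw files end up and what happens to the Asset state; the derived artifacts are removed either way.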
Allow SMTP settings different from Sendgrid in Sidra installation
We now support custom SMTP settings, allowing the use of an SMTP server other than the default Sendgrid account. This feature was required to enable certain advanced scenarios for password recovery and other requirements from some of our customers and partners. There is a new set of optional parameters in the CLI to support this at installation/upgrade time. More information is available on the Sidra documentation site.
Knowledge Store ingestion performance improvements
This release includes some performance improvements regarding the Knowledge Store ingestion in Sidra. Different settings involved in the different phases of the document indexing have been tested and fine-tuned for improving the performance of the document indexing process.
These settings include resource sizing, degree of parallelism and search units, as well as infrastructure deployment settings for the associated skills executed by the Azure Search skillset.
In addition, an option has been added to the Sidra CLI deployment to configure whether the Azure Search service is deleted and re-created. This is an optional parameter, and it is set to false by default. This setting allows the deployment to recreate the Azure Search resource with a different tier.
Documentation improvements
Just as Sidra Data Platform is under continuous development, the Sidra documentation keeps improving in order to make the user experience as easy as possible. It now includes a section dedicated to the Sidra API, showing the different API endpoints and their schemas for each section of interest.
Issues fixed in Sidra 2022.R2
The complete list of resolved issues and relevant changes in Sidra 2022.R2 is available here.
Breaking Changes in Sidra 2022.R2
Substitution of parameters on the CLI tool (Profile command)
The profile command of the CLI tool contains a parameter that replaces three others in the create option, making the configuration easier for the user.
No action is required. The previous parameters, such as gitRepository, have been replaced by the devOpsRepositoryUrl parameter: the URL of the target Azure DevOps repository, where code and template files are to be copied. It has the format https://dev.azure.com/organizationName/projectName/_git/repositoryName. Internally, the URL is split into three profile JSON properties. More information can be found in the profile command page.
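As an illustration of how a URL in this format can be split into three values, here is a minimal Python sketch. The function name and the keys of the returned dictionary (`organization`, `project`, `repository`) are assumptions chosen for readability; the actual profile JSON property names are documented in the profile command page.

```python
from urllib.parse import urlsplit


def split_devops_repository_url(url: str) -> dict[str, str]:
    """Split an Azure DevOps repository URL into its three components."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected path: organizationName/projectName/_git/repositoryName
    if (parts.netloc != "dev.azure.com"
            or len(segments) != 4
            or segments[2] != "_git"):
        raise ValueError(f"Unexpected Azure DevOps repository URL: {url}")
    organization, project, _, repository = segments
    return {
        # Hypothetical key names, not the real profile JSON properties.
        "organization": organization,
        "project": project,
        "repository": repository,
    }


print(split_devops_repository_url(
    "https://dev.azure.com/organizationName/projectName/_git/repositoryName"
))
```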
New parameter on the CLI tool (Deploy command)
The deploy command of the CLI tool now contains a new parameter, RecreateAzureSearch, in the configure option, making the configuration easier for the user.
No action is required.
The RecreateAzureSearch parameter may still be set to true from a previous execution, and the Azure Search SKU may have changed with respect to the value in currentAzureSearchSku.
No action is required. However, please note that, when upgrading, if the new setting is set to true and an SKU change of the Azure Search service is required, this will trigger a full recreation of the service and will require a full reindexing of the files. In this case, the existing index will be removed.
For safety, this parameter is forced to false at the end of the CLI execution, to avoid unintended consequences. If the user deliberately wants to recreate Azure Search, this will have to be done manually.
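The interplay between the flag and the SKU change can be summarized with a short Python sketch. It is a simplified model under assumptions: `DeploySettings`, its field names and `deploy_search` are hypothetical, and what the real deployment does when the SKU changes while the flag is false is not covered by these notes (the sketch simply keeps the service).

```python
from dataclasses import dataclass


@dataclass
class DeploySettings:
    recreate_azure_search: bool = False  # mirrors RecreateAzureSearch
    azure_search_sku: str = "standard"   # requested tier (hypothetical field)


def deploy_search(settings: DeploySettings, current_sku: str) -> str:
    """Return the action taken for the Azure Search service."""
    sku_changed = settings.azure_search_sku != current_sku
    if settings.recreate_azure_search and sku_changed:
        # Full recreation: the existing index is removed and all files
        # must be reindexed afterwards.
        action = "recreate"
    else:
        action = "keep"
    # The flag is always reset at the end of the execution, so a later
    # run cannot recreate the service by accident.
    settings.recreate_azure_search = False
    return action


settings = DeploySettings(recreate_azure_search=True, azure_search_sku="standard2")
print(deploy_search(settings, current_sku="standard"))  # recreate
print(deploy_search(settings, current_sku="standard"))  # keep (flag was reset)
```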
Coming soon…
This 2022.R2 release represents an important technological evolution effort to keep up to date with underlying platform improvements, namely .NET, Azure DevOps and Databricks.
Important steps have been taken on improving the operations of plugins and Client Application deployments.
Finally, accelerators for automatically detecting changes in source databases and tables have been developed as part of the schema evolution features. Future schema evolution work will detect and accommodate deletions of tables and columns in source systems.
As part of the next release, we are planning to increase the plugin catalogue to create and edit Data Intake Processes from Sidra Web.
Feedback
We would love to hear from you! You can make a product suggestion, report an issue, ask questions, find answers and propose new features at our Sidra ideas portal, or by reaching out to us at email@example.com.