Sidra Data Platform | Version 2022.R2: Kind Kanzi

The main focus of this release has been the full migration of .NET Core 3.2 to .Net 6 in Sidra.

However, this release also includes important functional improvements. The most significant of them is the support for schema evolution in source systems of type database. While previous versions of Sidra already detected changes to the data source table schemas and notified the user of these changes, this release brings the support for fully automated schema amendment on the DSU table, adding new Attributes and Entities when they appear in the data source.

Knowledge Store ingestion performance and operational improvements have also been included here to accomplish different scenarios derived from the binary file ingestion, bringing the knowledge store to a fully supported scenario now.

Besides all these changes, some other technical evolution topics have been included in this release, including the update of Databricks cluster runtime to the latest long term support (LTS) version 10.4, and the migration of all the CI/CD Azure DevOps release pipelines to support the current windows-2022 agent.

Sidra also continues to consolidate the operational improvements around Client Applications deployment and plugin management.

What’s new in Sidra 2022.R2

.NET migration

One of the biggest changes included in this release has been the full migration of .NET version from 3.2 to .NET version 6. This was a critical and much needed effort, since Sidra was using .Net Core 3.2, which was going to go out EOL by the end of 2022. By moving to .Net 6 we are not just reaping the benefits of the newer platform, but with this new version being LTS, we are ensuring supportability until November 2024. This migration work has impacted all Core, DSU and Client Apps modules, including:

Sidra plugins
Sidra API
Sidra Client Applications
Identity Server
Balea authorization framework
Other Sidra backend modules, e.g backend jobs, backoffice and deployment tools and modules

Windows-2022 release pipelines

The Agent Pool in Sidra Azure DevOps Release pipeline has been upgraded to windows-2022, affecting both Core deployment and Client Apps pipelines. You can check more information about this technology evolution in this link.

Schema evolution for added columns and tables in source

In this release Sidra incorporates a new feature intended to automatically evolve the schema of the source tables, meaning that, whenever new columns of data are added, new Attributes will be created in Sidra Core metadata for each new column added recently.

Following the Sidra process, new columns will be created in their respective Databricks tables for that Entity. Then, once in each execution of the data extraction pipeline, Assets will be created including these new columns in Databricks. Also, whenever new tables are added in the source system, if these tables are configured to be included in the data extraction (in the metadata extraction options of the Data Intake Process configuration), new Entities will be created for these added tables and associated to the data extraction pipeline, so they can be included in the data extraction set.

For more details, you can see the related schema evolution documentation.

Updated Databricks cluster configuration to 10.4 LTS

The runtime version of Databricks cluster for DSU resource groups has been upgraded to the latest long term stable (LTS) version that Databrick has released. You can check this information in Azure Databricks documentation.

Support multiple Client Application pipelines per Entity

A more robust mechanism to manage multiple Client Application pipelines per Entity has been included.

This has been done through a couple of improvements on the Client Applications side:

Sidra now supports a new PipelineSyncBehavior (Sync webjob that synchronizes metadata between the DSU and the Client Application). This behaviour loads any Asset before up the most recent day in which there are available Assets. The Asset status is tracked via the table ExtractPipelineExecution, to work with and track status of multiple pipelines per Entity. This is the mandatory Sync Behavior (PipelineSyncBehavior) for loading an Entity with several pipelines. This is also the recommended SyncBehavior for all new load pipelines.
All the Client Application pipeline templates have been updated to keep track and update this Client pipeline execution status. You can check this information in Sidra Client Application concepts and Sidra Client Application pipelines documentation pages.

Client Application improvements

This feature consists of a few enhancements on the Client Application side:

A condition to the orchestrator activity in the Client Application load pipelines has been introduced, to manage when the stored procedure is empty, or creates a new template without the orchestrator.
All sidra-owned Client Application pipeline templates have been reviewed from the security standpoint, to ensure all sensitive settings are handled by the KeyVault resource inside the Client Application.
A new Consolidation mode overwrite for overwriting data for Client Applications has been added. To enable this mode, a new Client Application pipeline parameter to signal this mode has been added, called PipelineExecutionProperties. In the default consolidation mode merge, the data will be merged if there is a Primary Key, otherwise data will be appended. If the consolidation mode is overwrite, the entire table will be always overwritten.

More information about this feature can be checked in the documentation site.

Inclusion of Entity information in Client Apps

For this new version of Sidra, information about Client Applications from the Asset table in Sidra Core has been included such as row count (Entities), errors (Validation Errors) and byte sizes (ByteSize), in order to keep a detailed tracking of the whole process from the Client Application side. The addition of these controlled fields allows a configuration advanced logic for use cases on the Client Application side.

Plugin release management improvements

We have refactored the plugins code to move each plugin to its own release branch independent of Sidra release and other plugins release. Additionally, we have added a new registration endpoint for Sidra plugins, to ease the compatibility set up between each new plugin version and the Sidra version.

File indexing artifacts deletion and re-indexing automated process

This feature complements the current capability of Knowledge Store ingestion in Sidra, to support scenarios where we need a semi-automated process to remove artifacts from the index, or prepare the environment for a re-indexing of the documents, avoiding the previously ingested documents from affecting the index. This scenario could happen whenever there is a change in the index structure, an update of the skill model, etc.

A new Core API endpoint has been developed, that orchestrates the different actions required to prepare all the moving pieces of Sidra binary file ingestion to remove old documents from documents index, and intermediate and final data ingestion artifacts (raw storage, index, Knowledge Store, Databricks tables).

This endpoint supports two basic modes. In both modes, the intermediate structures and Asset state for the given Assets ingested by a specific pipeline are removed from Core. The set of these artifacts that are always removed are the following:

Indexes
Indexers
Azure Search datasources
Knowledge Store files
Databricks tables and ingested data

With regards to what to do with the raw documents in the intermediate raw storage container, two different modes have been defined for this endpoint:

Mode Delete: Moves all Asset binary files from all Entities associated to the Azure Search pipeline to the backup container, and set the Asset indexing state to Archived.
Mode PrepareToReindex: Moves all Asset binary files from the raw container of each of the Entities associated with the pipeline to the /indexlanding container. In this case, in the next execution of the indexer job, it will re-register the Assets and reindex in batch these Assets.

Allow SMTP settings different from Sendgrid in Sidra installation

We are now suporting custom SMTP settings, allowing the use of a different SMTP server other than the default Sendgrid account. This feature was required in order to enable certain advanced scenarios for password recovery and other requirements from some of our customers and partners. There are a new set of optional parameters in the CLI to support this at installation/upgrade time. More information is detailed in the Sidra’s documentation site.

Knowledge Store ingestion performance improvements

This release includes some performance improvements regarding the Knowledge Store ingestion in Sidra. Different settings involved in the different phases of the document indexing have been tested and fine-tuned for improving the performance of the document indexing process.

These settings are resource sizing, degree of parallelism, search units, as well as infrastructure deployment settings for the associated skills executed by the Azure Search skillset.

In addition, an option in Sidra CLI deployment has been added to configure whether the Azure Search service will be deleted and re-created or not. This is a non mandatory parameter, and it is setup false by default. This setting allows the deployment to recreate the Azure Search resource with a different tier.

Documentation improvements

As well as Sidra Data Platform follows a continuous development, the documentation of Sidra keeps improving in order to ease as much as possible the experience for the user. For that reason, now it includes a section specially dedicated to the Sidra API, showing the different API’s endpoints and their schemas depending on the section of interest.

Issues fixed in Sidra 2021.R2

Access to the complete list of resolved issues and relevant changes in Sidra 2022. R2, here.

Breaking Changes in Sidra 2022.R2

Substitution of parameters on the CLI tool (Profile command)

Description

The CLI tool contains a parameter on the command profile that replaces three others in the create option, making the configuration easier for the user.

Required Action

No action is required. The parameters devOpsProjectUrl, devOpsProject and gitRepository have been replaced by devOpsRepositoryUrlparameter, the URL of the target Azure DevOps repository, where code and template files are to be copied. It comes in the format like https://dev.azure.com/organizationName/projectName/_git/repositoryName. This way, internally the URL will be split into three profile JSON properties. More information can be found in the profile command page.

New parameter on the CLI tool (Deploy command)

Description

The CLI tool now contains a new parameter on the command deploy, RecreateAzureSearch, in the configure option, making the configuration easier for the user.

Required Action

No action is required.

RecreateAzureSearch parameter

Description

RecreateAzureSearch parameter can have been setup as true from a last execution additionally to having experienced some change in the currentAzureSearchSku.

Required Action

No action is required. However, please note that, when doing an upgrading, if the new setting is set to true and an sku change of the Azure Search service is required, this will trigger a full recreation of the service, and will require a full reindexing of the files. In this case, the existing index will be removed.

Because of safety purposes, we are forcing the setting of this parameter at the end of the CLI execution to false, to avoid unintended consequences. In the case that the user purposely wants to recreate Azure Search, it will have to be done manually.

Coming soon…

This 2022.R2 release represents an important technological evolution effort to keep up to date with underlying platform improvements, namely .NET Core, Azure DevOps and Databricks.

Important steps have been taken on improving the operations of plugins and Client Application deployments.

Finally, accelerators for automatically detecting source database/tables changes have been developed, as part of the schema evolution features. Future schema evolution will incorporate changes to detect and accommodate deletions of tables/columns in source systems.

As part of the next release, we are planning to increase the plugin catalogue to create and edit Data Intake Processes from Sidra Web.

Feedback

We would love to hear from you! You can make a product suggestion, report an issue, ask questions, find answers and propose new features at our Sidra ideas portal, or by reaching out to us in info@sidra.dev.

Cookie	Duration	Description
__cfduid	1 year	The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information.
__cfduid	29 days 23 hours 59 minutes	The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information.
__cfduid	1 year	The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information.
__cfduid	29 days 23 hours 59 minutes	The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information.
_ga	1 year	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_ga	1 year	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_ga	1 year	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_ga	1 year	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_gat_UA-326213-2	1 year	No description
_gat_UA-326213-2	1 year	No description
_gat_UA-326213-2	1 year	No description
_gat_UA-326213-2	1 year	No description
_gid	1 year	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form.
_gid	1 year	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form.
_gid	1 year	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form.
_gid	1 year	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form.
attributionCookie	session	No description
cookielawinfo-checkbox-analytics	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Analytics" category .
cookielawinfo-checkbox-necessary	1 year	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-necessary	1 year	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-non-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary".
cookielawinfo-checkbox-non-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary".
cookielawinfo-checkbox-non-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary".
cookielawinfo-checkbox-non-necessary	1 year	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary".
cookielawinfo-checkbox-performance	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to store the user consent for cookies in the category "Performance".
cppro-ft	1 year	No description
cppro-ft	7 years 1 months 12 days 23 hours 59 minutes	No description
cppro-ft	7 years 1 months 12 days 23 hours 59 minutes	No description
cppro-ft	1 year	No description
cppro-ft-style	1 year	No description
cppro-ft-style	1 year	No description
cppro-ft-style	session	No description
cppro-ft-style	session	No description
cppro-ft-style-temp	23 hours 59 minutes	No description
cppro-ft-style-temp	23 hours 59 minutes	No description
cppro-ft-style-temp	23 hours 59 minutes	No description
cppro-ft-style-temp	1 year	No description
i18n	10 years	No description available.
IE-jwt	62 years 6 months 9 days 9 hours	No description
IE-LANG_CODE	62 years 6 months 9 days 9 hours	No description
IE-set_country	62 years 6 months 9 days 9 hours	No description
JSESSIONID	session	The JSESSIONID cookie is used by New Relic to store a session identifier so that New Relic can monitor session counts for an application.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
viewed_cookie_policy	1 year	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
viewed_cookie_policy	1 year	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
wmc	9 years 11 months 30 days 11 hours 59 minutes	No description

Cookie	Duration	Description
__cf_bm	30 minutes	This cookie, set by Cloudflare, is used to support Cloudflare Bot Management.
sp_landing	1 day	The sp_landing is set by Spotify to implement audio content from Spotify on the website and also registers information on user interaction related to the audio content.
sp_t	1 year	The sp_t cookie is set by Spotify to implement audio content from Spotify on the website and also registers information on user interaction related to the audio content.

Cookie	Duration	Description
_hjAbsoluteSessionInProgress	1 year	No description
_hjAbsoluteSessionInProgress	1 year	No description
_hjAbsoluteSessionInProgress	1 year	No description
_hjAbsoluteSessionInProgress	1 year	No description
_hjFirstSeen	29 minutes	No description
_hjFirstSeen	29 minutes	No description
_hjFirstSeen	29 minutes	No description
_hjFirstSeen	1 year	No description
_hjid	11 months 29 days 23 hours 59 minutes	This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.
_hjid	11 months 29 days 23 hours 59 minutes	This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.
_hjid	1 year	This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.
_hjid	1 year	This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.
_hjIncludedInPageviewSample	1 year	No description
_hjIncludedInPageviewSample	1 year	No description
_hjIncludedInPageviewSample	1 year	No description
_hjIncludedInPageviewSample	1 year	No description
_hjSession_1776154	session	No description
_hjSessionUser_1776154	session	No description
_hjTLDTest	1 year	No description
_hjTLDTest	1 year	No description
_hjTLDTest	session	No description
_hjTLDTest	session	No description
_lfa_test_cookie_stored	past	No description

Cookie	Duration	Description
loglevel	never	No description available.
prism_90878714	1 month	No description
redirectFacebook	2 minutes	No description
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.