Introduction
Data catalog software acts like a dictionary for all of a company's data assets. In Collibra, customers store metadata about their databases (including tables and columns), spreadsheets, reports, processes, and so on. The goal is to make data management, overall understanding, and ownership of the data assets easier. For example, an employee who wants to see where the data in a particular report originates, or whom to ask about changing a report, can look it up in Collibra.
The integration with AtScale should work as follows: on sync, all data objects from an AtScale catalog/project should be transferred to Collibra.
Collibra Login: https://<your_collibra_hostname>.collibra.com/
Documentation: About Collibra
Requirements
Works with both the Installer and Container versions of AtScale.
AtScale must be publicly reachable so that it can communicate with Collibra.
An AtScale product license with the Data Catalog feature is required.
At least one published project
To run the integration, download the zip file: atscale-to-collibra-integration-1.0.1.zip
How to run
1. Extract the content of the provided zip file
The provided zip file contains all the files needed to run the integration, organized into three folders:
bin - contains executable files
config - contains the configuration
lib - contains all libraries
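After extraction, the presence of the three folders listed above can be checked with a small shell function. This is only a sketch: check_layout is a hypothetical helper name, not part of the package, and it assumes you pass it the folder into which the zip file was extracted.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the integration):
# verify that the extracted folder contains bin, config and lib.
check_layout() {
    root="${1:-.}"      # folder to check; defaults to the current directory
    status=0
    for dir in bin config lib; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing folder: $root/$dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling check_layout from inside the extracted folder returns 0 when all three folders exist and 1 otherwise, naming each missing folder on standard error.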
2. Configure
The provided application.properties file contains default values for every configuration except the usernames and passwords and the hosts and ports of the services used (AtScale and Collibra).
Each configuration is composed of two parts: a key (defined in the application) and a value (provided by the user), separated by =.
All the configuration values can be modified. If the key is missing, the default value will be used.
Values must be provided for at least the following configuration keys, as there are no default values for them:
collibra.username - the username used for the Collibra import
collibra.password - the password for the Collibra user
atscale.api.username - the username used to connect to AtScale
atscale.api.password - the password for the AtScale user
atscale.api.apihost - the host AtScale is running on
atscale.api.authhost - the host AtScale authorization is running on; usually the same as atscale.api.apihost
trigger.api.username - the username for the atscale-to-collibra-integration utility; set it in the application.properties file
trigger.api.password - the password for the atscale-to-collibra-integration utility; set it in the application.properties file
The atscale.api.organization key (the organization for the AtScale installer) can also be changed if your installation does not use the default organization.
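Before starting the integration, it can help to confirm that every key without a default has been given a value. The following is a minimal sketch, not part of the package: missing_keys is a hypothetical helper, and it assumes the file uses plain key=value lines with no spaces around the = sign.

```shell
#!/bin/sh
# Keys that have no default value, taken from the list above.
REQUIRED_KEYS="collibra.username collibra.password atscale.api.username atscale.api.password atscale.api.apihost atscale.api.authhost trigger.api.username trigger.api.password"

# Hypothetical helper: print each required key that is missing or empty
# in the given properties file (assumes "key=value" with no spaces).
missing_keys() {
    props_file="$1"
    for key in $REQUIRED_KEYS; do
        # Escape the dots in the key, then require at least one character after "=".
        pattern="^$(printf '%s' "$key" | sed 's/\./\\./g')=."
        if ! grep -q "$pattern" "$props_file"; then
            printf '%s\n' "$key"
        fi
    done
}
```

For example, missing_keys config/application.properties (the path is an assumption based on the folder layout described earlier) prints nothing when all eight keys are set.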
3. Start atscale-to-collibra-integration
Run the following command in a terminal: ./bin/atscale-to-collibra-integration
The process is not expected to exit. It keeps running and performs synchronizations periodically (if trigger.scheduler.cron.enabled is set to true) at the frequency specified in trigger.scheduler.cron.expression (default: every 2 hours).
Leave this terminal open; the sync activity is shown here.
4. Initial metadata update
In a new terminal, with atscale-to-collibra-integration running, run ./bin/update_metadata.sh <trigger.api.username> <trigger.api.password>
Both values are the ones set for trigger.api.username and trigger.api.password in the application.properties file.
This updates or creates the necessary assets in Collibra, including the Business Domain and Data Domain specified in the application.properties file, and performs an initial synchronization.
Once the script has finished, this terminal can be closed safely.
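To avoid copying the credentials by hand, the two arguments can be read from the properties file. The sketch below is an illustration only: get_prop and run_update_metadata are hypothetical helpers, the path config/application.properties is an assumption based on the folder layout, and the file is assumed to contain plain key=value lines.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a properties file.
# Everything after the first "=" is treated as the value.
get_prop() {
    key="$1"
    props_file="$2"
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { if (v != "") print v }' "$props_file"
}

# Hypothetical wrapper: call update_metadata.sh with the trigger credentials.
# Run it from the folder extracted from the zip file.
run_update_metadata() {
    props_file="${1:-config/application.properties}"   # assumed location
    user=$(get_prop trigger.api.username "$props_file")
    pass=$(get_prop trigger.api.password "$props_file")
    if [ -z "$user" ] || [ -z "$pass" ]; then
        echo "trigger.api.username or trigger.api.password is not set in $props_file" >&2
        return 1
    fi
    ./bin/update_metadata.sh "$user" "$pass"
}
```

With atscale-to-collibra-integration already running in the first terminal, calling run_update_metadata in the second terminal is equivalent to typing the command with both values filled in.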
5. Sync all data objects
To force the integration to send all objects from AtScale to Collibra immediately (instead of waiting for the next scheduled sync), you can use the following link:
http://localhost:8081/api/sync
Open it in your browser. Each time the page is loaded or refreshed, the integration connects to AtScale, retrieves all the metadata, and sends it to Collibra.
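The same request can probably be made from a terminal with curl instead of a browser. This is a sketch under assumptions: it presumes the endpoint answers a plain GET request without extra credentials, as the browser instructions suggest, and sync_url and trigger_sync are hypothetical helper names.

```shell
#!/bin/sh
# Build the sync URL; host and port default to the values in the link above.
sync_url() {
    printf 'http://%s:%s/api/sync\n' "${1:-localhost}" "${2:-8081}"
}

# Hypothetical helper: request the sync endpoint and print the HTTP status code.
# Assumes a plain GET is sufficient, as it is when the link is opened in a browser.
trigger_sync() {
    curl --silent --show-error --output /dev/null \
         --write-out '%{http_code}\n' "$(sync_url "$@")"
}
```

Calling trigger_sync with no arguments requests http://localhost:8081/api/sync; the progress of the sync itself remains visible in the terminal where the integration is running.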
In Collibra, there is a community called "Connector Testing"; please use it for testing. To do so, update row 56 (the atscale.community.id entry) with the correct ID.
Currently it is: atscale.community.id=01961563-2ef8-7732-b33d-f9f667bced5f
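Because the row number may shift as the file changes, the entry can also be updated by key rather than by position. The following is a minimal sketch: set_community_id is a hypothetical helper, and it assumes the entry is written as a plain atscale.community.id=value line.

```shell
#!/bin/sh
# Hypothetical helper: set atscale.community.id in a properties file.
# Replaces the existing entry, or appends one if the key is absent.
set_community_id() {
    community_id="$1"
    props_file="$2"
    tmp_file=$(mktemp) || return 1
    if grep -q '^atscale\.community\.id=' "$props_file"; then
        sed "s|^atscale\.community\.id=.*|atscale.community.id=$community_id|" "$props_file" > "$tmp_file"
    else
        cat "$props_file" > "$tmp_file"
        printf 'atscale.community.id=%s\n' "$community_id" >> "$tmp_file"
    fi
    # Write back through the original file so its permissions are kept.
    cat "$tmp_file" > "$props_file"
    rm -f "$tmp_file"
}
```

For example, set_community_id 01961563-2ef8-7732-b33d-f9f667bced5f config/application.properties writes the ID shown above (the file path is an assumption). The integration most likely needs to be restarted before it picks up the changed value.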