Fork me on GitHub

n. Slang a rough lawless young Kuali developer.
[perhaps variant of Houlihan, Irish surname]
kualiganism n

Blog of an rSmart Java Developer. Full of code examples, solutions, best practices, et al.

Monday, April 26, 2010

KISmet - Game Changer: Structuring the Project to Manage Database Changes with Liquibase

Structuring your project to handle configuration management of your project RDBMS can probably be the most difficult part of managing your project. If you plan to use Liquibase to manage your database migrations, then this is even more the case. At the University of Arizona, we first followed the instructions laid out by this tutorial (Tutorial Using Oracle) since we're using Oracle.

We retain in our methodology much from that article, but rather than explain the differences, I'll just explain what we did.

The University of Arizona Liquibase Methodology in a Nutshell

The goals we wanted to solve with simplified data migration using Liquibase are:
  • Structure for isolating database changes.
  • Integrates with a process that versions a database from SVN, Jira, and Continuous Integration.
  • Coupled database version with application version.
  • Integrates with a process that facilitates rollback, update, and complete schema rebuilds.

Structure for Isolating Database Changes

A project was created at University of Arizona called kfs-cfg-dbs. Within that project is where we created the structure to manage out database migrations. We followed the example outlined in Tutorial Using Oracle. We found that we can create two paths. One path is for update (update/), and the other path is for building the latest schema entirely (latest/).

latest/

Here, we followed the convention of using 3-character paths according to the changelog content.


Pathname Content
cst constraint-related changelog
dat table-related changelog
idx index-related changelog
seq sequence-related changelog
tab table-related changelog
vw view-related changelog

constraints.xml

During our database migrations, we load schema changes and data changes. These data changes can sometimes effect constraints. For full schema rebuilds, we load constraints last to allow data loads to process faster. Therefore, we separate our constraint changelog information into its own file to run last.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog/1.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog/1.9 http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">
<include file="latest/cst/LD_EXP_TRNFR_DOC_T.xml" />
<include file="latest/cst/FP_PMT_MTHD_T.xml" />
<include file="latest/cst/FP_PMT_MTHD_CHART_T.xml" />
</databaseChangeLog>


Notice that entries are simply includes that point to files associated by table name in the cst/ directory within the latest/ path. For example, latest/cst/LD_EXP_TRNFR_DOC_T.xml refers to constraints on the LD_EXP_TRNFR_DOC_T table. Because cst/ is taken from latest/ we know that this file relates to new schema migrations. 100% of the time, includes in constraints.xml will point to latest. That is our convention.

data.xml

Similarly to constraints.xml, data.xml has entries to data by table-name in latest/ for new schema migrations. Here is ours:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog/1.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog/1.9 http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">
<include file="latest/dat/FP_PMT_MTHD_T.xml" />
<include file="latest/dat/FP_PMT_MTHD_CHART_T.xml" />
<include file="latest/dat/KREW.xml" />
<include file="latest/dat/KRIM.xml" />
<include file="latest/dat/KRIM3.xml" />
<include file="latest/dat/GL_OFFSET_DEFN_T.xml" />
<include file="latest/dat/KRIM2.xml" />
<include file="latest/dat/KRNS_PARM_T.xml" />

install.xml

Anything that needs to be migrated before data and constraints is added to the install.xml. It follows exactly the same convention as the previous two:

<include file="latest/tab/FP_PRCRMNT_LVL3_ADD_ITEM_T.xml" />
<include file="latest/tab/FP_PRCRMNT_LVL3_FUEL_T.xml" />
<include file="latest/tab/FP_PRCRMNT_CARD_HLDR_DTL_T.xml" />
<include file="latest/tab/FP_PRCRMNT_CARD_TRN_T.xml" />
<include file="latest/tab/FP_PRCRMNT_CARD_HLDR_LD_T.xml" />
<include file="latest/tab/PDP_SHIPPING_INV_TRACKING_T.xml" />
<include file="latest/seq/ERROR_CERT_ID_SEQ.xml" />
<include file="latest/seq/CM_CPTLAST_AWARD_HIST_NBR_SEQ.xml" />
<include file="latest/seq/PUR_PDF_LANG_ID.xml" />

update/

The update changelog is responsible for database migrations on existing schemas. It simply updates a schema already in use, so these are all changes.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog/1.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog/1.9 http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">
<include file="update/KITT-958.xml" />
</databaseChangeLog>


The update.xml refers to files in the update/ path which contains files associated by Jira Issue #. By this convention, we know that update/KITT-958.xml contains a change in this version for KITT-958. This is how we get our Jira coupling. Using this convention, we can migrate perpetually and let Jira handle linkage with our data migration/issue management.

Integrates with a process that versions a database from SVN, Jira, and Continuous Integration.

I have already shown how this works with Jira, but how do we get exclusive database versions? We use SVN to handle changelog ids for us. Changelog ids are what Liquibase uses to identify which changes are to be run. Each change gets its own id, to identify it apart from others. In order to lessen developer overhead, we simply use the $Revision$ keyword from SVN.

<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog/1.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog/1.9 http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">
<changeSet id="$Revision$" author="$Author" >
<comment>Adding a new kim type of patisserie and a new eclair baker role. Yum!!</comment>


This will put a new revision id with each checkin. We also use $Author$ to identify the person that made the change. This convention falls apart when a person commits more than one changelog at a time. Then two changelogs have the same id. This will cause problems.

Part of our methodology is the concept of one-change-per-checkin. That is, when creating a changelog for update/, we put all changes relating to KITT-958 for this release into a single changeset in our KITT-958.xml. If we have changes for another issue (KITT-959), then we put changes in KITT-959.xml. That is fine. It is crucial to understand that only one of these files gets checked in at a time though. That is, first, commit KITT-958.xml, then KITT-959.xml. The reason is that when one gets committed, it gets a revision number. Then, when the next one is committed, it gets a different revision number. This helps us keep a consistent set of changelog ids, and also prevent changelogs from stepping on each other.

Integrates with a process that facilitates rollback, update, and complete schema rebuilds.

I will explain this in more detail when I discuss testing changes to data migration. I will say though that Liquibase supports rollback of changes. Using our configuration management system, this allows us to couple data migrations with a release of KFS. We can suddenly move between versions very easily by undoing what we have done. In most cases, Liquibase can automatically rollback changes by analyzing different patterns based on the refactoring. Some cases, this is very difficult though (consult the Liquibase manual for more details on what does and doesn't auto-rollback). One example is data-related migrations. When inserting data, liquibase cannot understand how to undo an insert or an update of a record. For this, changelogs have a rollback directive to explicitly define a rollback pattern.

<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog/1.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog/1.9 http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">
<changeSet id="$Revision$" author="$Author" >
<comment>Adding a new kim type of patisserie and a new eclair baker role. Yum!!</comment>
<sql splitStatements="false" endDelimiter=""><![CDATA[
declare 
ktyp_id krim_typ_t.kim_typ_id%TYPE;
BEGIN
INSERT INTO KRIM_TYP_T 
(KIM_TYP_ID, OBJ_ID, VER_NBR, NM, SRVC_NM, ACTV_IND, NMSPC_CD) VALUES 
(KRIM_TYP_ID_S.NEXTVAL,'SYS_GUID()', 1,'Patisserie',null,'Y','KUALI')
RETURNING kim_typ_id into ktyp_id;

INSERT INTO KRIM_ROLE_T 
(ROLE_ID,OBJ_ID,VER_NBR,ROLE_NM,NMSPC_CD,DESC_TXT, KIM_TYP_ID, ACTV_IND) VALUES 
(KRIM_ROLE_ID_S.NEXTVAL,SYS_GUID(),1,'Eclair Baker','KUALI',null,ktyp_id,'Y');
END;
]]>
</sql>
<rollback>
<sql><![CDATA[
delete from KRIM_TYP_T where NM = 'Patisserie';

delete from KRIM_ROLE_T where ROLE_NM = 'Eclair Baker';
]]>
</sql>
</rollback>

Case Study: Adding a Data-Only Change

Here is a screencast on how to use liquibase to migrate database changes using the University of Arizona's methodology for change management. Includes an example on making a data related change.



Looking Ahead

Be sure to read my next blog entry which will describe how to test this change against a database using rollbacks.

2 comments:

  1. Has a nice look to it, but I'm not sure I get it. So, latest has what's needed to build a complete schema. What about the updates, do I have to move them into 'latest' when doing releases?

    Because latest always has to include all updates, do I have to duplicate updates into latest, or how is it supposed to work?

    ReplyDelete
    Replies
    1. The trouble with latest is that it is what you start with while update is your incremental update. The best thing to do with latest is automate it. Use the plugin to generate latest for you and upload it to VCS. You will hate life if you do not.

      Delete