Thursday, July 22, 2010

Remove duplicates from an input XML using XSLT

An input XML document often contains duplicate data. When only the unique records should reach the output, the duplicates can be filtered out using a unique identifier that is itself part of the input. The transformation below does this.
A real-world scenario makes this easier to follow. Assume an organization whose employees can work in more than one department. If the input lists employees department by department, anyone who belongs to several departments appears more than once. Filtering on the employee id then yields a set of unique, non-repeating employees.

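This logic is implemented with the XPath axis “following::” (an axis, not a function). An Employee is selected only if no EmpId later in the document has the same value, so for each employee id the last occurrence is the one that is kept. The transformation logic is as below: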


<?xml version="1.0" encoding="UTF-8" ?>
<?oracle-xsl-mapper
  <!-- SPECIFICATION OF MAP SOURCES AND TARGETS, DO NOT MODIFY. -->
  <mapSources>
    <source type="XSD">
      <schema location="http://localhost:7778/Schemas/Sample.xsd"/>
      <rootElement name="SampleXML" namespace="http://xmlns.oracle.com/SampleXML"/>
    </source>
  </mapSources>
  <mapTargets>
    <target type="XSD">
      <schema location="http://localhost:7778/Schemas/Sample.xsd"/>
      <rootElement name="SampleXML" namespace="http://xmlns.oracle.com/SampleXML"/>
    </target>
  </mapTargets>
  <!-- GENERATED BY ORACLE XSL MAPPER 10.1.3.4.0(build 080718.0645) AT [THU SEP 24 15:16:11 EEST 2009]. -->
?>
<xsl:stylesheet version="1.0"
                xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:hwf="http://xmlns.oracle.com/bpel/workflow/xpath"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                exclude-result-prefixes="xsl xsd bpws hwf">
    <xsl:template match="/">
        <EmployeeListSorted>
            <xsl:for-each select="/EmployeeList/Employee[not(EmpId=following::EmpId)]">
                <xsl:sort select="./EmpId" order="ascending"/>
                <Employee>
                    <xsl:copy-of select="./Name"/>
                    <xsl:copy-of select="./EmpId"/>
                </Employee>
            </xsl:for-each>
        </EmployeeListSorted>
    </xsl:template>   
</xsl:stylesheet>
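
To make the behaviour concrete, here is a small hypothetical input that matches the paths used in the stylesheet (/EmployeeList/Employee with Name and EmpId children). The names, ids and the Dept element are invented for illustration and are not taken from any real schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample input: Alice (EmpId 102) is listed under two departments. -->
<EmployeeList>
  <Employee>
    <Name>Alice</Name>
    <EmpId>102</EmpId>
    <Dept>Sales</Dept>      <!-- assumed element, ignored by the stylesheet -->
  </Employee>
  <Employee>
    <Name>Bob</Name>
    <EmpId>101</EmpId>
    <Dept>Sales</Dept>
  </Employee>
  <Employee>
    <Name>Alice</Name>
    <EmpId>102</EmpId>
    <Dept>Finance</Dept>
  </Employee>
  <Employee>
    <Name>Carol</Name>
    <EmpId>103</EmpId>
    <Dept>Finance</Dept>
  </Employee>
</EmployeeList>
```

The first Alice entry is dropped because a later EmpId has the same value (102). The other three entries pass the filter and are emitted in ascending EmpId order: Bob (101), Alice (102), Carol (103), each with only Name and EmpId copied. Note that xsl:sort compares values as text by default, so if the ids can differ in length, adding data-type="number" to the xsl:sort is probably what you want.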



The preceding-sibling grouping technique did not work because the nodes are not siblings of each other, and because it only works where the grouping key is the string-value of the node, not where it is some other function of the node (here, its name).

Peace !

Cheers,
- AR

2 comments:

Baji said...

If I want to have a condition on both empid and deptid, how do I need to write it?

Arun Ramesh said...

If you need to filter on both empid and deptid, one option is to add a field to the input XML in which empid and deptid are concatenated (for example empid:deptid). You can then apply the following:: test to that field to filter on both values.
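
For the two-field case discussed above, an alternative that leaves the input untouched is Muenchian grouping: an xsl:key built on the concatenation of both values. This is a different technique from the following:: filter in the post, and the sketch below assumes each Employee has a DeptId child (a hypothetical element name; the key name empByIdAndDept is also made up). Unlike the following:: version, it keeps the first occurrence of each empid/deptid pair rather than the last.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index every Employee by "EmpId:DeptId". DeptId is an assumed element name. -->
  <xsl:key name="empByIdAndDept" match="Employee"
           use="concat(EmpId, ':', DeptId)"/>

  <xsl:template match="/">
    <EmployeeListSorted>
      <!-- Keep an Employee only if it is the first node stored under its key value. -->
      <xsl:for-each select="/EmployeeList/Employee[generate-id() =
                            generate-id(key('empByIdAndDept', concat(EmpId, ':', DeptId))[1])]">
        <xsl:sort select="EmpId" order="ascending"/>
        <Employee>
          <xsl:copy-of select="Name"/>
          <xsl:copy-of select="EmpId"/>
          <xsl:copy-of select="DeptId"/>
        </Employee>
      </xsl:for-each>
    </EmployeeListSorted>
  </xsl:template>
</xsl:stylesheet>
```

The ':' separator only prevents collisions if it cannot occur inside either value, so pick a separator that is safe for your data. Key lookups also tend to scale better than scanning the following:: axis for every Employee on large inputs.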

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License