Stelo Technical Documents

SQDR Plus Troubleshooting Tips

Last Update: 13 July 2023
Product: SQDR Plus
Version: 4.50 or later
Article ID: SQV00PU010

Abstract

This document contains generic troubleshooting tips for SQDR Plus. Most of these tips apply to all source DBMS types; please refer to the appropriate technical document for DBMS-specific issues.

Additional troubleshooting hints can be found in these technical documents

The following topics are covered:

The following topics apply to specific versions of SQDR Plus and can be resolved by updating:

Unable to login to SQDR Control Center after an upgrade of SQDR Plus

Symptom: Credentials are rejected with the error "Login request failed. The response could not be deserialized."

Solution: Refresh the webpage in the browser.

Unexpected Control Center Behavior

Symptom: After upgrading SQDR Plus, SQDR Control Center is not behaving as expected - e.g. new functionality is not working, or you get the error "Request failed. 500 the call failed on the server; see server log for details" and the jetty wrapper log C:\ProgramData\StarQuest\sqdrplus\jetty\logs\wrapper.log contains GWT (Google Widget Toolkit) serialization errors.

Solution: Refresh the Javascript cache in the browser by using control-F5.

Shortcut to Control Center opens wrong port

Symptom: If you remove SQDR Plus and reinstall specifying a different port for the Control Center service, the shortcut attempts to use the original port.

Solution: Log off and log on.

Invalid Data Conversion Error after Schema Change on the Source

Symptom: The following error was encountered after a schema change (increasing the width of a character column) was made on the source, and the agent was continuously restarting to recover from the error:

SEVERE: [sqv][Thread-122][09.09.2013 15:43:50] CaptureAgentLog: ReplicationWorker.run:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: [jcc][1091][10404][3.65.77] Invalid data conversion: Parameter instance ?1?0.?00 is invalid for the requested conversion. ERRORCODE=-4461, SQLSTATE=42815
at com.ibm.db2.jcc.am.cd.a(cd.java:698)

Solution: Temporarily set validateMetadata=true in the agent configuration and restart the agent. All altered tables will be flagged for a new baseline. After the issue is resolved, set it to false (or remove the property to use the default of false) to avoid delays for future agent startups.

Configuring extended_row_sz in the Db2 LUW staging database

Symptom: The following issue was encountered when performing incremental replication between two SQL Server databases. A snapshot replication of the same table encountered no problem.

Failed to add incremental subscription at Capture Agent for source table 'dbo.mytable'. Snapshot could not be started. Error: Stored procedure SQDR.ADDSUBSCRIPTION 04.22.20140314 returned error 11. RemoteException occurred in server thread; nested exception is:

java.rmi.RemoteException: Failed to add subscription: DB2 SQL Error: SQLCODE=-670, SQLSTATE=54010, SQLERRMC=32677;;71136, DRIVER=3.66.46 SQLSTATE=54010

The error message indicates that the source table exceeds the capacity of the intermediate staging database. This was confirmed by examining the CREATE TABLE statement in the details of the subscription).

Solution: Modify the staging database configuration used by the staging agent to use "extended_row_sz" ENABLE setting. This will permit the staging database to accomodate the larger row size required by this subscription.

  1. Identify the Db2 LUW staging database being used by the staging agent by examining the controlDbUrl parameter in the SQDR Plus configuration panel. controlDbUrl is of the form jdbc:db2://localhost:50000/SQDRPnn:driverType=4;deferPrepares=false; and we are interested in the value of SQDRPnn, where nn is a one or two digit number, 0 through 99.
  2. Stop the agent
  3. Using the Db2 Command Window (Administrator) issue the following commands:

db2 connect to SQDRPnn
db2 update db cfg using extended_row_sz ENABLE
db2 connect reset

  1. Start the agent.
  2. Run the subscription and confirm that it succeeds.

Note: This tip applies only to Staging agents created with versions of SQDR Plus earlier than 4.50. The staging databases associated with agents created with SQDR Plus v4.50 and later are already configured with the "extended_row_sz" ENABLE setting.

Agent fails to start with "Port already in use" error (SQDR Plus 5.00 & earlier)

Symptom: An agent fails to start, and an error like this appears in the Diagostics (the port number will vary based on your configuration):

CaptureAgent.main:
java.rmi.server.ExportException: Port already in use: 50008; nested exception is:
java.net.BindException: Address already in use: JVM_Bind

This condition is intermittent, and the agent will often start if you reboot the system or restart the SQDR Plus Launch Agent service. It has been observed on systems with a large number of agents.

Solution: SQDR Plus is using TCP/IP ports starting at 50005 for RMI communications. This range is also part of the default dynamic port range on Windows, and another process may have already opened that port for outbound communcations before the agent in question started.

To discover which process is using the port:
C> netstat -a -b
and search the output for port in question.

Short term solution:

Identify and stop the application using the port and start the agent. The other application will then use a different port when it is restarted.

In one case, we discovered that the conflicting application was another SQDR Plus agent - netstat -a -b displayed java.exe as the application, and that the port was in use by a database connection to the second host system. After stopping the second agent, we were able to start the problem agent, and then restart the second agent.

Long term solution:

Install SQDR Plus 5.01 or later (which implements the following)

OR

Use the following command to display the dynamic port range currently in use:

C> netsh int ipv4 show dynamicport tcp

Protocol tcp Dynamic Port Range
---------------------------------
Start Port : 49152
Number of Ports : 16384

Then change the starting point of dynamic port range to higher port (above the range used by SQDR Plus and Db2):

C> netsh int ipv4 set dynamicport tcp start=51000 num=14536

This change is persistent.

Data out of range error (SQLCODE=-302, SQLSTATE=22001) for Db2 LUW Staging Table

Symptom: The following error appears, referencing the Db2 LUW staging database, and is related to particular data in a CHAR or VARCHAR column

com.ibm.db2.jcc.am.SqlDataException: DB2 SQL Error: SQLCODE=-302, SQLSTATE=22001

Solution: The error indicates data out of range (too large to fit in the corresponding column of the table).

This error may occur if a CHAR or VARCHAR column contains data that (when converted to UTF-8) is too large to fit in the destination column in the Db2 LUW staging table. For example, a SQL Server source table may include columns containing the x8D character, which converts to a 3 byte sequence when converted to UTF-8.

This is a rare condition; the default inflation of 100% (i.e. CHAR and VARCHAR columns in the Db2 LUW staging table are double the size of the corresponding columns of the source table), is usually more than sufficient. If you are affected by this issue, we recommend working with Stelo support, who will verify that this is the cause of the problem, and will either alter the Db2 LUW staging table, or temporarily change the SQDR Plus inflation property from 100 to 200, drop the subscription, wait for pruning to remove the Db2 LUW staging table (this may take up to 30 minutes), recreate the subscription, and change the inflation property back to 100.


AGENT_STACK_SZ error from Db2 LUW staging database

Symptom: When running an incremental group with a large number of tables, the following error was received:

Stored procedure SQDR.GETCHANGE3 04.91.20171002 returned error 11. com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-973, SQLSTATE=57011, SQLERRMC=AGENT_STACK_SZ, DRIVER=4.19.66
Last GetChanges call: 2017-10-23 07:15:34.668000
Last Acknowledged: X'000000000000042E3FB3'
Pending Acknowledged: X'000000000000042E3FB3'

Solution: This is known issue which may occur when a large number of tables are involved in a single cycle of obtaining changes. This will be resolved in a future version of SQDR Plus.
Workaround: reduce the maximum number of transactions (an "advanced" group property) from 0 (no limit) to a number such as 100.

SqlSyntaxErrorException error (SQLCODE=-440, SQLSTATE=42884) for Db2 LUW Staging Database

Issue:

When creating a subscription to replicate a large Oracle table using the technique described Using Partitioned Snapshots to baseline a large table. the following error occurred when saving the incremental subscription:

Stored Procedure SQDR.ADDSUBSCRIPTION returned error 11
RemoteException occurred in server thread
nested exception is:
java.rmi.RemoteException: Failed to add subscription
DB2 SQL Error: SQLCODE=-440, SQLSTATE=42884, SQLERRMC=ROWID_BLOCK_NUMBER;FUNCTION, DRIVER=4.28.11

And the following errors appears in the agent diagnostics:

INFO: [sqv][RMI TCP Connection(102557)-127.0.0.1][Jun 30, 2022 2:02:13 PM]
CaptureAgentLog: CaptureAgent.addSubscription:PRODDTA.MYTABLE in ControlDB, System=MYSYS id=C9B0EE30705CFB4FBE44D727169CEC7D
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-440, SQLSTATE=42884, SQLERRMC=ROWID_BLOCK_NUMBER;FUNCTION, DRIVER=4.28.11

CREATE VIEW SQDR.CV1655744147 AS ....
WHERE ((MOD( DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) ,10)=00) OR UPPER(SQ47114711TYPE) IN ('S','W','A','T','X','C','R','G'))

Solution:

The failure is referring to the use of criteria containing an Oracle function DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) - this expression is not acceptable for the Db2 staging database. Remove the criteria from the I/R subscription, or specify criteria that is valid for both the source DBMS and Db2 LUW.

SQ_JRNMAP or SQ_LOGMAP Table Not Found error (SQLCODE=-204, SQLSTATE=42704)

Symptom: The following warning appears in the agent diagnostics:

WARNING
Pruner.run:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=SQDR.SQ_JRNMAP, DRIVER=4.25.1301

This message indicates that the indicated table (SQ_JRNMAP or SQ_LOGMAP) does not exist in the local Db2 LUW staging database. This table is created when you create a subscription.

Solution: Create a subscription or a publication, or disable the agent until you are ready to create a subscription.

This issue will be addressed in a future version of SQDR Plus.

Error updating SQDR.GNV or missing stored procedure e.g. SQDR.NEXTGENADD

Symptoms
After updating to a recent version of SQDR Plus, subscriptions were failing with the error that the stored procedure SQDR.NEXTGENADD was missing from the SQDRPn staging database.

Also, an authority error (SQLCODE 551/SQLSTATE 42501) related to updating the function SQDR.GNV appeared in the Diangostics log when starting the Agent for the first time after an update.

Solution
In this case, a service user (other than the default name SQDR) was being used to access the SQDRPn staging database, and that user lacked authority to update SQDR.GNV. When that failure occurred, the Agent gave up on updating stored procedure definitions and the new procedure SQDR.NEXTGENADD was not created.

Procedure

This example assumes that the error refers to schema SQDR; if the error refers to a different schema (e.g. SQDR0), adjust the SQL statements.

  • Connect to the SQDRPn database as user db2admin or another user with sufficient authority.
  • Issue the following SQL:

drop function SQDR.GNV
update sqdr.sq_properties set propertyvalue='05.24.20210826' where propertyvalue='05.24.20210827'

(adjust the WHERE clause to indicate whatever version of SQDR Plus is currently installed and currently stored in the SQ_PROPERTIES table; setting the value to something earlier tells SQDR Plus to perform update actions).

  • Restart the Agent and verify that SQDR.GNV and the missing stored procedures are created.

Using JRE 9 and later (SQDR Plus v5)

SQDR Plus 6.x & later uses OpenJDK 17. The following deprecrated information applies only to SQDR Plus v5.

We recommend using the certified version of the Java Runtime Environment (a recent version of OpenJDK 8) that is installed with SQDR Plus. The following considerations apply when using JRE 9 or later; however, this is supplied for informational purposes and is not a supported or recommended environment, as there are known issues related to sending email when using JRE 9 or later.

  • Use v4.96 or later of SQDR Plus.
  • Create a text file C:\ProgramData\StarQuest\sqdrplus\conf\wrapper-local.conf (/var/sqdrplus/conf/wrapper-local.conf on Linux)
  • For JRE 9 and 10, wrapper-local.conf should contain the following directive:

set.JDK_JAVA_OPTIONS=--add-modules=java.xml.bind

  • For JRE 11 and later, download the JAXB-RI standalone distribution from https://javaee.github.io/jaxb-v2, extract the following .jar files, and place the following directives in the wrapper-local.conf:

wrapper.java.classpath.16=C:\jaxb\jaxb-api.jar
wrapper.java.classpath.17=C:\jaxb\jaxb-runtime.jar
wrapper.java.classpath.18=C:\jaxb\istack-commons-runtime.jar
wrapper.java.classpath.19=C:\jaxb\javax.activation-api.jar

Note that the numbering may change in future versions of SQDR Plus if additional jar files are added to the CLASSPATH in the base wrapper.conf (C:\Program Files\StarQuest\sqdrplus\CapAgent\wrapper\conf\wrapper.conf).

Failure to do this will result in the following error in the CapAgent wrapper log:

Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException

 

Reclaiming Disk Space

Over time, dIsk space utilitization can grow, especially if there has been an incident that resulted in the logging a large number of errors or a backlog of change data. Here are some suggested methods of lowering disk utilization.

  • Db2 Table Space Storage (used by the local Db2 staging databases) - see Db2 LUW Staging Database - Storage Maintenance.
  • Logs in C:\ProgramData\StarQuest\sqdrplus
    • conf/wrapper.log
    • derby/logs/wrapper.log
    • jetty/logs/wrapper.log
    • derby/derby.log
  • Db2 files in C:\ProgramData\IBM\DB2\DB2COPY1\DB2 or C:\ProgramData\IBM\DB2\DB2COPY1\DB2\DIAG0000
    • Old db2diag.log files. Use db2diag -A to rotate the logs.
    • *,trap.bin, *.db2pd.bin, javacore*, Snap.*
    • FODC.* subdirectories
  • Run Windows Disk Cleanup, including system files. This is recommended after a Windows Update.
  • Installer images (downloaded zip files and expanded directories) that are no longer needed, especially large installers such as Db2 & IBM Data Studio.
  • Java dump files in C:\Program Files\StarQuest\sqdrplus\capagent.
  • Disk space can be consumed by swap space in the scenario of a large number of agents (i.e. a large number of Java processes) overcommitting memory. See Case Study: Troubleshooting Failure to start some agents (due to memory overcommitted).

StarAdmin shortcut fails to locate javaw.exe

Symptom:

After an update of SQDR Plus to 5.22 or later, StarAdmin fails with a "Searching for javaw.exe" message.

Solution:

SQDR Plus 5.22 through 5.24uses the environment variable JRESQ to locate javaw.exe. Installing the JRESQ package first, then rebooting before updating SQDR Plus should result in the correct shortcut for StarAdmin for the initial update (this is not necessary for future updates). Failing to reboot before updating SQDR Plus will result in a poorly-formed shortcut for StarAdmin. In addition, on rare occasions we have seen a problem even on a fresh install.

To fix the shortcut:

Right-click on the StarAdmin Program Group item and select Open File Location

This should open the folder:
C:\ProgramData\Microsoft\Windows\Start Menu\Programs\SQDR Plus (64-bit)\Tools\StarAdmin

Right click on StarAdmin and select Properties.

Modify the Target field to begin with:
%JRESQ%\bin\javaw.exe

Note: this is no longer a problem in SQDR Plus 5.30 and later.

Installer: opwdintg.exe file is missing

Symptom: During an installation or update of SQDR Plus, a dialog appears that opwdintg.exe is missing

Solution: This file is part of Oracle Instant Client. A particular antivirus program has decided that it is malware and will remove the file from the expanded installer image when detected. We recommend unzipping the distribution again, verifying that the file is present, and restarting the installation quickly (before the antivirus program has removed it). Alternatively, you can disable the antivirus program, add an exception for this file, or instruct the installer to ignore the error (this file is not used by SQDR Plus).

Note: this is no longer a problem in SQDR Plus 5.35 and later.

Network problems due to Black Hole Router

Symptom:

After a network infrastructure change, an agent failed to start. The agent diagnostics show that the host system was contacted, but then the connection times out.

Other methods of contacting the remote system succeed but may timeout when fetching or sending large amount of data.

The following ping command times out (or succeeds but rarely):

Windows: ping -f -l 1472 <host or IP>
or
Linux: ping -M do -s 1472 <host or IP>

The parameters instruct ping to use a packet size of 1472 (the maximum for a MTU of 1500) and do not fragment. The packet size may be a bit smaller for certain environments (e.g. IPSec VPN).

The same condition may affect communcation to a destination database or between SQDR tier 2 and tier 3.

Solution:

These symptoms are indicative of a black hole router, where communication over some routes may fail if an intermediate network segment has a maximum packet size that is smaller than the maximum packet size of the communicating hosts--and if the router does not send an appropriate Internet Control Message Protocol (ICMP) response to this condition or if a firewall on the path drops such a response.

Contact the network administrator to resolve the network misconfiguration.

Workaround: Reduce the MTU size on the local computer.

  1. Experiment to identify the maximum packet size that results in a successful ping, and add 28 to determine the MTU size.

e.g. if ping -f -l 1400 <host> succeeds, then MTU should be set to 1428.

  1. Use the netsh (Windows) or ifconfig (Linux) command to display and modify the MTU

netsh interface ipv4 show interfaces
netsh interface ipv4 show subinterfaces
netsh interface ipv4 set subinterface "Ethernet" mtu=1400

ifconfig | grep mtu
ifconfig <interface_name> mtu 1400 up

  1. Retry the failing operation (e.g. start the agent)

Note that changing the MTU size affects all communications, and should be used only as a temporary workaround.

References:

Congestion due to timezone differences

Issue:

The Agent always shows as Congested (yellow icon) even though there is no apparent congestion. This may happen if the system running SQDR Plus (Tier 2) is in a different timezone than the host system (Tier 1).

Note that the agent will appear green briefly (5 minutes) after restarting the agent.

Solution:

For a one hour timezone difference (i.e. the host system is one hour ahead of the SQDR system), edit the configuration and use the Plus sign to add these properties. The agent will restart when you click the Save icon.

congestedThreshold: 65
uncongestedThreshold: 61

Using Translate() Scalar Function to handle invalid encoding issues

A replication from Db2 for i to PostgreSQL was failing with the following error:

Insert failed at destination.
[PostgreSQL]ERROR: VERROR; invalid byle sequence for encoding 'UTF8':
0x00(Where portal 'ST7F82E8142F60' parameter $72; file mbutils, Line 1683; Routine report_invalid_encoding;)

The issue was bad data in the source table; it was discovered that the source table contained a x'00' byte in a CHAR field.

Solution:

Edit the Columns tab of the subscription.

Remove the problem column from the subscription and replace it with a derived column that converts the offending x'00' character to a blank.

TRANSLATE(MYCOL, ' ', x'00')
Column name: MYCOL (or whatever you wish to name it)
char varying (4)

Note that the syntax of the Translate() function may vary depending on the DBMS type.

SQLCODE=-670, SQLSTATE=54010 Error for PostgreSQL table with many LOB columns

Issue:

Replicating a PostgreSQL table with many (100) columns of of type “text” or “bytea” failed with this error:

Failed to add incremental subscription. Snapshot could not be started.
Stored procedure SQDR.ADDSUBSCRIPTION returned error 11
SQLCODE=-670, SQLSTATE=54010, SQLERRMC=32677_34940
DRIVER=4.31.10

Analysis:

This error refers to the Db2 LUW intermediate staging database:

SQL0670N The statement failed because the row or column size of the resulting table would have exceeded the row or column size limit.

Contact Stelo support for suggestions on resolving this situation. This is a rare condition, as it only affects certain DBMS types (PostgreSQL) and the table was an artificial test table; it is unlikely that a real world table will have the same characteristics.

Unable to open wrapper.conf (Linux/AIX)

Symptom: After installing or updating SQDR Plus on Linux or AIX, messages similar to the following appear in the terminal console when trying to start the SQDR Plus services:

FATAL | wrapper | Unable to open configuration file. /opt/StarQuest/sqdrplus/capagent/wrapper/conf/wrapper.conf

Solution: This error can result if the user performing the installation or update (e.g. root) is using a umask with restrictive file permissions - e.g. a umask of 027 will prevent the sqdr user (running the service) from accessing the files it needs to read. We recommend setting umask 022 to ensure that the SQDR Plus program files are readable by the sqdr user.

Note: this is no longer a problem in SQDR Plus 5.24 and later.



DISCLAIMER

The information in technical documents comes without any warranty or applicability for a specific purpose. The author(s) or distributor(s) will not accept responsibility for any damage incurred directly or indirectly through use of the information contained in these documents. The instructions may need to be modified to be appropriate for the hardware and software that has been installed and configured within a particular organization.  The information in technical documents should be considered only as an example and may include information from various sources, including IBM, Microsoft, and other organizations.