Common Issues & Troubleshooting in WebLogic
1 – OOM issues
Young Generation: The area where short-lived objects reside. It is divided into two spaces:
Eden Space: When an object is created with the new keyword, its memory is allocated in this space.
Survivor Space: This pool contains objects that have survived a garbage collection of the Eden space.
Old Generation: This pool contains the tenured and virtual (reserved) space and holds objects that survived garbage collection in the Young Generation.
Tenured Space: This pool contains objects that survived multiple garbage collections, that is, objects promoted from the Survivor space.
Permanent Generation: As the name suggests, this pool contains permanent class metadata and descriptor information. PermGen space is reserved for classes and anything tied to them, for example static members.
Java 8 update: PermGen is replaced by Metaspace, which serves a very similar purpose.
The main difference is that Metaspace resizes dynamically, i.e., it can expand at runtime.
Metaspace size: unbounded by default (it can be capped with -XX:MaxMetaspaceSize).
Issues:
* java.lang.OutOfMemoryError: requested 793020 bytes for Chunk::new. Out of swap space?
* java.lang.OutOfMemoryError: PermGen space
Solution: Increase the maximum PermGen size, for example -XX:MaxPermSize=256m
There may also be a leak in the PermGen objects. If tuning parameters do not resolve the issue, use memory leak detection tools to find out which instances in the PermGen space are not being cleared.
* java.lang.OutOfMemoryError: allocLargeObjectOrArray – Object size: 372032, Num elements: 372012
* java.lang.OutOfMemoryError: nativeGetNewTLA
Solution (JRockit options): -XXtlasize:128k -XXlargeobjectlimit:128k
If this does not solve the issue, check the application code for large objects that are created and never released. Take a JRA recording (Oracle JRockit) or use JConsole and memory leak detection tools (jmap, jhat) for analysis on the
* java.lang.OutOfMemoryError: Java heap space
Solution: First check the GC logs to see whether garbage collection is happening properly. If the heap keeps growing gradually even after a full GC, tune the GC algorithm and check whether the behavior stays the same. Use memory leak detection tools (available for both the Sun JDK and JRockit) to find out which application instances are not being destroyed.
If there is no leak and the application genuinely needs more memory, increase the heap size with the following parameters:
Example: -Xms2048m -Xmx2048m
* java.lang.StackOverflowError
Solution: A stack overflow error is usually caused either by a runaway recursive call in the application (infinite recursion) or by an attempt to allocate more memory on the stack than will fit. The latter is usually the result of creating local array variables that are far too large for the current stack.
For the first possibility, check the application code to find where the recursive call is made.
For the second possibility, increase the JVM thread stack size with the -Xss parameter. Example: -Xss512k
=============================
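* java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3209)
Solution: Try adding this JVM option:
-XX:-UseGCOverheadLimit
Note that this option only disables the overhead check. If the heap is genuinely exhausted, the underlying memory problem still has to be investigated as described for the Java heap space error.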
2 – High CPU utilization by WebLogic Server or any other Java process
top -H -p <PID>
-H: Threads-mode operation. Instructs top to display individual threads. Without this command-line option, a summation of all threads in each process is shown. This can be changed later with the 'H' interactive command.
-p: Monitor only the given PIDs.
Note the IDs of the top threads by CPU usage.
Take a thread dump using kill -3 <PID> (the WebLogic Server process ID).
Map the lightweight process to a WebLogic Server thread by converting the thread ID to a hex value, for example 2654 becomes 0xa5e.
Search for that thread (nid=0xa5e) in the thread dump to see what it is doing.
Use the stack trace to identify the operation the thread was performing. To avoid the high CPU usage, the code behind that operation needs to be changed or simplified.
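The mapping from a thread ID reported by top to the nid in a thread dump can be sketched in Python. This is a minimal illustration, not part of any WebLogic tooling: the function names and the sample dump text are made up, and it assumes a HotSpot-style dump in which thread entries are separated by blank lines.

```python
import re

def to_nid(thread_id: int) -> str:
    """Convert a decimal thread ID (as shown by top -H) to the hex form used for nid."""
    return hex(thread_id)

def find_thread(dump_text: str, thread_id: int):
    """Return the stack block of the thread whose nid matches, or None if not found."""
    nid = to_nid(thread_id)
    # Assumption: thread entries in the dump are separated by blank lines
    for block in re.split(r"\n\s*\n", dump_text):
        if re.search(r"\bnid=" + re.escape(nid) + r"\b", block):
            return block.strip()
    return None

# Made-up sample dump, for illustration only
sample = '''"ExecuteThread: '1'" daemon prio=10 tid=0x1 nid=0xa5e runnable
   at com.example.Busy.loop(Busy.java:10)

"ExecuteThread: '2'" daemon prio=10 tid=0x2 nid=0xa5f waiting
   at java.lang.Object.wait(Native Method)'''

print(to_nid(2654))
print(find_thread(sample, 2654))
```

In practice the dump text would be read from the server's standard output log after running kill -3.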
3 – JSSE issue for SHA2 Certs
- Certicom has been removed from WebLogic Server 12.1.1 and is no longer supported.
- JSSE is the only SSL implementation that is supported in WebLogic Server 12.1.1.
SHA-2 signed certificates are supported in the JSSE SSL implementation provided in WebLogic Server.
10.3.3 up to 12.1.1 – you need to manually switch to JSSE if you are using SHA-2 certificates, because the earlier implementation does not work with them.
Location –
4 – JConsole for monitoring WebLogic
Apart from tools like Introscope or AppDynamics, you can use a built-in Java monitoring tool such as JConsole to monitor JVM performance.
- Set the classpath by running cmd from <WLS_DOMAIN_HOME>\bin
- Start exe (alternatively, you can execute it from the default Java location: C:\Program Files\Java\jdk.x.x\bin\)
- Select the WebLogic Server instance from the local process list or connect to a remote port using
5 – Managed server not coming up with error "Authorization failed" even after giving correct credentials:
This error occurs because the managed server's LDAP server (slave) is not in sync with the Admin Server's LDAP server (master), or because the MS LDAP data is corrupted.
To overcome this issue, use the configuration below.
Enable Refresh Replica at Startup in the domain settings so that the MS uses the latest LDAP replica, then restart the MS after the change.
6 – Troubleshooting too many open files issues.
Servlet failed with IOException
java.io.FileNotFoundException:(Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at weblogic.utils.classloaders.FileSource.getInputStream(FileSource.java:31)
at weblogic.servlet.internal.WarSource.getInputStream(WarSource.java:65)
at weblogic.servlet.FileServlet.sendFile(FileServlet.java:400)
File descriptors (FDs) are handles used by a process to identify an open file.
A "Too many open files" exception is thrown when a process runs out of file descriptors.
To troubleshoot, first check the file descriptor limits.
The limit is logged in the server logs.
Maximum FDs available system-wide:
cat /proc/sys/fs/file-max
Get the list of open files: lsof lists all open files belonging to processes.
lsof -p <pid> – shows all files opened by a process; validate the count against the per-process limit from ulimit -n
Get the count:
lsof -p <pid> | wc -l
Check whether the system has enough FDs and what percentage of the available FDs the process is using.
Check with the developers whether those files should be open at that time.
Increasing the file descriptor limit can be a temporary workaround.
Example:
vi /etc/security/limits.conf
Set httpd user soft and hard limits as follows:
httpd soft nofile 4096
httpd hard nofile 10240
You can also try ulimit -n 2048. This changes the limit only for your current shell, and the number you specify must not exceed the hard limit.
Each operating system has a different hard limit setup in a configuration file. For instance, the hard open file limit on Solaris can be set on boot from /etc/system.
set rlim_fd_max = 166384
set rlim_fd_cur = 8192
Under Linux, these settings are often in /etc/security/limits.conf.
There are two kinds of limits:
- Soft limits are simply the currently enforced limits
- Hard limits mark the maximum value that a soft limit cannot exceed
Any user can change a soft limit (up to the hard limit), while hard limits can be raised only by root. Limits are a property of a process.
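Because limits are a property of a process, they can also be read from inside a process. The sketch below uses Python's standard resource module to print the soft and hard open-file limits of the current Python process (not of WebLogic itself) and adds a small helper for the "percentage of FDs used" check. The helper names are made up for illustration, and count_open_fds assumes a Linux /proc filesystem and enough permission to read the target process's fd directory.

```python
import os
import resource

def fd_usage_percent(open_count: int, soft_limit: int) -> float:
    """Percentage of the soft limit consumed by open_count descriptors."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return 100.0 * open_count / soft_limit

def count_open_fds(pid: int) -> int:
    """Count the open descriptors of a process (assumes Linux /proc and read permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

# Soft and hard open-file limits of the current process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# Example with made-up numbers: 512 open files against a soft limit of 4096
print(fd_usage_percent(512, 4096))
```

For another process on Linux, such as a running managed server, the effective limits can be read from /proc/<pid>/limits.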
7 – Capturing thread dumps
1 – On UNIX/Linux
Find the process ID of your server:
- ps -ef | grep java
- kill -3 <pid>
The thread dump goes to the server's standard output log.
2 – Using the Admin Console
- Log into the Admin Console and click on the server
- Go to Server –> Monitoring –> Threads
- Click on Dump Thread Stack
3 – Using WLST (WebLogic Scripting Tool)
- Save and execute the below snippet as ThreadDump.py
************************************
# Connect to the Admin Server (credentials and URL are examples)
connect('weblogic','weblogic1','t3://localhost:7001')
cd('Servers')
cd('AdminServer')
# Write a thread dump of the connected server
threadDump()
disconnect()
exit()
************************************
The thread dump is stored in the directory from which you run the WLST script.
4 – jstack Java utility:
jstack -l <pid> > <filepath>
5 – Java VisualVM
8 – Port conflict issues during WebLogic startup.
AIX Example
$ lsof -i :50000
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
db2sysc 4128774 db2inst1 5u IPv6 0xf1000e00019f3bb8 0t0 TCP *:50000 (LISTEN)
Windows Command
- netstat -aon | findstr "<port number>"
– This shows whether the specified <port number> is being used. The number in the last column is the process ID (PID) of the process holding the socket. Once the PID is determined, one can refer to Windows Task Manager to determine which application corresponds to the PID.
Windows Example
C:\>netstat -aon | findstr "50000"
TCP 0.0.0.0:50000 0.0.0.0:0 LISTENING 2564
C:\>pslist 2564
pslist v1.28 – Sysinternals PsList
Copyright © 2000-2004
Sysinternals
Process information for MACHINENAME:
Name Pid Pri Thd Hnd Priv CPU Time Elapsed Time
db2syscs 2564 8 15 366 30912 0:00:02.859 2:12:08.564
The example above shows the use of pslist to determine the name of the process. Note that pslist is a free command available from Microsoft Sysinternals at http://www.microsoft.com/technet/sysinternals/default.mspx .
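Note that findstr matches the port number anywhere in the line, so "50000" would also match a port such as 150000 or a PID of 50000. A stricter check can be sketched in Python by parsing the netstat -aon output. The function name is made up for illustration, and the parser assumes the five-column TCP row layout shown in the example above.

```python
def pids_listening_on(netstat_output: str, port: int) -> set:
    """Return the PIDs of processes listening on exactly the given TCP port."""
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed TCP row layout: proto, local address, foreign address, state, PID
        if len(fields) == 5 and fields[0] == "TCP" and fields[3] == "LISTENING":
            local_port = fields[1].rsplit(":", 1)[-1]
            if local_port == str(port):
                pids.add(int(fields[4]))
    return pids

# First row is taken from the example above; the second row is made up
sample_output = """
TCP 0.0.0.0:50000 0.0.0.0:0 LISTENING 2564
TCP 0.0.0.0:5000 0.0.0.0:0 LISTENING 50000
"""
print(pids_listening_on(sample_output, 50000))
```

The second sample row shows the case that a plain substring search would report by mistake.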
Linux Command
- netstat -anp | grep <port number>
– This shows the PID and the program name that uses the port. The command must be run as root.
- Alternatively, one can also run
fuser -n tcp <port number>
Linux Example
Suppose the port 12345 is being used by someone else. Find out who by running the following.
# netstat -anp | grep 12345
tcp 0 0 127.0.0.1:12345 0.0.0.0:* LISTEN 6629/ssh
tcp 0 0 ::1:12345 :::* LISTEN 6629/ssh
ssh with the PID 6629 is using the port. Find more info about it.
# ps -efl | grep 6629
4 S root 6629 29716 0 75 0 - 6976 - 14:05 pts/4 00:00:00 ssh testserver -D 12345 -l db2inst1
0 S root 7648 7302 0 78 0 - 742 pipe_w 14:07 pts/7 00:00:00 grep 6629
In this case, the user db2inst1 is deliberately using the port 12345 by specifying -D option of ssh.
As a last resort on AIX (this may be unstable), you can use the following to find the PID using a port:
#netstat -Aan | grep <Port>
f100050000f11bb8 tcp 0 0 *.port. LISTEN
#rmsock f100050000f11bb8 tcpcb
The socket 0xf100050000f11808 is being held by proccess 17170554 (java).
#ps -ef | grep 17170554
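Before starting a server, a quick platform-independent check of whether a listen port is free can be sketched in Python by attempting to bind to it. This is a rough illustration: the function name is made up, it does not report which process holds the port (use the commands above for that), and a bind can also fail for other reasons, such as missing privileges for ports below 1024.

```python
import socket

def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding to (host, port) fails, which usually means the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use"; can also be a permission error
            return True
        return False

# 7001 is the port used in the WLST example in this document
print("port 7001 in use:", is_port_in_use(7001))
```

The default host checks only the loopback address. Pass the actual listen address of the server to test a specific interface.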
9 – Slow startup of Managed Server.
- Over time, the file store grows large because of junk messages or messages that were not processed for some reason.
- Ensure the queues are at 0 in MW, stop the managed servers, move the file store to some other location, and restart the MS. A new file store file will be created, ensuring fast startup.
- A large file store causes slowness because the MS has to load the file store into memory for processing.
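To spot an oversized file store before a restart, the store directory can be scanned for large files. The Python sketch below is only an illustration: the function name, the demo file names, and the 1024-byte threshold are made up, and it assumes that the file store data files use a .DAT extension. Verify the directory and naming against your own domain configuration.

```python
import tempfile
from pathlib import Path

def large_store_files(store_dir: str, threshold_bytes: int):
    """Return (name, size) pairs for .DAT files at or above the threshold, largest first."""
    found = []
    for path in Path(store_dir).iterdir():
        # Assumption: file store data files end in .DAT
        if path.is_file() and path.suffix.upper() == ".DAT":
            size = path.stat().st_size
            if size >= threshold_bytes:
                found.append((path.name, size))
    return sorted(found, key=lambda item: item[1], reverse=True)

# Demo against a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "EXAMPLESTORE000000.DAT").write_bytes(b"x" * 2048)
    Path(tmp, "SMALLSTORE000000.DAT").write_bytes(b"x" * 10)
    Path(tmp, "notes.txt").write_bytes(b"x" * 5000)
    demo = large_store_files(tmp, threshold_bytes=1024)

print(demo)
```

In real use the threshold would be far larger, and store_dir would point at the managed server's store directory.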
10 – SSL Exchange:
11 – Running commands on multiple Servers: fanout utility :
Prerequisite:
SSH keys should be set up between the source and destination servers.
Private key (id_rsa): on the source server from which you run fanout.
Public key: an entry in the ~/.ssh/authorized_keys file on each destination server.
1 – This step prevents SSH from prompting yes/no to trust the host key.
vi ~/.ssh/config
Add the lines below:
Host *
StrictHostKeyChecking no
2 – Download fanout from below site – it is a simple shell script that uses ssh -t to run multiple commands on a set of remote servers.
3 – ./fanout "server1 server2 server_n" "sudo -u app_user .<path>/status"
The above runs the ./status script on all servers and returns the standard output in your shell.
./fanout "$ALL_PROD" "sudo -u weblogic crontab -l | egrep -v ^# | awk '{if (\$2 == \"1\" || \$2 == \"2\" || \$2 == \"01\" || \$2 == \"02\") print \$0;}'" > temp_at625u
The above command returns all crontab entries that run in hour 1 or 2, which is useful when checking for DST changes.
You can run fanout against hundreds of servers at once. You can also create an environment variable and define the servers in it, for example:
export PROD_Servers="Server1 server2 server3 server_n"
./fanout "$PROD_Servers" "sudo -u app_user ./status"
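The logic of the egrep and awk filter in the crontab command above can be sketched in Python, which may be easier to extend once the output has been collected in a file. The function name and the sample crontab are made up for illustration. Like the awk version, it compares the hour field literally, so entries that use lists, ranges, or * in the hour field are not matched.

```python
def entries_in_hours(crontab_text: str, hours=("1", "2", "01", "02")):
    """Return crontab lines whose hour field (second column) is one of the given values."""
    matches = []
    for line in crontab_text.splitlines():
        if line.startswith("#"):
            continue  # same effect as egrep -v ^#
        fields = line.split()
        # Second whitespace-separated field is the hour, as in awk's $2
        if len(fields) >= 2 and fields[1] in hours:
            matches.append(line)
    return matches

# Made-up crontab content
sample_crontab = """# nightly jobs
30 1 * * * /opt/app/a.sh
0 02 * * * /opt/app/b.sh
15 3 * * * /opt/app/c.sh
0 1,2 * * * /opt/app/d.sh
"""

for entry in entries_in_hours(sample_crontab):
    print(entry)
```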