That is, if you don't realize what happened.
I haven't found a reference as to why this was done, but after my upgrade to iPhoto 8.0.3, my rsync backup strategy decided to re-backup almost every picture I have in my library. That is until I did some poking around and found this in my iPhoto Library directory:
drwx---rwx 29 jared staff 986 Jan 11 11:25 Data.noindex
lrw-rw-rw- 1 jared staff 14 Jun 7 17:16 Data -> ./Data.noindex
Of course Jun 7 at 17:16 is about exactly when I upgraded to 8.0.3. Anyhow, this simple command from the appropriate point on my backup drive saved me a very large rsync:
$ mv Data Data.noindex
If you're using Time Machine this isn't possible or desirable. I've seen forum posts on the net that imply that 8.0.3 will cause Time Machine to re-backup everything, but it seems to me it should be smart enough to realize that those files just changed their path; the inode number probably didn't change.
The other day my iPhone ran into a catastrophe: it wouldn't charge, it wouldn't connect via USB, and the battery was dying. It's a first generation iPhone, and it looked like that was the end of my old iPhone.
I followed Basic iPhone troubleshooting, read various threads about similar problems, and tried a few of my own tricks:
- Tried different cable
- Tried different plug
- Tried different outlet
- Verified that those all worked with our other iPhone
- Blew gunk out of the iPhone connector area (there was actually a lot in there)
- Rebooted iPhone several times
- Tried to clean the tiny iPhone connector with rubbing alcohol
None of that worked. So I went to the Apple Store, and the Genius did the same things and determined that it must be the battery. Actually, it was not the battery: the connection sound never happened, and the iPhone never recognized a connection. But the Apple Store Genius was doing us a favor, because if the battery dies, we can buy an exact replacement (a first generation iPhone) for $85.
And by this time the battery in my iPhone was completely dead: I had failed to email my pictures off before the battery died. Another lesson learned...
I went home to consider my options, and try some other last resort actions, such as contact cleaner or disassembling the iPhone myself (or going to We Fix Macs). But I finally had an idea:
What if I tried to use a FireWire cable to charge?
It works!
I have an old JBL On Stage iPod speaker/dock that uses the FireWire pins instead of the USB pins to provide power, and that worked. It still doesn't make the "beep beep" sound when you connect the iPhone (and neither does the perfectly working iPhone). That must be reserved for a USB connection. But the iPhone immediately recognizes that it's getting power, and it charges perfectly.
Note that this trick won't work with a 3G iPhone or a second generation iPod Touch: the FireWire connection inside those has been removed.
Also, I haven't yet tried to sync with a FireWire cable. I believe it just might work, but I still have to find a FireWire cable (not the speaker dock) and try.
Today I created an Open Source project called TestCpp. It's a very simple JUnit-like C++ unit testing framework, and I'll be adding more to it in the near future.
I really wrote it because I'm working on another open source project, and I wanted to write some unit tests in C++; I quickly got frustrated with everything I had to do to get it going. With TestCpp you should be able to write your C++ unit tests and actually execute them in no time.
I just installed Ubuntu 8.10 Desktop and had a very interesting time trying to configure a static IP address. There are plenty of discussions on the forums about how this doesn't just work with the standard Network Manager. And you can't just edit /etc/network/interfaces because that is ignored when you have Network Manager installed.
To make it work, I followed this procedure:
First, remove the Network Manager packages:
sudo apt-get remove network-manager
sudo apt-get remove gnome-netstatus-applet
Now you'll have to manually set an IP so that you can connect to the Internet (modify this to be appropriate to your setup):
sudo ifconfig eth0 10.x.x.y netmask 255.255.255.0
sudo ip route add default via 10.x.x.1
sudo vi /etc/resolv.conf
Set nameserver 10.x.x.z appropriately. Next install the old gnome network admin tool:
sudo apt-get install gnome-network-admin
Finally use the old GUI to set networking configuration:
network-admin
This will store the network configuration in /etc/network/interfaces where it belongs. And it seems to work when you reboot. I'll keep it this way until Network Manager is fixed.
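For reference, a static setup in /etc/network/interfaces generally looks something like the sketch below. This is an illustration of the file format, not necessarily the exact text that network-admin generates; the addresses are the same placeholders used in the commands above.

```
# /etc/network/interfaces (fragment; addresses are placeholders)
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 10.x.x.y
    netmask 255.255.255.0
    gateway 10.x.x.1
```

The nameserver still lives in /etc/resolv.conf, as set earlier.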
In my first job, the project we worked on was 100% C code. However, it was object oriented C. This was led by our colleague Chris Westin. As we were fond of pointing out, there is a difference between Object oriented languages and Object oriented programming. You can apply OOP concepts (given appropriate primitives) in any language. Here's a table where I'll record a few of these...
| Concept | How to do it in C | |
|---|---|---|
| Runtime polymorphism | Achievable by using function pointers, and syntactic sugar is done with clever macros. | |
| Link-time polymorphism | You probably do this already but don't define it as such; for instance, if you implement a function defined in a header differently on different platforms, you can consider this polymorphism. The function is different on Windows vs Linux. Another example might be a plugin for a browser. If you want it to run in Firefox and Opera, you might be able to get the core of it to call your own abstracted calls to the browser; the implementation thereof is determined at link-time. | |
| Abstraction | This is more a matter of design than implementation, and thus is applicable to any language. | |
| Information hiding | Again, this is a design issue. But the mechanism that is often used to achieve this is referred to as encapsulation. | |
| Encapsulation | This can be done in different ways, but usually the most effective technique is opaque types. Again, with clever data structures and macros, this can be combined with the above Runtime polymorphism to construct objects that feel like C++, or can even co-exist with C++. | |
| Exceptions | You can simulate some of this using setjmp/longjmp, but this can only go so far; the compiler doesn't know what's going on, so if you have a try/catch block that's really two macros doing housekeeping on the try/catch data structures, and then you return or break or continue out of the middle of that, there's nothing to stop you, and you've corrupted your try/catch data structures. A better method to use is to create an error data structure that can contain more information (like __FILE__ and __LINE__) than a simple int error code. This doesn't get you the magic stack unwinding, but at least it can be more informative than -1. |
Now you might be asking "why not use C++?" There are lots of answers to this, but here are a few:
- Fragile binary interface problem or Fragile base class problem
This is a real problem for deployment of C++ code, and likely an important driver early on for Microsoft to develop COM. If you ship C++ objects in a shared library, you can't do so without being extremely careful about what's exposed in the header file.
- Windows Debug Heap
Similarly, on Windows you must be careful with memory management. You can't safely allocate memory in one module and free it in another, because each module may be using its own heap. This can easily happen in C++ if you do allocations in the header file, and then deallocation in the implementation file (or vice versa). You might be mixing memory heaps, which will cause your app to crash.
- Incompatible behaviors across compilers or even compiler versions.
Certainly early on, different compilers or even different versions of the same compiler could generate code that was incompatible in terms of things like throwing/catching exceptions, or name mangling. This might not have been a problem for some time (I haven't checked), but it is indicative of C++'s lack of an ABI.
- C++ Standard library lacks ABI
Similar to the above points, if you use a certain version of compiler/C++ Standard library in your shared object, you cannot share those data types with another shared object or application that uses a different version of compiler/C++ Standard library.
- C++0x doesn't appear to address any of the ABI issues that are so well known in C++. If I'm wrong, please correct me. Bjarne's C++0x FAQ (or his C++ FAQ) doesn't even mention the word "binary"; the word "ABI" does appear, but only in reference to the GC system.
- Lack of a "platform".
This is a common criticism, which of course C and other languages share. If you want to acquire a mutex, you have to do it differently depending on what platform you're on. Java and other more modern languages include ways to do this, and many many other things. C++0x and its standard library seem to address at least some of this...
- "C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off."
Hmm... it's only harder to do in that there's more language to master.
Don't get me wrong, C++ is an extremely useful language that I use in lots of projects, but you have to know its limitations, in addition to mastering its use. I just wish that the most glaring deficiency, binary compatibility, was addressed in C++0x.
Just attended a web conference demonstrating a new product from Exadel called E7, hosted by Brandon Blell, Charley Cowens and Max Katz.
It looks pretty cool, although I haven't played with it yet. The idea is to present to the business rules owner/author something that is removed from Java and UI code. The Java/UI author presents different types of services, such as Web services, POJO, Page services for JSF/Flex/JavaFX, etc.
Right now it only works on Seam (as Exadel is a JBoss partner).
I also asked the question "What if you had multiple UI types in your app: JSF/Flex/JavaFX. Could you set up a 'generic' type of page service so that the process can be shared across all 3 of the UI types?" The answer is that it's in development for the next release.
Also, someone asked if there's Drools integration. The answer is not yet--they're working on it for the next release.
My setup at home is Comcast for ISP, and Vonage for phone service. I haven't noticed any severe degradation in my Comcast service recently... But then again, I probably don't use my home phone often, and perhaps I'm not online when outages occur.
However, that all changed on Thursday, and according to my neighbors this wasn't a one-time thing. On Thursday Feb 19, 2009, starting somewhere between 2pm and 4pm or so, my connection was pretty much unusable. I couldn't use Vonage in a reasonable manner, and had real problems transferring data. This wasn't a complete inability to transfer data; the connection was dropping packets periodically. See the graph that I generated from dslreports.com about that time.
So I called 1-800-COMCAST and I hit the right buttons to report trouble with my "high speed Internet." A recording informed me that there was trouble in my area, that technicians were working on it, and offered to call me back when it was better. It got better before 11pm that night, but I got my automated call back on Friday at 11am.
Some theories about what's happening:
Recently, about 2 or 3 weeks ago, Comcast made a change and blocked my incoming port 25. Ever since I've had Comcast I could never make an outgoing connection on port 25, which is "normal" for an ISP. But blocking incoming port 25 is deadly if you're attempting to run your own mail server.
The reason I mention the blocked port 25 is that I believe that these problems are related to Comcast's recent changes to control their bandwidth utilization. Comcast is notorious for sending forged TCP control packets to upset your P2P transfers: that's like blowing out the tires of cars on a congested highway because cars might be carrying illegal contraband. In any case, they have been rightfully remorseful and punished, at least in reputation, for this behavior.
Because of this "turnaround," they've come up with a new scheme to control their traffic. Of course, every ISP has a right to control their traffic so a single subscriber doesn't swamp all other users. But perhaps their current implementation is still a little green... And thus our current connectivity troubles.
And you can see here we (Bay Area) were scheduled for the switchover at the end of November 2008 or so, but it probably happened more recently, or there are more changes than a simple switch, and it takes time to convert all neighborhoods in the Bay Area. But apparently, according to this, the new system was 100% online a month and a half ago.
I just did another test, and at this time it's much better, but still not great. I should probably set up a monitoring schedule with dslreports.com; it costs a little bit of money, but no big deal.
Update
I've set up the monitoring tool. Here are my up-to-date line monitoring results:
East Coast Hourly Daily
West Coast Hourly Daily
About a year ago I patched my own log4j to fix the fact that it can swallow InterruptedException via InterruptedIOException. Then I filed a bug against log4j, and got into an email conversation with Curt Arnold, who rightly pointed out that there were other scenarios where InterruptedException could be easily ignored. Anything that wraps an InterruptedException or InterruptedIOException and rethrows something that doesn’t derive from them is effectively ignoring the intended effect of a thread interrupt. The most common examples of this are java.lang.reflect.Method.invoke and java.lang.reflect.Constructor.newInstance which both throw java.lang.reflect.InvocationTargetException, which can have this problem.
The title of this post may be a bit misleading; java.io.InterruptedIOException doesn’t cause this problem, it just makes it at least twice as difficult, because you must always check for both InterruptedException and InterruptedIOException in wrapped exceptions. But it also means that any method that throws java.io.IOException must have special handlers for an interrupted thread.
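The "check both, even when wrapped" rule can be sketched as a small helper that walks the cause chain. This is a minimal illustration, not the patch I made to log4j; the class and method names (InterruptCheck, wasInterrupted, restoreInterruptIfNeeded) are made up for this example.

```java
import java.io.InterruptedIOException;
import java.lang.reflect.InvocationTargetException;

// Hypothetical helper (not part of log4j or the JDK): finds an interrupt that
// was hidden inside a wrapping exception such as InvocationTargetException.
class InterruptCheck {

    // Returns true if t, or anything in its cause chain, signals a thread interrupt.
    public static boolean wasInterrupted(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof InterruptedException
                    || t instanceof InterruptedIOException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    // Re-asserts the interrupt flag so code further up the stack can still see it.
    public static void restoreInterruptIfNeeded(Throwable t) {
        if (wasInterrupted(t)) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Simulate what Method.invoke reports when the target method is interrupted.
        Throwable wrapped =
                new InvocationTargetException(new InterruptedException("simulated"));
        restoreInterruptIfNeeded(wrapped);
        System.out.println("interrupt flag restored: "
                + Thread.currentThread().isInterrupted());
    }
}
```

A catch block that handles a wrapper exception would call restoreInterruptIfNeeded(e) before logging or rethrowing, so the interrupt is not silently lost.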
I tried to find a bug in Sun’s database that warns about InterruptedIOException and these cases, but the closest I could find was Sun bug 4385444. That doesn’t really have anything to do with it.
When I first saw the changes made for java.nio for interruptible IO, I thought the use of InterruptedIOException was clever and elegant. But because of the very special nature of InterruptedException, I changed my mind—of course, there wasn’t much option, because java.nio integrates with existing java.io interfaces and methods which already do not throw InterruptedException, therefore they had to follow that path. It’s really a difficult situation; it isn’t the first case in Java of a hidden or wrapped InterruptedException, it just makes it more widespread. Now you have to handle an interrupted thread anywhere java.io.IOException is thrown.
Note that on Solaris (x86 and Sparc) java.io methods can also throw InterruptedIOException. You don’t need to use java.nio to see this effect. On other platforms you only have to worry about this if you (or things you call) use java.nio.
I’ve come to the conclusion that thread interrupts are so special, and currently so difficult to deal with, that Java should treat InterruptedException as a third type of exception: one that is implicitly thrown from every method. Of course this opens up its own can of worms, not to mention that it’s about 15 years too late to make such a change.
This also speaks to the fact that you should be following a pattern where it’s rare that you catch exceptions that you don’t understand, and should be catching them as far up as possible, where you can centralize your exception handling. I think this is also a good case for frameworks like Spring and Seam which use AoP; each method invocation can have a carefully thought out exception handler, either via your own AoP, or directly handled in the framework.
Update I’ve found bug 4176863 which is related to this issue, but more importantly, the paper Java Thread Primitive Deprecation. I had read this long, long ago, and thought it important to link here.
I found a nice short description about why you get what you get in Xcode when you create a new source file:
and related, how you should configure your git global settings and project settings to get the right information about you:
http://github.com/guides/tell-git-your-user-name-and-email-address
I’ve run my own mail server in my house for quite a long time now, with no problems, no downtime, and it just works. Not anymore… Comcast has finally gotten around to my account to block my incoming port 25. As far as I can tell this started at midnight Thursday morning.
Several years ago they blocked my outgoing port 25, unless I used the Comcast MTA. That’s OK… so that’s what I did, reconfigured my postfix to use their MTA. But now that doesn’t even work—until I change it to use port 587.
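The Postfix side of that change is the relayhost setting in main.cf. The fragment below is only a sketch: the host name smtp.comcast.net, the need for authentication on port 587, and the sasl_passwd path are assumptions for illustration, not a copy of a working configuration.

```
# /etc/postfix/main.cf (fragment; host name and paths are assumptions)
relayhost = [smtp.comcast.net]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = may
```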
A call to customer support gives you the expected response: “Are you using XP or Vista?” “You can’t read email in Outlook?” Of course, none of this is relevant. When the tech support person carries the appropriate information to the supervisor, the expected response is received: this is the policy for Comcast subscribers and there’s no option around it.
But there are still options… Here’s my list that I’ve been considering:
- Plead with Comcast: Has anyone had success with this approach?
- Switch ISP: There really aren’t many options here in the Bay Area. I’ve tried AT&T and other medium-sized and smaller DSL providers, and they all have their disadvantages, including blocking port 25. But I am forever hopeful that someday we’ll get Fios and they’ll be good enough not to do port blocking or other evil ISP things.
- Pobox.com: This is the service I’ve been using for 12 years now. They forward my pobox.com email address to one that I specify. Until yesterday that was an address on a machine in my closet. Now I have it forwarded to gmail. I’ve asked them if they can forward to a port other than 25, but I haven’t gotten a response yet…
- No-IP: This is a little different than Pobox.com. You point your MX record at their servers and they “reflect” the email right into your server with whatever address and port you give them. This costs $40 a year… The benefit over pobox.com is that I can use this for whatever email address I like with my own domain. There are other vendors, such as AuthSMTP and DynDNS (which I use for DNS), and there’s a list that’s slightly out of date here.
- GMail: I can just stick with GMail and be done with it. You can find lots of discussions about using GMail, or any free email service. I just would have preferred to have some control over my own data… Update: I discovered that gmail is rewriting my outgoing email address with xxx@gmail.com (this is a problem because I want everyone to remember my “permanent” address at pobox.com which is forwarded to gmail); but you can actually teach gmail your intended email address. I saw this tip in this lifehacker article.
We've discovered a bug in the Solaris JVM that we're using (1.5.0_08-b03). What happens is that we have a long running JVM that once a minute forks an executable via java.lang.Runtime.exec and reads its stdout. After a long time, one of the forks doesn't actually make it to the exec, and the thread that asked for the exec doesn't continue. That's why I think it's a JVM bug: the exec call is made and only completes half of its job.
I describe the problem on a Sun forum here.
I definitely have to upgrade the JVM to see if that helps...
This is the second time this has bitten me: I want to use Hibernate with standard JPA to persist my entities. And I want my Entities autodetected, as Hibernate is capable of, even outside of a JEE container. So in my persistence.xml I have this bit of code:
<property name="hibernate.archive.autodetection" value="class,hbm"/>
But, I do not compile this persistence.xml into my .jar file, instead I just make it part of my classpath for my unit tests, thinking this will make things more flexible. And of course, this doesn’t work. The autodetection only works if the persistence.xml is located in the META-INF section of the jarfile that contains the Entities to be detected. See my post in the Hibernate forum.
If you’re a JSF newbie (like me), and you’re using Seam, you might be tempted to take one of the examples and hack away at it. For instance, in the booking example, the first page is called home.xhtml. After you type up some tags, you want it to run, so you point your browser at:
http://localhost:8080/myapp/home.xhtml
What you’ll find is not a JSF rendered page, but your JSF tag source! Then you think that you’ve misconfigured something, so you look over everything. But what’s really going on is that the Seam example isn’t configured to render that page, instead you should go to:
http://localhost:8080/myapp/home.seam
That will render the home.xhtml as JSF. Look in your war file’s WEB-INF/web.xml, and find this bit of code:
<servlet-mapping>
<servlet-name>Faces Servlet</servlet-name>
<url-pattern>*.seam</url-pattern>
</servlet-mapping>
This means that the Faces Servlet logic executes against URLs that end with .seam. However, being a newbie, I’m not quite sure what maps that against the .xhtml file, unless it’s this previous entry:
<context-param>
<param-name>javax.faces.DEFAULT_SUFFIX</param-name>
<param-value>.xhtml</param-value>
</context-param>
Also being a newbie, I’m not sure what I should do to configure things so that it’s impossible for a user to download my JSF code…
Anyhow, what makes this all work by default is the file index.html, which is what will load when you visit:
http://localhost:8080/myapp/
and that contains the following:
<html>
<head>
<meta http-equiv="Refresh" content="0; URL=home.seam">
</head>
</html>
which naturally redirects you to the appropriate URL to start rendering your JSF/Seam application.
Thanks to anyone who points out the answers to the above mysteries (that are probably in the docs somewhere…)
- Download the latest (at least 1.8.0) commons-beanutils for your .ear. It fixes a memory leak you'd see after you redeploy your .ear.
- If you're running on OS X, disable the unnecessary dock icon from JBoss by adding -Djava.awt.headless=true. This might also solve problems on Linux/Solaris boxes that you're ssh'd onto and you don't have a DISPLAY environment variable set for X Windows.
- Use your own jboss-log4j.xml file, which by default is in ${jboss.home}/server/default/conf/jboss-log4j.xml. You probably don't need DEBUG output emitted to your console...
$ type java
java is /usr/bin/java
$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)
$ java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)
- It wasn't until 2004 (or was it 2006?) that you could really use Purify on Linux. Wow.
- We still can't Purify JNI C code in a Java JVM on Linux. The JVM does magic things that Purify can't handle. Oh well.
- Purify for Windows doesn't even work on x64.
That sounds innocuous, right?
When you call InetAddress.getLocalHost(), a reverse DNS lookup for your hostname is done. In the worst case, you’ve specified a DNS server that isn’t reachable, and so you have to wait for the DNS timeout, which can be quite long, like 30 seconds or 2 minutes. The reason the crypto code in JCE is doing this is for a random seed generator. Seems you could find something else more random than your hostname…
Below I’ve replicated the sample code that I created for this fix, in case it’s of any use to anyone:
I’ve found what I believe is a workaround to this problem, that seems to work against Java6. It works by setting the system property impl.prefix, and using implementations derived from the following classes:
java.net.PlainDatagramSocketImpl
java.net.Inet4AddressImpl
java.net.Inet6AddressImpl
The override implementations of Inet4AddressImpl and Inet6AddressImpl are designed to make sure that InetAddress.getLocalHost() returns an answer without causing any network access. That means that SSL connections, when constructing their random seed that includes the local hostname, will not hang when DNS cannot be reached.
The reason PlainDatagramSocketImpl is overridden is because the system property impl.prefix is also used to construct it; if impl.prefix is not specified, then a prefix of “Plain” is assumed, and thus PlainDatagramSocketImpl is loaded. Therefore we must provide an implementation with our own matching prefix.
The main class, DefeatGetLocalHost sets the system property impl.prefix to “DefeatGetLocalHost”. This will cause the following classes to be loaded when they are needed:
java.net.DefeatGetLocalHostDatagramSocketImpl
java.net.DefeatGetLocalHostInet4AddressImpl
java.net.DefeatGetLocalHostInet6AddressImpl
The reason that these derived classes are set in the same package, java.net, is because constructors and methods are package protected; therefore placing them in the same package provides the highest level of compatibility.
Also, in order to get our derived classes in package java.net to load in the Java runtime, we have to append the boot classpath. This is done with:
-Xbootclasspath/a:
after which we specify the directory with our class files.
Below are the source files that I wrote to demonstrate. Compile them and execute DefeatGetLocalHost using -Xbootclasspath/a: to include the overridden classes.
java/net/DefeatGetLocalHostDatagramSocketImpl.java:
package java.net;
class DefeatGetLocalHostDatagramSocketImpl extends PlainDatagramSocketImpl {
}
java/net/DefeatGetLocalHostInet4AddressImpl.java:
package java.net;
import java.io.IOException;
class DefeatGetLocalHostInet4AddressImpl extends Inet4AddressImpl {
public String getLocalHostName() {
System.out.println("Using implementation " +
this.getClass().getName() + ".getLocalHostName");
return "localhost";
}
public InetAddress[] lookupAllHostAddr(String hostname)
throws UnknownHostException {
System.out.println("Using implementation " +
this.getClass().getName() + ".lookupAllHostAddr");
if (hostname.equals("localhost")) {
return new InetAddress[] {
InetAddress.getByAddress(new byte[] {
(byte)127, (byte)0, (byte)0, (byte)1
})
};
}
return super.lookupAllHostAddr(hostname);
}
}
java/net/DefeatGetLocalHostInet6AddressImpl.java:
package java.net;
import java.io.IOException;
class DefeatGetLocalHostInet6AddressImpl extends Inet6AddressImpl {
public String getLocalHostName() {
System.out.println("Using implementation " +
this.getClass().getName() + ".getLocalHostName");
return "localhost";
}
public InetAddress[] lookupAllHostAddr(String hostname)
throws UnknownHostException {
System.out.println("Using implementation " +
this.getClass().getName() + ".lookupAllHostAddr");
if (hostname.equals("localhost")) {
return new InetAddress[] {
InetAddress.getByAddress(new byte[] {
(byte)127, (byte)0, (byte)0, (byte)1
})
};
}
return super.lookupAllHostAddr(hostname);
}
}
DefeatGetLocalHost.java:
public class DefeatGetLocalHost {
public static void main(String[] args) {
try {
safeMain(args);
} catch(Throwable e) {
e.printStackTrace();
}
}
private static void safeMain(String[] args)
throws java.net.UnknownHostException, java.net.SocketException {
System.setProperty("impl.prefix", "DefeatGetLocalHost");
System.out.println("Getting localhost:");
System.out.println(java.net.InetAddress.getLocalHost().getHostAddress());
System.out.println("Creating DatagramSocket:");
java.net.DatagramSocket dg = new java.net.DatagramSocket();
dg.close();
System.out.println("Success");
}
}
Read this forum entry I wrote about what could be a common NSIS coding mistake with macros. For instance, this code:
IfErrors 0 +2
!insertmacro LogProgressMessage '"There was an error..."'
will likely cause the error "Installer corrupted: invalid opcode" at runtime. Instead of using +2, you should use a label.
The code I'm working on manipulates routing tables on three different platforms: Linux, Windows and Solaris. Each of them has a different behavior for different scenarios. Here I attempt to document those differences.
First some definitions:
- Interface: A device which can directly reach a subnet via ARP or other protocols. An example is eth0.
- Direct route: A route which indicates which Interface to use to reach a directly connected subnet.
- Gateway route: A route which indicates a gateway to use to reach a subnet which is connected via a router.
- Default route: A special case of a Gateway route in which the destination subnet is all possible addresses.
- Host route: A special case of either a Direct route or a Gateway route in which the destination is a single machine.
- Multicast route: A special case of a Direct route in which the destination subnet is the multicast address space, 224.0.0.0/4, or a subset thereof.
Note that the example route entries in this table are based on the format emitted by Linux's /sbin/ip route.
| Behavior | Linux 2.6 | Windows 2003 | Solaris 10 | | | |
|---|---|---|---|---|---|---|
| The interface chosen to access a gateway via a route is determined by traversing the route table, not hardcoded into the route entry. For instance, default via 192.168.0.1 does not need the specification dev eth0, because that is determined by finding the direct route 192.168.0.0/24 dev eth0. | no | no | yes1 | |||
| A Gateway route which is also a Host route can be added where the destination is an address that exists in a subnet of another Direct route. For instance, the route 192.168.0.20/32 via 192.168.0.1 dev eth0 exists while the direct route 192.168.0.0/24 dev eth0 also exists. | yes2 | yes2 | yes2 | |||
| A Direct route which is also a Host route can be added where the destination is an address that exists in a subnet of another Direct route. For instance, the route 192.168.0.20/32 dev eth0 exists while the direct route 192.168.0.0/24 dev eth0 also exists. | yes | yes | yes | |||
| Direct routes can be deleted. | yes | yes3 | yes | |||
| Route priority can be programmatically controlled. | yes | yes | no4 | |||
| When an interface is administratively taken down, do the associated Direct route entries disappear? | yes | yes | yes | |||
| When an interface is administratively taken down, and associated Direct route entries disappear, do they return when the interface is brought up again? | yes | yes | yes | |||
| When an interface is administratively taken down, do the associated Gateway route entries disappear? | yes | yes | yes | |||
| When an interface is administratively taken down, and associated Gateway route entries disappear, do they return when the interface is brought up again? | no | yes | no | |||
| When an interface is administratively down, is it an error to add a route that references that interface? | yes | yes | yes | |||
| When an interface is unplugged, do the associated route entries disappear? | no | yes | no | |||
| When an interface is unplugged, and associated route entries disappear, do they return when the interface is plugged in again? | N/A | yes | N/A | |||
| When an interface is unplugged, is it an error to add a route that references that interface? | no | yes5 | no | |||
| When an interface is unplugged, and associated route entries do not disappear, will an alternate route be chosen because the interface is unplugged? | no | N/A | no | |||
| If two interfaces are connected to the same subnet, will ARP respond on either interface for an address on one of the interfaces? | yes | no | no | |||
| Can routes be modified for all attributes including priority? An answer of no means they must be destroyed and recreated to modify attributes. | no | yes | no | |||
| Does the operating system create Multicast routes by default? | no | yes | no | |||
| Can multicast routes be deleted? | yes | yes3 | yes | |||
| If Multicast routes do not exist, do multicast packets exit the machine? To where? | yes6 | no | no | |||
| Can two routes be created with the same destination and priority, but a different interface? | no | yes | yes7 | |||
| Can a Gateway route be specified with an interface, where that interface does not have a Direct route for the gateway in the Gateway route? | no8 | yes9 | no8 | |||
| When you remove a Direct route which is required by a Gateway route to reach the gateway, does the Gateway route disappear? | no | no | no | |||
| If you specify a Gateway route with a gateway that is not reachable via a Direct route, is this allowed? | no | yes10 | no | |||
1 This configuration is possible if the route entry is set for this behavior. The route entry can also be configured for a specific interface.
2 When a ping is performed on the destination address, it is sent to the gateway, not via the direct route; this is what should be expected by following the rules in reading a route table. However, the gateway in my test was a Linux 2.6 machine, and it rejected the ping with an ICMP of "unreachable." This means such a configuration is possible, but useless.
3 One cannot directly delete a default route or other "protected" routes, but there is a way to fool Windows into deleting them. I found a fascinating discussion of this.
4 The metric attribute cannot be set for a route in Solaris.
5 The error that's returned is ERROR_INVALID_PARAMETER. That doesn't differentiate this condition from other problems.
6 It appears to choose the first available interface.
7 This question is partly irrelevant in Solaris; the priority or metric cannot be set for a route. However, you can create two routes with the same destination but different interfaces.
8 It works even if the Direct route is on another interface, but it must exist.
9 You can set the route, but it doesn't do anything.
10 This strange behavior is apparently allowed; the source address that's used is the address on the interface that is preferred for the Default gateway.
I was trying to figure out where a memory leak was coming from on Windows, and didn't have the luxury of using Purify, and this really helped:
http://blogs.msdn.com/greggm/archive/2004/02/12/72209.aspx
Essentially, VirtualAlloc is the equivalent of sbrk in other OSes, and allocates virtual pages to the process. If you can find out what's calling it all the time, you're likely to discover what's allocating memory.
Roy West was nice enough to quote me in his blog: "You don't want to ship an experiment to a customer." Generally not a good idea...
Revelations on Java Signal Handling is an important document for learning how to catch signals in Java.
I suddenly seem to have all kinds of problems with the JVM crashing when I'm creating it in our monitor code. The way things work is that I have an executable that links java.so instead of using the shipped java executable. I call this the "driver." Here's what I've found:
The driver will often (but not most of the time) crash, only when -Xdebug is given, with the following stack trace:
gdb build/debug.linux.x86.rhel3/bin/scdriver_debug core.28224
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Core was generated by `/home/jared.oberhaus/jared.oberhaus-linux3-all/shared/1.2/build/debug.linux.x86'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjava.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjava.so
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libverify.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libverify.so
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/native_threads/libhpi.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/native_threads/libhpi.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_ldap.so.2...done.
Loaded symbols for /lib/libnss_ldap.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /usr/lib/sasl/libanonymous.so...done.
Loaded symbols for /usr/lib/sasl/libanonymous.so
Reading symbols from /usr/lib/sasl/libcrammd5.so...done.
Loaded symbols for /usr/lib/sasl/libcrammd5.so
Reading symbols from /usr/lib/sasl/libdigestmd5.so...done.
Loaded symbols for /usr/lib/sasl/libdigestmd5.so
Reading symbols from /usr/kerberos/lib/libdes425.so.3...done.
Loaded symbols for /usr/kerberos/lib/libdes425.so.3
Reading symbols from /usr/kerberos/lib/libkrb5.so.3...done.
Loaded symbols for /usr/kerberos/lib/libkrb5.so.3
Reading symbols from /usr/kerberos/lib/libcom_err.so.3...done.
Loaded symbols for /usr/kerberos/lib/libcom_err.so.3
Reading symbols from /usr/kerberos/lib/libk5crypto.so.3...done.
Loaded symbols for /usr/kerberos/lib/libk5crypto.so.3
Reading symbols from /usr/lib/sasl/libgssapiv2.so...done.
Loaded symbols for /usr/lib/sasl/libgssapiv2.so
Reading symbols from /usr/kerberos/lib/libgssapi_krb5.so.2...done.
Loaded symbols for /usr/kerberos/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/sasl/liblogin.so...done.
Loaded symbols for /usr/lib/sasl/liblogin.so
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libpam.so.0...done.
Loaded symbols for /lib/libpam.so.0
Reading symbols from /usr/lib/sasl/libplain.so...done.
Loaded symbols for /usr/lib/sasl/libplain.so
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libzip.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libzip.so
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjdwp.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjdwp.so
#0 0x0066e6c1 in pthread_mutex_init () from /lib/tls/libpthread.so.0
(gdb) where
#0 0x0066e6c1 in pthread_mutex_init () from /lib/tls/libpthread.so.0
#1 0x01070e3c in ObjectMonitor::ObjectMonitor ()
from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#2 0x01000517 in CreateRawMonitor ()
from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#3 0x0039a872 in JVM_OnLoad ()
from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjdwp.so
#4 0x00ff8a2e in JvmdiInternal::post_event ()
from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#5 0x01002a0e in jvmdi::post_vm_initialized_event ()
from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#6 0x010f109c in Threads::create_vm ()
from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#7 0x00fb4388 in JNI_CreateJavaVM ()
from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#8 0x08048e6b in exec_java (java_library_path=0x0,
jre_home=0xbfffcdda "/home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre",
java_class=0xbfffce27 "com/scalent/shared/tools/test/MonitorTest3",
classpath=0x0) at driver.c:300
#9 0x0804895d in main (argc=5, argv=0xbfffb054) at driver.c:81
- I thought it was something I did because in the stack trace I can see that classpath and java_library_path, parameters to exec_java, are null, and sometimes contain other bad values. Examining this with the debugger I've determined that this is just the optimizer. The compiler is passing in the right values for these when they're needed, but otherwise they reflect the value of the register $esi, which can vary.
- I tried to use Purify on this, but there is something seriously broken with Purify on the machine that I'm running on right now. It seems to work better with root, but when I try it as my own user, I get a MSE on almost every malloc and pthread operation, whether my code does it or not. Another red/green/blue herring.
- I tried Valgrind on it to try to find something, but that didn't seem to discover anything either. Of course, Valgrind can't really execute the whole JVM, but that's not what I was looking for; I was just trying to get it to execute my non-JVM code and find some sort of memory corruption.
- I also tried the j2sdk1.4.2_06, better than our j2sdk1.4.2_03. That didn't help at all. It still crashes at least one time in three.
- Finally, I went into our code and turned off all the options. After 34 runs of com.scalent.shared.tools.test.TestMonitor it did not fail once. I believe the whole thing has something to do with the -Xdebug and related options, as I've never seen a crash in the non-debug version of the driver.
- I think I really proved that it has something devious to do with -Xdebug and friends. I commented out just the -Xdebug and -Xrunjdwp:transport=dt_socket,address=9300,server=y,suspend=n options and ran the test com.scalent.shared.tools.test.TestMonitor 160 times and it didn't fail once.
- I tried putting a 5 second delay between tests in com.scalent.shared.tools.test.TestMonitor, but that didn't help. It still failed on the third test.
- I tried again with strict=y on the -Xrunjdwp:transport line, but that didn't help.
- I also tried using the dt_shmem transport for -Xrunjdwp, but that didn't help either.
- I have resigned myself to the fact that this is a bug in the JVM, at least with the way that I'm calling it. Fortunately it only happens while we have -Xdebug turned on.
This should help you find the jarfile a class comes from...
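One way to do this from inside a running JVM is to ask the class for its code source. This is a minimal sketch, not necessarily the approach the reference above describes; the class name JarLocator and its locate method are made up for illustration. Classes loaded by the bootstrap loader (such as java.lang.String) normally report no code source, so the sketch returns a note in that case.

```java
import java.security.CodeSource;

final class JarLocator {
    /**
     * Returns the URL (jar file or directory) a class was loaded from,
     * or a short note when the JVM does not expose a code source.
     */
    static String locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source; probably a bootstrap class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, or fall back to a bootstrap class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locate(Class.forName(name)));
    }
}
```

Run it with the same classpath as the application you're debugging, passing the class name you're curious about as the only argument.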
Let's say you wrote some code using Java JNI and you wanted to Purify that code so that you could find memory leaks and other bugs.
Short answer: you can't.
Here's the long description of what I went through to get there.
These are the software versions I'm working with:
RedHat Enterprise Linux 3.1
Linux kernel 2.4.21-9EL
PurifyPlus.2003a.06.13.FixPack.0155
Java Runtime 1.4.2_03-b02
One of the most important pieces is the .purify file that I had constructed, which suppresses hundreds of thousands of warnings and allowed me to run things in a reasonable amount of time--but apparently I forgot to save that in a safe place and it's been destroyed. It's easy to recreate if you follow these steps, though.
Anyhow, where I'm stuck is that when an attempt is made by Java to bind to a socket and start listening, it just sits there. There's no activity that I can see via strace, no CPU taken up by the process. But the rtslave is still responsive. It never goes past that step.
I can see this in two different ways; if I turn on Java debugging for my process using the appropriate flags, as soon as the JVM starts up it attempts to bind to that socket. The result is that it just hangs there before executing any Java code. However, if I turn off the Java debugging flag, much Java code is executed up to the point where my Java code attempts to bind to a socket and listen. Then it just sits there again.
In a previous exercise trying to debug Java and listening on a socket, I found that when Java opens a socket it apparently uses rtnetlink to turn off the multicast flag for that socket. I don't know if that has anything to do with it, but it might be interesting...
However, to get this far, here are the steps:
- You generally have to build a purified executable on the same machine that you're executing on. If anything is different it will crash instantly.
- The Purify rtslave process just eats tons of memory when it stores errors. If you suppress those errors, it will use much less (or no) memory for those suppressions. The reporting of those errors also takes a huge amount of time, so the purify process ran for a very long time, getting nowhere.
- The JVM has lots and lots of things that look like MSE's and UMR's. Once you suppress those, the JVM can get somewhere under Purify.
- You have to set DISPLAY, otherwise Purify will dump everything to stdout, which usually isn't very helpful.
- I modified our startup environment to pass the environment variables DISPLAY, PUREOPTIONS and PURIFYOPTIONS so that they can affect the operation of Purify.
- I'm running the JVM with -Xint so that the HotSpot compiler is not invoked, which probably would introduce lots and lots of interesting challenges to get things to work. Update: I got stuck and tried my luck with the HotSpot compiler, and now I'm getting farther. So you should not use -Xint.
- I found out that IBM has a newer version of Purify that seems to work much better than the previous version against the JVM. It's PurifyPlus.2003a.06.13.FixPack.0155.
- There is an undocumented parameter when building with purify, called -handle-calls-to-java. I added this to my PUREOPTIONS environment variable.
- Because of -handle-calls-to-java, Purify goes into its cache and sets up symbolic links to "help" the JVM find stuff. For instance, I have -cache-dir set to /var/purify/cache. In /var/purify/cache/opt/scalent/jre/lib/ there are lots of symbolic links back to /opt/scalent/jre/lib/. That is where our JRE is stored in the file system.
- The JVM still needs at least one more (that I know about so far) symbolic link to find stuff. First you have to run the JVM and have it fail with the message: "Error occurred during initialization of VM java.lang.UnsatisfiedLinkError: no zip on java.library.path". This is because when java looks for a library to open called "zip", on Linux it's going to look for libzip.so on its java.library.path. But since the name has been Purify-mangled, it can't find it. Therefore, do the following:
cd /var/purify/cache/opt/scalent/jre/lib/i386/
ln -s /opt/scalent/jre/lib/i386/libawt.so
ln -s /opt/scalent/jre/lib/i386/libcmm.so
ln -s /opt/scalent/jre/lib/i386/libdcpr.so
ln -s /opt/scalent/jre/lib/i386/libdt_socket.so
ln -s /opt/scalent/jre/lib/i386/libfontmanager.so
ln -s /opt/scalent/jre/lib/i386/libhprof.so
ln -s /opt/scalent/jre/lib/i386/libioser12.so
ln -s /opt/scalent/jre/lib/i386/libjaas_unix.so
ln -s /opt/scalent/jre/lib/i386/libjavaplugin_jni.so
ln -s /opt/scalent/jre/lib/i386/libjawt.so
ln -s /opt/scalent/jre/lib/i386/libjcov.so
ln -s /opt/scalent/jre/lib/i386/libJdbc0dc.so
ln -s /opt/scalent/jre/lib/i386/libjdwp.so
ln -s /opt/scalent/jre/lib/i386/libjpeg.so
ln -s /opt/scalent/jre/lib/i386/libsig.so
ln -s /opt/scalent/jre/lib/i386/libjsoundalso.so
ln -s /opt/scalent/jre/lib/i386/libjsound.so
ln -s /opt/scalent/jre/lib/i386/libmlib_image.so
ln -s /opt/scalent/jre/lib/i386/libnative_chmod.so
ln -s /opt/scalent/jre/lib/i386/libnet.so
ln -s /opt/scalent/jre/lib/i386/libnio.so
ln -s /opt/scalent/jre/lib/i386/librmi.so
ln -s /opt/scalent/jre/lib/i386/libverify.so
ln -s /opt/scalent/jre/lib/i386/libzip.so
- I found another directory that needs to be linked. I got the error "ZoneInfo: /var/purify/cache/opt/scalent/jre/lib/zi/ZoneInfoMappings (No such file or directory)". I also found lots of other directories in a similar state:
cd /var/purify/cache/opt/scalent/jre/lib
ln -s /opt/scalent/jre/lib/zi
ln -s /opt/scalent/jre/lib/locale
ln -s /opt/scalent/jre/lib/images
ln -s /opt/scalent/jre/lib/im
ln -s /opt/scalent/jre/lib/fonts
ln -s /opt/scalent/jre/lib/ext
ln -s /opt/scalent/jre/lib/cmm
ln -s /opt/scalent/jre/lib/audio
- When the Java code starts up, it forks off processes that are written in C. The result is that Purify follows the fork with another Purify rtslave that immediately does an exec. Purify takes this as a process exit, and so immediately starts looking for leaks in that process. We don't care about leaks at this point; we'll find the leaks in the original JVM process when we want by clicking on the leak button. So until I fix process forking, I'm adding the options -inuse-at-exit=no -leaks-at-exit=no to my PURIFYOPTIONS environment variable.
In case you're wondering, Valgrind won't work either.
Java's memory model is very aggressive, and you have to be very careful when accessing memory from multiple threads. You of course have to synchronize access to memory locations, but you have to synchronize them even when it looks like you don't have to. There are several cases where you must use synchronized:
- To provide a mutual exclusion barrier to prevent one thread from modifying a data structure while the other is reading it.
- To provide a memory barrier to prevent memory operation reordering from doing something you didn't want to have happen.
- To make the memory you're accessing volatile so that the runtime optimizer doesn't throw away your request to read a memory location.
Here's a good web page that discusses this.
A good rule to use is that when in doubt, synchronize.
Reordering can only hit you with a multiple-cpu machine, but the problems that I've been running into recently happen on my single CPU machine, with something like this:
(Note that everything after this is speculation based on behavior I've seen):
Object m_x = new Object();
int m_y = 0;
void Thread1() {
    synchronized (m_x) {
        m_y = 1;
    }
}
void Thread2() {
    // Reads m_y without any synchronization.
    while (true)
        System.out.println(m_y);
}
Even after the code in Thread1 has executed in its thread, the code in Thread2 will print 0; I believe this is because the runtime optimizer doesn't bother to look at the value of m_y after the first access. This is similar to a compile-time optimizer, which you'd fix with volatile. But a compile-time optimizer couldn't do anything in this situation.
But in Java the runtime optimizer will make it so that the first access gets the value, but it won't bother reading the value from memory anymore after that.
This strange behavior goes away when you put synchronized(m_x) around the read of m_y in Thread2 as well. I believe this tells the runtime optimizer that something is likely to have been changed by another thread.
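Here is a minimal, runnable sketch of the fixed pattern, where both the writer and the reader go through the same lock. The class name SharedValue and its methods are made up for illustration; declaring the field volatile would be another way to get the same visibility for a single int.

```java
final class SharedValue {
    private final Object lock = new Object();
    private int value = 0; // only touched while holding lock

    void set(int v) {
        synchronized (lock) {
            value = v;
        }
    }

    int get() {
        // Acquiring the same lock as the writer makes its update visible here.
        synchronized (lock) {
            return value;
        }
    }

    /** Starts a writer thread, then spins until this thread observes the write. */
    static int waitForUpdate() {
        final SharedValue shared = new SharedValue();
        Thread writer = new Thread(new Runnable() {
            public void run() {
                shared.set(1);
            }
        });
        writer.start();
        while (shared.get() == 0) {
            Thread.yield();
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(waitForUpdate()); // prints 1
    }
}
```

Because the reader takes the lock on every iteration, the loop is guaranteed to see the writer's update and terminate; with an unsynchronized read of a plain field there is no such guarantee.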
-Djava.security.egd=file:/dev/urandom
Today I learned something very interesting. I learned that you can't setuid on a process in Linux; not when you have multiple threads. Please see #8 in this list.
What they refer to as interesting times probably includes the following:
- When calling setuid, only the caller thread will actually get its uid changed. All other existing threads in the "process" retain their original uid.
- I believe any sane person should recognize this as meaning Linux is broken when using threads and setuid.
- This is a security hole, because root threads still exist in the process. If the non-root threads are hijacked by an attacker, they can stack stomp on the root threads and execute arbitrary code as root.
- Because synchronization depends on the ability to deliver signals, and delivering signals depends on privileges, it's easy to see how synchronization between a thread running as root and another running as non-root can wedge the process.
- Even if I did call setuid in the first bytecode instruction in a Java process, it's too late; Java has already forked threads to do things like garbage collection, and those threads present the security hole described above, and the synchronization problem described above.
- I'm sure there's a long list of other reasons why this is bad, but I can't think of them now, and the above is sufficient.
In our project we have a Java process that uses forked processes written in C; the purpose of these forked processes is to run as root, or at least elevated privileges, while the Java process runs as some sort of nobody user. Unfortunately this doesn't work very well at all on Linux because we cannot downgrade the uid of the Java process after it starts.
This also means that if we want to listen on a port under 1024, we'll have to do that some other way; there's no way we could get the Java process to bind to that port as root and then downgrade to a nobody uid.
Also the processes I refer to have to be forked before the JVM starts. This means that we have to rendezvous with them in some manner that either means some sort of JNI code to hook up the file descriptors in the pipes, or use some other form of IPC.
Here's what we think is happening, thanks to Evan's suggestion to use gdb and Carol's assistance in recreating the loopback's IP address and route:
The first attempt by Java to open a socket is preceded by an initialization of its socket code.
The socket initialization code calls java.net.PlainDatagramSocketImpl.leave, as is indicated in this stack trace from gdb:
#0 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0xb75e3bf8 in connect () from /lib/tls/libpthread.so.0
#2 0xaa4b7c24 in Java_java_net_PlainDatagramSocketImpl_leave ()
from /opt/scalent/jre/lib/i386/libnet.so
#3 0xaa4b8029 in Java_java_net_PlainSocketImpl_initProto ()
from /opt/scalent/jre/lib/i386/libnet.so
#4 0xb2fa6bf2 in ?? ()
#5 0xb2fa0ddb in ?? ()
#6 0xb2f9e104 in ?? ()
#7 0xb721bb44 in JavaCalls::call_helper ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#8 0xb72cfa6d in os::os_exception_wrapper ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#9 0xb721bd96 in JavaCalls::call ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#10 0xb7200f6f in instanceKlass::call_class_initializer_impl ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#11 0xb720569c in instanceKlass::call_class_initializer ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#12 0xb72001cb in instanceKlass::initialize_impl ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#13 0xb72059af in instanceKlass::initialize ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#14 0xb720d6d4 in InterpreterRuntime::_new ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#15 0xb2fad510 in ?? ()
#16 0xb2fa0ddb in ?? ()
#17 0xb2fa0ddb in ?? ()
#18 0xb2fa0ddb in ?? ()
#19 0xb2fa0ddb in ?? ()
#20 0xb2fa0ddb in ?? ()
#21 0xb2fa0d04 in ?? ()
#22 0xb2fa0ddb in ?? ()
#23 0xb2fa10e1 in ?? ()
#24 0xb2f9e104 in ?? ()
#25 0xb721bb44 in JavaCalls::call_helper ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#26 0xb72cfa6d in os::os_exception_wrapper ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#27 0xb721bd96 in JavaCalls::call ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#28 0xb721b666 in JavaCalls::call_virtual ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#29 0xb721c1df in JavaCalls::call_virtual ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#30 0xb7274f25 in thread_entry ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#31 0xb7319caa in JavaThread::thread_main_inner ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#32 0xb7315674 in JavaThread::run ()
from /opt/scalent/jre/lib/i386/client/libjvm.so
#33 0xb72d1083 in _start () from /opt/scalent/jre/lib/i386/client/libjvm.so
#34 0xb75dedac in start_thread () from /lib/tls/libpthread.so.0
- The call in question seems to be to make the machine leave the multicast group. See here.
- Leaving the multicast group must involve connecting to loopback on a random port (the port it chooses changes and is always above 32768), and then shoving some random bytes through. That's my theory. Update: This is almost certainly rtnetlink.
- It just sits there trying to communicate with itself, and times out after ~3.5 minutes.
- It appears that it has its problem because the loopback device is not configured with an IP address and is not in the route table.
- After we issued the following commands, everything works just fine and the 3.5 minute delay turns into an 18ms delay:
ip addr add 127.0.0.1/8 dev lo
/sbin/route add -net 127.0.0.0/8 dev lo
- We found a most interesting thread on the Java forum that seems to mirror our problem. But the guy apparently never figured it out. Maybe I should post a solution there.
- We went through the following other possible problems:
- I always thought it was some kind of nfs file locking problem, but that's not the case at all.
- We redirected the logging output to a local disk on the machine. That didn't help at all.
- We saw the process was blocked reading /dev/random; /dev/random must use loopback to generate random numbers. To solve this we used /dev/urandom which is not as random, but removed the block. But the connection delay persisted. This might explain why it took 50 minutes to send the first message on the SSL connection. Once we removed the block on /dev/random the 50 minute message send delay seemed to go away.
- We wrote some code to try to connect without SSL, but I'm not convinced that ever worked. It's still in the code and can be activated with a configuration setting, and I tested that configuration setting in my client.
- We then thought it was a delay caused by doing a reverse DNS lookup on the peer's IP address--likely so it could do certificate validation/throw nice exceptions. We saw in gdb that the stack trace was deep in some Java code that was trying to do some kind of DNS operation. /etc/resolv.conf was empty, so we added our name server to it and rebooted the machine. That didn't help, but the stack trace changed.
- Then the stack trace was stuck in Java_java_net_PlainDatagramSocketImpl_leave; I thought that might have still been some DNS hosage, so I changed /etc/hosts to include the addresses of the peers. That didn't help and didn't change the stack trace.
- Finally we typed /sbin/ifconfig. That showed us that lo did not have an IP address.
- Carol told us the correct magic commands to type.
/sbin/ip addr add 127.0.0.1/8 dev lo
/sbin/route add -net 127.0.0.0/8 dev lo
- /dev/random would block for about 3 minutes, probably because it depends on loopback to get its results.
- Java trying to do a reverse DNS lookup would block for about 3 minutes, probably because it was trying to get results from 0.0.0.0, because /etc/resolv.conf was empty, and 0.0.0.0 was being interpreted as 127.0.0.1... Update: see this post about a related fix...
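If you'd rather have the Java process notice this condition itself at startup, something like the following could work. It's a minimal sketch under my own assumptions: the class name LoopbackCheck is made up, and it only reports whether some loopback interface has at least one address; it doesn't look at the route table.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

final class LoopbackCheck {
    /** Returns true if some loopback interface has at least one address assigned. */
    static boolean loopbackHasAddress() {
        try {
            Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            if (all == null) {
                return false; // the JVM found no interfaces at all
            }
            for (NetworkInterface nic : Collections.list(all)) {
                if (nic.isLoopback() && nic.getInetAddresses().hasMoreElements()) {
                    return true;
                }
            }
            return false;
        } catch (SocketException e) {
            // Treat an enumeration failure the same as a missing loopback.
            return false;
        }
    }

    public static void main(String[] args) {
        if (!loopbackHasAddress()) {
            System.err.println("warning: loopback has no address; socket setup may stall");
        }
    }
}
```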
I've been studying the C/C++ build and I think I learned some things:
- glibc is inextricably linked with the Linux operating system. You can't run with a new glibc.
- LD_LIBRARY_PATH can affect libc, but cannot affect ld-linux.so.2 (ld-2.3.2.so). It seems you can get around this with chroot, but then you have other problems.
- glibc 2.3.2 has the symbol GLIBC_PRIVATE which is in ld-2.3.2.so, but not in ld-2.2.x.
- libstdc++ 3.2 (it comes with g++ 3.2) requires glibc 2.3. Redhat 7 ships with 2.2 or earlier. See previous point. You cannot take a libstdc++ from Redhat 9 and run it on Redhat 7 unless you upgrade glibc and just about everything else in the OS, at which point it's not really Redhat 7.
- libstdc++ is more than STL. It's the C++ runtime and STL. Therefore STLport can never replace libstdc++.
- I can chroot with Redhat 7 (actually Mandrake 8) and get my Redhat 9 compiled binary and libstdc++ 3.2 shared object. However, once I do that I can't do things like read /proc or modify /etc which is something we need to do.
- Starting with g++ 3.2, libstdc++ is attempting to be forward/backwards compatible in its ABI where possible. At that release, compatibility with earlier versions was completely broken.
- Redhat 9 ships with compat-libstdc++ which contains the C++ runtime libraries for gcc 2.96 as used in Redhat 7.3. This means C++ stuff compiled on Redhat 7 will work on Redhat 9, but only when this package is installed.
- glibc works very well forward/backwards compatibility-wise, with the GLIBC_2.0, GLIBC_2.1, etc. symbols. If you build a binary that is C only, it's probably going to run anywhere, as long as it's glibc v2 or better, preferably glibc v2.1.
- It is impossible to statically link libstdc++ into an executable when exceptions are thrown/caught. This is because symbols such as _Unwind_DeleteException exist in libgcc.so but do not exist in libgcc.a.
While setting up our development system and source control, I'm taking the philosophy that all tools are to be checked into source control, not installed on individual machines; that way a developer's tools are never out of date. Unfortunately some tools don't like this approach; they like to hard-code or "relocate" their position during installation.
One of those is perl.
This link explains a bit about how ActiveState relocates perl on install.
What happens is that the @INC path must be embedded in the perl executable on
Unix platforms, or so they claim. When install.sh is run, it calls reloc_perl,
which uses an ActiveState perl module Relocate which then uses this trick to
replace things like
/tmp/.TheInstallScriptWasNotRunTheInstallScriptWasNotRunTheInstallScriptWasNotRun-perl/lib/5.8.0
with the appropriate path. Unfortunately, when I first tried this, the path just happened to be my home directory, where I had downloaded it.
By the way, there are only 0x80 (128) bytes of space to put the path in, so there is a limit to what location it can be relocated into.
So, the procedure I used to get an ActivePerl that works on anyone's machine no matter where their source directory is mapped to their file system:
- Installed ActiveState Perl normally, into a place such as your home directory: in my case this was /home/jared.oberhaus/p4/tools/linux/ActivePerl-5.8.3.809
- Found all instances of text and binary files under the installation directory that contain /home/jared.oberhaus and replaced them with the original files from the install tar. The original files still have encoded strings such as /tmp/.TheInstallScriptWasNotRunTheInstallScriptWasNotRunTheInstallScriptWasNotRun-perl/lib/5.8.0 inside them.
- Submitted these files to source control as-is.
- Modified ActiveState's install.sh by adding to it (not removing the original install procedures). First it links the magic /tmp path to the file location where the source control version is mapped. This is controlled by detecting where the install script exists and processing that. When reloc_perl executes it will copy everything into /home/user/p4/tools/linux/perl-5.8.3 and at the same time replace the magic /tmp string with the correct location.
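To make the relocation trick concrete, here is a minimal sketch of patching a placeholder path inside a binary image without changing its size. This is my own illustration, not ActiveState's reloc_perl or Relocate code; PathRelocator and its methods are made-up names. It assumes the embedded path is a NUL-terminated C string, so the rest of the string is shifted left and the freed bytes are NUL-padded. That is also why the new path cannot be longer than the space the placeholder reserves.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PathRelocator {
    /**
     * Replaces the first occurrence of placeholder in image with newPath, in place.
     * The remainder of the NUL-terminated string is shifted left and the freed
     * bytes are zeroed, so the image length never changes.
     * Returns false if the placeholder is not present.
     */
    static boolean relocate(byte[] image, String placeholder, String newPath) {
        byte[] from = placeholder.getBytes(StandardCharsets.ISO_8859_1);
        byte[] to = newPath.getBytes(StandardCharsets.ISO_8859_1);
        if (to.length > from.length) {
            throw new IllegalArgumentException("new path does not fit in the reserved space");
        }
        int at = indexOf(image, from);
        if (at < 0) {
            return false;
        }
        // Find the end of the C string that starts with the placeholder.
        int end = at + from.length;
        int stringEnd = end;
        while (stringEnd < image.length && image[stringEnd] != 0) {
            stringEnd++;
        }
        byte[] tail = Arrays.copyOfRange(image, end, stringEnd);
        System.arraycopy(to, 0, image, at, to.length);
        System.arraycopy(tail, 0, image, at + to.length, tail.length);
        Arrays.fill(image, at + to.length + tail.length, stringEnd, (byte) 0);
        return true;
    }

    /** Naive byte-pattern search; returns the first match index or -1. */
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

The long repetitive placeholder in the shipped binary exists only to reserve room, which is what makes an in-place patch like this possible.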
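You can prevent System.exit() from killing the JVM by installing a SecurityManager whose checkExit throws. Try something like this in your JUnit test:
// At the top of the test file:
import java.security.Permission;

// Inside the test class:
/** Exit status passed to the most recent System.exit() attempt. */
private static int m_exitCode;

public void setUp() {
    System.setSecurityManager(new CatchSystemExit());
}
public void tearDown() {
    System.setSecurityManager(null);
}
private static class CatchSystemExit extends SecurityManager {
    /** @see SecurityManager */
    public void checkExit(int status) {
        // Record the status, then abort the exit by throwing.
        m_exitCode = status;
        throw new SecurityException("System.exit() attempt caught");
    }
    /** Allow everything else, including removing this manager in tearDown. */
    public void checkPermission(Permission perm, Object context) {
    }
    /** Allow everything else, including removing this manager in tearDown. */
    public void checkPermission(Permission perm) {
    }
}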
I'm trying to submit tools and libraries into source control, and the tools (such as glibc) arrive in rpm's. Instead of installing them (which is what I don't want to do), I'm ripping the contents out. Of course, rpm doesn't give you an easy way to do that.
But I have determined the correct syntax to do it, as rpm's are really cpio files:
rpm2cpio rpmfile.rpm | cpio -id
I noticed that every time checkstyle runs it contacts www.puppycrawl.com to get its DTD. This is annoying...
I thought it was a bug in the checkstyle code, but it turns out I put the wrong DTD identifier at the top of all the checkstyle config files. Once I fixed that, it stopped phoning home.
