My network connectivity is getting better again, but it's clearly not perfect. The last two reconnects were:
Aug 25 16:42:27 nerd-gw ppp[87396]: tun0: IPCP: IPADDR[6] changing address: 0.0.0.0 --> 121.44.12.163
That's well over 24 hours, but more importantly, the addresses are located 1000 km apart:
=== grog@eureka (/dev/pts/10) ~ 138 -> host 118.209.122.167
167.122.209.118.in-addr.arpa domain name pointer ppp118-209-122-167.lns20.mel4.internode.on.net.
=== grog@eureka (/dev/pts/10) ~ 139 -> host 121.44.12.163
163.12.44.121.in-addr.arpa domain name pointer ppp121-44-12-163.lns20.syd6.internode.on.net.
It's not difficult to guess that mel is short for Melbourne, and syd is short for Sydney. There's nothing obvious about the local link to explain that: the cell hopping continues unabated, and the only real connectivity issue was a little earlier:
Aug 25 15:50:02 nerd-gw fstats: +CGREG 2
Aug 25 15:50:02 nerd-gw fstats: +CGREG 1 81E3 8FC48E8
...
Aug 25 16:39:40 nerd-gw fstats: +CGREG 1 81E3 8FC48E8
Aug 25 16:44:50 nerd-gw fstats: +CGREG 1 81E3 8FC8F2E
The ppp process reports:
Aug 25 16:41:44 nerd-gw ppp[87396]: tun0: CCP: deflink: State change Stopped --> Closed
And the statistics log reports:
1345876670 101866 60 6 8 179936739 126401433 189050 614051 # Sat Aug 25 16:37:50 2012
1345876730 101926 60 6 8 180115784 127281518 179045 880085 # Sat Aug 25 16:38:50 2012
1345876790 101986 60 6 8 180260211 128319669 144427 1038151 # Sat Aug 25 16:39:50 2012
1345876850 102046 60 6 8 180361233 129047090 101022 727421 # Sat Aug 25 16:40:50 2012
Nothing for 5 seconds!
Nothing for 5 seconds!
The important column here is the 5th, which reports RSSI: it's 8 all the way, so there's no issue with the signal strength. At this point, irritatingly, the log file stops. I'm not sure what went wrong, but probably a bug in my code. The really interesting part, though, is in /var/log/messages:
Aug 25 16:41:44 nerd-gw kernel: ugen0.2: <HUAWEI Technology> at usbus0 (disconnected)
Aug 25 16:41:44 nerd-gw kernel: u3g0: at uhub0, port 1, addr 2 (disconnected)
Aug 25 16:41:44 nerd-gw ppp[87396]: tun0: Warning: 0.0.0.0/0: Change route failed: errno: No such process
Aug 25 16:41:44 nerd-gw ppp[87396]: tun0: Warning: deflink: Unable to set physical to speed 0
Aug 25 16:41:44 nerd-gw ppp[87396]: tun0: Warning: deflink: tcsetattr: Unable to restore device settings
Aug 25 16:41:45 nerd-gw kernel: umass0: at uhub0, port 1, addr 2 (disconnected)
Aug 25 16:41:45 nerd-gw kernel: umass1: at uhub0, port 1, addr 2 (disconnected)
Aug 25 16:41:47 nerd-gw ppp[87396]: tun0: Warning: deflink: /dev/cuaU0.0: Bad file descriptor
Aug 25 16:41:50 nerd-gw kernel: ugen0.2: <HUAWEI Technology> at usbus0
Aug 25 16:41:50 nerd-gw kernel: ugen0.2: <HUAWEI Technology> at usbus0 (disconnected)
Aug 25 16:41:55 nerd-gw kernel: ugen0.2: <HUAWEI Technology> at usbus0
Aug 25 16:41:55 nerd-gw kernel: u3g0: <HUAWEI Technology HUAWEI Mobile, class 0/0, rev 2.00/0.00, addr 2> on usbus0
Aug 25 16:41:55 nerd-gw kernel: umass0: <HUAWEI Technology HUAWEI Mobile, class 0/0, rev 2.00/0.00, addr 2> on usbus0
Aug 25 16:41:56 nerd-gw fstats: ^STIN 0 0 0
Aug 25 16:41:56 nerd-gw kernel: umass1: <HUAWEI Technology HUAWEI Mobile, class 0/0, rev 2.00/0.00, addr 2> on usbus0
Aug 25 16:41:56 nerd-gw kernel: cd0 at umass-sim0 bus 0 scbus0 target 0 lun 0
Aug 25 16:41:56 nerd-gw kernel: cd0: <HUAWEI Mass Storage 2.31> Removable CD-ROM SCSI-2 device
Aug 25 16:41:56 nerd-gw kernel: cd0: 1.000MB/s transfers
Aug 25 16:41:56 nerd-gw kernel: cd0: Attempt to query device size failed: NOT READY, Medium not present
Aug 25 16:41:58 nerd-gw kernel: da0 at umass-sim1 bus 1 scbus1 target 0 lun 0
Aug 25 16:41:58 nerd-gw kernel: da0: <HUAWEI SD Storage 2.31> Removable Direct Access SCSI-2 device
Aug 25 16:41:58 nerd-gw fstats: ^SIMST 1
Aug 25 16:42:27 nerd-gw ppp[87396]: tun0: Warning: 0.0.0.0/0: Change route failed: errno: No such process
Aug 25 16:44:50 nerd-gw fstats: +CGREG 1 81E3 8FC8F2E
I've removed irrelevant entries to make the size more manageable. What is very clear here is that the device has gone away. So Internode's guess is probably correct: this horrible flaky Huawei modem is disconnecting and reconnecting. The messages from the device should give more information, but ^STIN is not described in my documentation. ^SIMST is a USIM status change (to 1: Valid USIM card state). That could be normal on initialization; it's a pity there's no documentation for the ^STIN message.
The other thing that's interesting is that, apart from this information, everything was pointing to a problem in Optus' network. I'm convinced that this continual hopping and changes in IP addresses does indicate a problem, but it's not the one that's biting here. I wonder how long we need to establish that whether the modem is defective or just typically flaky.
