Here are some updates:
- The feature is still available, and I got no reply. They even reworked the interface to make it look more square
- Every night the web interface is down (for technical reasons), but the AJAX call still works (I think it’s more for practical reasons)
- The interface will only tell you the phone model when it’s compatible with the femtocell, but the underlying JSON response always includes the model (if it’s an SFR subscriber)
- Previously the scraping script just saved the plain JSON response into a file, and the extraction script collected them. But from about 500 000 files onward, the extraction took hours because of the heavy disk IO. Now the script parses the response and saves it directly into a SQLite3 database
- It seems you don’t even need the femtocell option (even though it’s free). You only need an SFR account, and just have to log in. The new script can log in for you if you enter the credentials. Otherwise you must provide a valid cookie
- Since a SQL database is used now, generating statistics has become a lot easier. The new statistics script draws the pie chart for you (using the pie script). Both are included, and you can reuse the pie script to make nice (vector) charts for whatever you want
Here are the scripts/source code: sfr_phones_db.tar
And just for fun: new statistics (based on 3.88% of all numbers, collected over the last 2 months).
It all began when I logged in to the SFR customer portal, and my phone was shown there. This phone was my own, and not the one provided with the mobile subscription. How do they know which phone I use?
Well, every time your phone registers with a cell tower, it can send its IMEI (on its own, or upon request). This IMEI reveals the manufacturer and model of the phone (the prefixes are registered), and uniquely identifies it.
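To make the IMEI part more concrete, here is a small Ruby sketch (not taken from my scripts). It assumes the usual 15-digit layout: an 8-digit registered prefix (the TAC, which maps to manufacturer and model), a 6-digit serial number, and a final Luhn check digit. The helper names `luhn_valid?` and `parse_imei` are made up for this example, and the sample IMEI is a commonly used illustrative value, not a real subscriber's.

```ruby
# Sketch: split an IMEI into its parts and verify its check digit.
# Assumes the 15-digit layout: 8-digit TAC + 6-digit serial + 1 Luhn digit.

# Luhn check: starting from the right, double every second digit,
# subtract 9 from results above 9, and require the total to end in 0.
def luhn_valid?(digits)
  sum = digits.reverse.chars.each_with_index.sum do |ch, i|
    d = ch.to_i
    if i.odd?
      d *= 2
      d -= 9 if d > 9
    end
    d
  end
  (sum % 10).zero?
end

# Returns a hash with the IMEI parts, or nil if the input is not a
# well-formed 15-digit IMEI with a correct check digit.
def parse_imei(imei)
  digits = imei.delete("^0-9") # drop separators such as "-" or spaces
  return nil unless digits.length == 15 && luhn_valid?(digits)

  {
    tac: digits[0, 8],    # registered prefix: manufacturer and model
    serial: digits[8, 6], # serial number of the individual device
    check: digits[14]     # Luhn check digit
  }
end

info = parse_imei("49-015420-323751-8")
puts info.inspect
```

Looking up the TAC in a prefix registry is what turns the number into a manufacturer and model name; that lookup table is not included here.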
And so SFR knows which phone their subscribers are using.
This is not something new, but it was the first time I saw an operator show this info to the customer.
Now, I also have a femtocell from this provider (you can request it for free), and I am allowed to add subscribers to the group of phones allowed to connect to the femtocell. SFR has to verify whether the number belongs to an SFR customer, and additionally reports whether the phone is compatible with the femtocell. Only 3G phones can connect to the device, as it provides UMTS coverage. This information is shown when trying to add a phone number. Even the phone model of the subscriber was shown (quite a privacy issue).
Thus, I thought I could put in any number to find out if it’s an SFR number, if the subscriber uses a 3G phone, and which one. But I don’t want to do this manually using their complicated website when a script could do the job for me.
To find out how this feature works, first go to the specific section:
Mon Espace Client > Gérer mes lignes > Ma ligne mobile > Offre et Options > Choisir de nouvelles options
ESPACE CLIENT -> Gérer mes lignes -> Mon abonnement et mes options -> SFR FEMTO; Créer Mon groupe SFR Femto -> Modifier mes membres -> Ajouter un numéro mobile SFR
Then simply start Wireshark and sniff the HTTP traffic (they do not use HTTPS). There you can see that once you have entered a phone number in the form, some AJAX script performs a request.
It very simply queries the following URL, with MSISDN being the phone number.
The server returns a JSON response with the relevant information: whether it is an SFR subscriber, whether the phone is femtocell compatible, and what the phone model is.
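As a rough illustration of how such a response could be handled in Ruby: the field names below (`msisdn`, `sfr`, `femtoCompatible`, `model`) and the sample payload are invented for this sketch, since the real format is not reproduced here. Only the three pieces of information come from the actual response.

```ruby
require "json"

# HYPOTHETICAL payload: field names and values are made up for illustration.
SAMPLE_RESPONSE = '{"msisdn":"0600000000","sfr":true,' \
                  '"femtoCompatible":true,"model":"Example Phone"}'

# Extract the three interesting facts from one raw response.
# Returns nil when the body is not valid JSON (for example an error page).
def summarize(raw)
  data = JSON.parse(raw)
  {
    number: data["msisdn"],
    sfr_subscriber: data["sfr"] == true,
    femto_compatible: data["femtoCompatible"] == true, # implies a 3G phone
    model: data["model"]
  }
rescue JSON::ParserError
  nil
end

puts summarize(SAMPLE_RESPONSE).inspect
```

Returning nil on unparsable input keeps a long-running scraper from dying on the occasional HTML error page.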
I decided to use (abuse) this capability in order to make some statistics (with self-made pie charts):
- how many phone numbers are registered to SFR
- how many phones are 3G phones (implied from being femtocell compatible)
- what are the most used phone manufacturers and models
The (ruby) script uses wget to save the JSON responses, and only requires a valid cookie. To get this cookie, just log in to the SFR customer portal and go to the page to add numbers to the femtocell group. You can export the cookie using Export Cookies (for Firefox).
In the beginning, the script randomly queried numbers in the range 06xxxxxxxx (100 million possible numbers). It covered 0.32% of the possible numbers between 2011-09-29 and 2011-10-05.
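The random picking can be sketched like this in Ruby. This is a simplified stand-in, not the original script: the method name `random_mobile_numbers` is made up, and the actual querying is left out.

```ruby
require "set"

# Sketch: draw `count` distinct random numbers of the form 06xxxxxxxx.
# There are 10**8 = 100 million candidates in that range.
def random_mobile_numbers(count, rng: Random.new)
  seen = Set.new
  # A Set silently ignores duplicates, so loop until enough distinct
  # numbers have been collected.
  seen << format("06%08d", rng.rand(100_000_000)) while seen.size < count
  seen.to_a
end

# How many queries 0.32% of the whole range amounts to.
sample_size = (100_000_000 * 0.32 / 100).round
puts "0.32% of the range = #{sample_size} numbers"
puts random_mobile_numbers(5).inspect
```

So 0.32% of the range corresponds to 320 000 queried numbers.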
Here are the results:
On 2012-03-02, 0.05% of the possible numbers were scanned again (randomly). The results are quite similar.
Later on, I found out that phone number ranges were pre-allocated. In France, the ARCEP is responsible for that (regulating telecommunications), and the list is available on their page.
Now instead of trying random numbers, it’s possible to use the known ranges. SFR has 35 prefixes in the 06xxxxxxxx range, totalling 27.8 million possible numbers (out of 100 million).
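Restricting the sampling to known ranges could look roughly like this in Ruby. The prefixes below are placeholders invented for the example (the real ones come from the regulator's list), and the helper names are hypothetical too. Each prefix is assumed to be shorter than 10 digits and to cover every 10-digit number starting with it.

```ruby
# HYPOTHETICAL prefixes, for illustration only.
PREFIXES = %w[0610 0611 06200].freeze

# Number of 10-digit phone numbers covered by a prefix.
def block_size(prefix)
  10**(10 - prefix.length)
end

# Pick a prefix with probability proportional to its block size, so that
# every covered number is equally likely to be drawn.
def weighted_prefix(prefixes, rng: Random.new)
  total = prefixes.sum { |pre| block_size(pre) }
  pick = rng.rand(total)
  prefixes.each do |pre|
    return pre if pick < block_size(pre)
    pick -= block_size(pre)
  end
end

# Draw one random number inside the given prefix.
def random_number_in(prefix, rng: Random.new)
  width = 10 - prefix.length
  prefix + format("%0#{width}d", rng.rand(10**width))
end

prefix = weighted_prefix(PREFIXES)
puts random_number_in(prefix)
```

Weighting by block size matters when prefixes have different lengths; otherwise numbers in small blocks would be over-represented in the statistics.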
But there is a glitch. For a few years now, subscribers have been allowed to keep their number when switching operator. This is the so-called “mobile number portability”. Thus some of the numbers with SFR prefixes do not belong to SFR subscribers anymore, and SFR has subscribers with non-SFR prefixes. This is not regulated by the ARCEP and there is no public list. Only the operators themselves know which operator uses one of their numbers. 14.80% of SFR numbers do not have an SFR prefix (according to the statistics from 2012-03-02), which is not insignificant.
Out of the 27.8 million numbers with SFR prefixes, 0.46% were queried on 2012-03-03.
Obviously I cannot provide the data collected, for privacy reasons. Also, only the statistics have been kept (not all shown here), and the raw data was deleted (I’m quite responsible, aren’t I?). But here are the scripts.
Finally, I don’t know how often the subscribers’ phone information is refreshed. One could find out by trying, but I’m too lazy.
Now the boring legal thoughts.
I contacted SFR, but that was really not an easy task. On the official contact page there is no e-mail address. You can only phone the customer service. It’s quite expensive, reserved for customers, French only, and I really don’t want to talk to a hotline trying to explain that I don’t have a problem with my product, but that they have a privacy issue. There is also a forum, but you have to subscribe and the post will be public (we only want to warn them). Also, I’m not a friend of dead trees, don’t want to pay for a 1€ stamp for a letter, and why should I give them my postal address?
After some time (1 hour of clicking), I randomly found (using Google, not their site) an e-mail address in the legal notice of the corporate site: contact[at]sfr.com. Sadly it is invalid (their mail server returns an unknown recipient error), as are root/admin/webmaster/postmaster.
Finally the mail successfully went to info[at]sfr.com, but even after one week I got absolutely no reply and the feature is still available.
On one hand, corporations always tell us to warn them if we find a bug. But when we try to do so, it’s really hard to get in touch, and most of the time, as long as the bug is not public and does not cost them too much money, they simply do not care. This is not the first time this has happened to me. Actually, I never got any reply until the bugs I reported made it into the mainstream specialized press.
Now another question: is this work legal? This is a hard one. On one hand, scraping/harvesting is not really forbidden. All search engines do it (as long as they respect robots.txt). It’s on their website, which is available over the whole Internet.
On the other hand, you have to log in. This function is only available for customers (a friend). But then, it’s a service provided by the operator itself. They implemented it and thought about it. It’s a feature, not a bug. The operator is responsible for its data, and making user data available to others is probably breaking some privacy laws.
Now about copyright. I extracted data from their website. Considering a website as a database is risky, but this feature is somehow connected to a real database. In copyright law, there is a section about databases, with different rights. Extracting a database is more restricted. But the script is not really connected to it, merely to a simple interface, available as a service on their website. Also, I did not keep or publish the data. I just ran some statistics over it and erased it afterwards.
Finally, it’s a French website, but I’m not doing this from France (luckily I don’t live there). Which law applies to me? The French or the local one?
So coming back to the question: it’s a gray zone open to debate. But getting these statistics was a nice exercise with interesting results. How else would you know how many of the allocated numbers are used? How many have 3G phones? What are the main brands? … This is data normally only operators have and would never make available. I’m not even sure if they compute such statistics or if it’s just data lying around. And I think I was quite responsible when handling it.