Category: Older Essays

Earlier writing from 2010–2015 on tech policy, open culture, education, and civil society

  • Thoughts on Unicode in India

    I have been thinking about multilingual publishing on the Web and in other digital forms, specifically in Indic languages, in the context of my work at Pratham Books and around Wikipedia. The more I think about it, the more I believe it is crucial to preserving language and culture, as well as to being able to publish and spread knowledge and culture.

    These are early thoughts that I will refine over the next few months into a white paper, and I would welcome feedback on them.

    1. While there are many ways to achieve a legal framework for interoperable content (CC, GFDL, PD, the Copyright Act amendment for the print impaired, etc.), there needs to be a technical framework for such interoperability as well.
    2. Given that we (Pratham Books) publish in Indian languages, using Unicode fonts is the only way to achieve cross-platform interoperability, and Unicode is a global standard.
    3. Given India’s push towards copyright reform for the print impaired, it is imperative that Unicode fonts be used in the creation of Indic content, because anything else is a huge barrier to conversion into print-impaired-friendly formats.
    4. Unicode, being an open global standard, guarantees that content remains accessible in the future and ensures there is no proprietary font or vendor lock-in.
    5. The limitation is the lack of high-quality, varied OpenType Indic Unicode fonts that are optimised for both screen and print.
    6. Given the importance of linguistic diversity to India’s cultural heritage, it is imperative that greater attention is paid to the development of such fonts under licenses that allow for free re-use and for fixing issues in the fonts as they arise.
    7. The Govt. should fund the open development of at least 5 such fonts for each of the 22 constitutionally recognised languages and make these available not just for free, but under a free license that permits re-use and improvement as well.
    8. The GoI has recognised this and notified Unicode 5.1.0 as the de facto standard for all eGovernance projects. This standard needs to be more widely adopted across all Government digital projects and in any software or content procurement as well.
    9. Use of Unicode will significantly reduce bandwidth and storage requirements, as it is more efficient; it allows for universal search (within a page, in web search, etc.), sorting and indexing; it enables text-to-speech synthesis and machine translation; and it allows for greater and better search engine optimisation. (A short illustrative sketch of the search and sorting point follows this list.)
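
    To make the search and sorting point concrete, here is a minimal Python sketch (an illustrative example I am adding for clarity – the words and library choices are mine, not part of the original point):

    ```python
    # -*- coding: utf-8 -*-
    # Because Unicode text is stored independently of any particular font,
    # generic tools can search and sort it with no language-specific plumbing.

    words = ["पुस्तक", "कलम", "घर"]  # Hindi for "book", "pen", "house"

    # Universal search: plain substring matching works on any Unicode-aware platform.
    print([w for w in words if "क" in w])  # -> ['पुस्तक', 'कलम']

    # Sorting: code-point order gives a universal baseline; locale-aware
    # collation (e.g. via a library such as PyICU) would refine this further.
    print(sorted(words))  # -> ['कलम', 'घर', 'पुस्तक']
    ```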

    I have to add that I am not an expert in this field and may have got things completely wrong – I really do need your help in working on this, please.

    Update on 22.02.2011

    Thanks to Santhosh over at the Wikimedia India mailing list, I have learnt much – in particular, that Unicode isn’t a font as such but a method of encoding information.

    From what I understand, there are three components (a short illustrative sketch follows the list):

    1. Input (different keyboard layouts are used, but these are independent of the method of encoding).
    2. Encoding and storing the input (ASCII is the older method; Unicode is the standard).
    3. Representing, visually for the human user, what has been input and encoded (fonts or typefaces, which are, to an extent, independent of the encoding method used).
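
    Here is a short Python sketch of how these layers separate in practice (again, a hypothetical illustration of my own): the word below could have been typed on any keyboard layout and will be drawn by whatever font is installed, but the encoded code points are the same everywhere.

    ```python
    # -*- coding: utf-8 -*-
    import unicodedata

    # Layer 2 (encoding): the Hindi word "नमस्ते" is stored as Unicode code
    # points, regardless of which keyboard layout (layer 1) produced it and
    # which font or typeface (layer 3) eventually draws it on screen or paper.
    text = "नमस्ते"

    for ch in text:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

    # The same code points can be serialised differently for storage/transfer:
    print(text.encode("utf-8"))   # Devanagari takes 3 bytes per code point in UTF-8
    print(text.encode("utf-16"))  # 2 bytes per code point, plus a byte-order mark
    ```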

    An excellent resource is this “List of available Indic fonts for scripts encoded in Unicode”: http://indlinux.org/wiki/index.php/IndicFontsList

    Update on 17.03.2011

    To add: http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=900d111c-1475-4d78-8fab-789663818724&itemguid=696ce9df-02c5-4585-8062-a96e57ef50f7

  • Thoughts on Wikipedia in India

    Wikipedia turned ten on the 15th of January 2011 and its history is both well known and fairly well documented. Globally, Wikipedia is the fifth most popular website, with the English Wikipedia being the most popular destination. What is far more interesting to note, however, is that close to 98% of the traffic from India goes to the English language Wikipedia, with the remainder going to the Indic language Wikipedias – which raises a question of interest: what is the history of Wikipedia in India?

    Wikipedia is popular in India – current data shows that it is the seventh most popular site in the country and comes out ahead of many popular sites including Twitter and Orkut. While it is well nigh impossible to pinpoint the first edit or the first person who read or edited Wikipedia in India, it is possible to use proxies for this investigation. 

    The article on India on the English Wikipedia was first created on the 26th of October 2001 and languished for many years – between 2001 and 2003 it saw only 199 edits. 2004 saw 1700 edits to the page, 2005 had 2311 edits, and contributions peaked in 2006 with 6752 edits. From 2007, the number of edits has steadily dropped, and the period from 2007 to 2011 has seen a total of 6925 edits. The page is watched by 2329 people who maintain a constant vigil over changes made, was viewed 1,313,608 times in December 2010 and was the 39th most viewed page on the English Wikipedia.

    The India page is now available in 216 languages, has been a featured article in 9 languages and is linked to from over 1500 other pages. A reasonable inference is that interest in Wikipedia in India broadly corresponded with the time-line of the India page’s evolution. Wikipedia is now available in over 20 Indian languages, with a further 20 Indic languages in incubation.

    However, India and Indian language Wikipedias seem woefully under-represented when one compares the size of the pool of native language speakers with the number of articles on each respective language Wikipedia. Further, it is worth noting that the Wikipedia community in India is necessarily very different from similar communities across the world because of the diverse languages that are a part of the Indian identity. In terms of size, Hindi is the largest Indian language Wikipedia, with Telugu, Marathi, Bishnupriya Manipuri and Tamil making up the rest of the top five, though none of them has over 100,000 articles; Hindi is the biggest with around 67,221 articles. The first Hindi article was begun in July 2003, and the Hindi Wikipedia crossed 1000 articles in September 2005; the first Telugu article was written in December 2003, with Telugu crossing 1000 articles in October 2005; the first Marathi article in May 2003, crossing 1000 articles in May 2005; the first Bishnupriya Manipuri article in August 2006, crossing 1000 articles in November 2006; and the first Tamil article in September 2003, crossing 1000 articles in August 2005. However, Oriya, Punjabi, Assamese and Malayalam were the first Indian language Wikipedias, all started in 2002.

    Writing in the September 2010 edition of the Wikimedia India Newsletter, Shiju Alex and Achal Prabhala opine that:

    “Indians working on English Wikipedia form perhaps the most active Wikimedia community in the country. This might be surprising for many people outside India, but within, it is fairly obvious that English is an important Indian language (it is one of India’s “official” languages) and also the most significant bridging language between different language groups. Indeed, English is the language that connects Wikimedians from various language groups in India. What we call the “mother tongue” (i.e. the native Indian language of one’s parents) is usually not English, and yet for a number of people, English remains the preferred operating language in educational, professional and online life.”

    Given the varied language communities in India, it is worth noting that several of them have been very active and have been a primary factor in driving editorship in their respective languages. Common to all these language communities are outreach activities: a growing number of regular meet-ups across the country (Bangalore has had 23 consecutive monthly community meet-ups since July 2009), Wiki Academies (hands-on tutorial sessions on how to edit Wikipedia) and other such outreach processes that are very important to evangelize Wikipedia projects and bring new editors into the fold. In parallel, there has slowly been traction from Governments as well. The Malayalam Wikipedia community recently released an offline version of the Malayalam Wikipedia containing 500 selected articles, which was distributed by the Kerala government to thousands of schools in the State. The Tamil Nadu government recently released a glossary of thousands of technical terms collected by the Tamil Virtual University for use in the Tamil Wiktionary project, and also organized an article competition across the State covering over 3000 universities and colleges, which has introduced Wikipedia to a very large new audience and brought new editors into the fold. It is also worth noting that the National Knowledge Commission, recognizing the importance of free, easy and open access to knowledge, noted in its recommendations on Open Educational Resources that:

    “Our success in the knowledge economy hinges to a large extent on upgrading the quality of, and enhancing the access to, education. One of the most effective ways of achieving this would be to stimulate the development and dissemination of quality Open Access (OA) materials and Open Educational Resources (OER) through broadband internet connectivity. This would facilitate easy and widespread access to high quality educational resources and drastically improve the teaching paradigm for all our students.”

    This is important because Wikipedia and its sister projects are among the largest repositories of Open Educational Resources in the world.

    In a case study, L. BalaSundaraRaman traces the history of the Tamil Wikipedia:

    “Tamil Wikipedia was started on September 30, 2003 by an anonymous person by posting a link to their Yahoo! Group and the text manitha maembaadu, fittingly, a phrase that means human development, on the main page. However, for several weeks after that, the site had an all-English interface with little activity. Mayooranathan, in response to a request posted in a mailing list, completed 95% of the localisation between November 4, 2003 and November 22, 2003. He made some anonymous edits alongside. On November 12, 2003 Amala Singh from the United Kingdom wrote the first article in Tamil, but with an English title Shirin Ebadi. The earliest editor who continues to edit actively, Mayooranathan, has written more than 2760 articles and has kept the project alive during an intervening period when practically nobody else was editing. Around five active editors including the author joined the project in the second half of 2004. Some occasional editors turned out to become regular editors and the Wiki started growing steadily. Bugs were reported to fix the interface, policies partially deriving from the English Wikipedia were initiated, and editors started to specialise in tasks like stub sorting, creating templates, copyediting, wikifying, translation, original writing etc. Even at this early stage, the Tamil Wikipedia had a global editorial team representing almost every continent. After registering a period of high linear growth in several metrics on a lower base, the Tamil Wikipedia started witnessing, around April 2007, a low linear growth on a higher base in several quantitative metrics. This period, however, also showed a perceivably super-linear growth in article quality aspects like length, standard of prose, image use, inline citation usage, etc. Late 2008 to early 2009 was a period characterised by a near constant number of active and very active editors, a steady influx of new and occasional editors, a healthy, enthusiastic and continuity-preserving churn, and, above all, optimism for a promising future.”

    There have also been technical challenges behind the historical lack of growth of the Indic language Wikipedias, in particular the scarcity of openly licensed, freely available Indic fonts, difficulties with the cross-platform display of Indic text and the lack of standardised, cross-platform Indic language text entry tools. There have been, and continue to be, many approaches to working on these problems – they are a focus of the Wikimedia Foundation, of language communities and of private organisations. Google and Microsoft have both released tools to help solve these challenges and assist in translation efforts.

    This inequitable distribution of content by language, skewed towards English and the languages of the traditional geographies of the Global North, has been a frequent point of discussion and a point of focus for the Wikimedia Foundation. Among other things, the Foundation’s strategy plan aims to foster the growth of smaller Wikipedias – by 2015, the aim is to have 100 Wikipedia language versions with more than 120,000 “significant articles” each. To this end, the Foundation also aims to bootstrap community programs in key geographies: India, Brazil and the Middle East/North Africa. In particular, Achal Prabhala, a member of the Wikimedia Advisory Board, has spoken about the need for local representative bodies of the Wikimedia projects, or Chapters, in countries which are linguistically under-represented. He argues that there is a distinct relationship between local growth and the existence of local Chapters, and that geographies in the South present an enormous opportunity for growth.

    The Wikimedia Foundation’s India Chapter has had a long history. First efforts to set up a Chapter began in September 2004 with an Internet Relay Chat meeting, and efforts continued through to November 2007, when there was another round of discussions on the India mailing list and draft bye-laws were drawn up. However, the efforts to set up an India Chapter received a huge boost from two things – Sue Gardner and Jimmy Wales visiting Bangalore in December 2008, and regular Wiki-meetups in Bangalore that were made possible by the Centre for Internet and Society. In July 2009, renewed discussions and activity commenced in connection with the setting up of the India Chapter, and this culminated in India becoming the 29th chapter of the Wikimedia Foundation in July 2010. The Wikimedia India Chapter was granted registration (registered name: Wikimedia Chapter) by the Registrar of Societies, Bangalore Urban District, on the 3rd of January 2011.

    The Chapter’s fundamental mission is to catalyse the usage and editorship of Wikipedia in India, as well as foster Indic language content. To this end, there are multiple tracks the Chapter will need to take – Content, Technology, Outreach, Collaborations, Offline Work, creating Special Interest Groups and Projects.  

    The Wikimedia Foundation, recognizing the importance of India to its growth strategy and understanding the potential in this relatively under-represented and untapped market, recently appointed Bishakha Datta as a member of its board of trustees and has announced that it will soon open its first office outside of the United States, in India. As a testament to the growing popularity of Wikipedia in India, the 15th of January 2011 saw over 90 concurrent events celebrating the tenth anniversary of Wikipedia across India, many of them organized spontaneously by small groups of interested community volunteers, with large local participation and substantial media coverage.

    Aside from the organic growth of Wikipedia and local language communities, the development of Wikipedia in India would appear to be only just entering its active growth phase. With the continued growth of the Indian economy, the expected growth in Indian internet users, the advent of cheap and ubiquitous wireless internet access, an active Chapter, a Foundation office in India and the support of India’s relatively free media, the future of Wikipedia in India looks bright and well set for the decade ahead.

    ____


    (The author would like to thank the team that put together the Wikimedia India Community Newsletter in September, 2010, which is available here: http://commons.wikimedia.org/wiki/File:Wikimedia_India_Community_Newsletter_2010_September.pdf. This is the best overview of the state of Wikimedia and Wikipedia projects in India and is well worth reading. This current piece would not have been possible without this Newsletter. The case study on the Tamil Wikipedia can be read in its entirety here: http://ta.wikipedia.org/wiki/%E0%AE%B5%E0%AE%BF%E0%AE%95%E0%AF%8D%E0%AE%95%E0%AE%BF%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AF%80%E0%AE%9F%E0%AE%BF%E0%AE%AF%E0%AE%BE:Tamil_Wikipedia:_A_Case_Study)


  • Thoughts on Creative Commons Licenses

    At Pratham Books, we have a very simple mission – “A Book in Every Child’s Hand” – and this drives all of our work; we constantly test what we do against this goal. The mission has two parts: one is to create more reading matter, so that there is more available for children to read, and the second is really a corollary – that we need to be able to get books to where children need them the most, and that the books need to be culturally and linguistically relevant as well.

    This is where our challenge lies – to massively scale the production of high quality, low-cost children’s books for a massively multi-lingual and multi-cultural market. Looking at this challenge it is fairly obvious that this is not a problem that any one organization can solve. The solution has to be scalable, flexible and catalyse our fundamental mission as well.

    At this point, we realised that there were several internal questions to answer, some of them painfully introspective. Questions as to whether the books we create and distribute have to be a Pratham Book; whether every book must be paid for by either the reader or an intermediary; whether, as a publisher, we are gatekeepers of content or content curators; how we could create infinite good with finite time and resources; and, most importantly, how we can create more value than we capture.

    Having answered most of these questions using “openness” as a test (we asked ourselves whether allowing unrestricted access to use and re-use our content furthered our mission) and finding that it did, the second set of questions was more technical – how, as a small non-profit, do we do this without finding ourselves overwhelmed? It was at this point that we had a moment of realization – that reading is an extremely social activity, and that there are communities and organizations who were more than ready to help us achieve our goals.

    It was at this juncture that we hit upon the Creative Commons licensing model as one that would help us achieve many of our aims of flexibility and scalability while catalysing our mission of a book in every child’s hands. In particular, three things stood out: a shared value system of sharing and openness; a community that was deeply embedded in these ideals; and, from our perspective, scalability – it allowed us to license content to multiple organizations and individuals, both known and unknown, with the one-time effort of releasing it under a Creative Commons license, as opposed to the traditional model of time-consuming negotiations and discussions with each known organization or individual who wants to use our content.

    As an organization, we did spend some time choosing a license; from our perspective, the choice between openness and sharing reduced to a choice between the Attribution and Attribution-ShareAlike licenses. We have decided that the Attribution license will be our default license, with a fall-back to the Attribution-ShareAlike license where needed. P2PU said it best: “it emerged that our choice lay between two licences: Creative Commons Attribution and Creative Commons Attribution Share Alike … chose to use Creative Commons licences because Creative Commons have become a global standard and are supported by a large international community. Both licences are Free Culture licences and are more permissive than any of the other Creative Commons licences. In other words, the choice was not between two extremes but between two open licences at the same end of the licence spectrum.” Given that our goal was to be as open as possible, it followed that our license choices were essentially those that allowed for the greatest possible use and re-use, because our initial hypothesis was, and continues to be, that being open allows us to fulfil our mission better than a traditional copyright model does.

    We now use Creative Commons licenses everywhere! We license entire books under CC-BY and CC-BY-SA licenses, we license our illustrations similarly, and even photographs and other publicity material too. Over the last year we have been building the foundations of a social publishing model – one where we curate communities that are passionate about reading and help us create content. Such a model rests on the idea of a participatory culture, and an essential ingredient is a permissive licensing strategy – Creative Commons licenses offer us this, along with a large community with shared values and an ecosystem to tap into.

    While this licensing and publishing model works well in theory, it has been extremely heartening for us to see it come to life – our communities have created multiple derivative works, ranging from iPad and iPhone applications, to ports of our works to OLPC laptops, to entirely new books built from existing illustrations and, my personal favourite, versions of our books for the print impaired – from DAISY and Braille books to rich audio books – such that our mission truly does encompass every single child.

    I firmly believe that we would not have been able to achieve what success we have had without the help of Creative Commons licensing. These licenses and the values that they stand for are vital to building and strengthening a digital commons from which we all benefit. I hope you will consider supporting Creative Commons and licensing content that you own or control such that we all benefit from the growth of the commons.


    [Cross posted over at the Creative Commons site]


  • Thoughts on Corporate Usage of User Generated Content

    Sumant Srivathsan and I have a running conversation about corporations soliciting advertising and content generation from customers without compensation, and we’ve written up our thoughts on this and on the current TATA Docomo campaign.

    [Gautam] I firmly subscribe to what Cory Doctorow has said: that an artist’s greatest worry should be obscurity, not piracy or, in this case, a lack of suitable monetary compensation. While on the subject of monetary compensation, I also believe that attention is a valuable commodity in its own right, one that can be monetised, and a very important element in any modern-day business model.

    [Sumant] I don’t have a problem with crowdsourcing content in general, and I concur that attention helps to increase the value of future work. However, what it creates is a never-ending spiral that consistently under-rewards creative work by tacking on the “future reward” rider. I have a problem with this, because it allows the buyer to artificially create a position of strength and dictate the value of products that are not part of the current transaction.

    Our current discussion is over TATA Docomo’s Celebration campaign, where they encourage people to submit creative entries for Diwali as animations, wallpapers and screensavers.

    The relevant terms are below:

    Rules for CREATE CELEBRATIONS:

    12. The winning entry(s) will become a copyright of TTSL and TTSL will have all rights to use the winning entry (s) in whichever forums, media and frequency it deems appropriate, without any financial or any other obligation whatsoever.

    http://twitpic.com/2wqggm

    Terms & Conditions of the CREATE CELEBRATIONS:

    2. Participant acknowledges that all the copyrights rights and intellectual property rights of an entry(s), post its submission would entirely and solely vest with Tata Teleservices Limited (“TTSL”) only. By submitting the entry(s) for the Contest, the participant agrees to assign and transfer all copyrights of the submitted entry to TTSL without any consideration and compensation.

    http://twitpic.com/2wqgli

    In short, all entries, and the copyrights in and to them, belong to TATA Docomo, and they will pay the winners in each category. However, the Rules for CREATE CELEBRATIONS seem to suggest that only the winning entries become the copyright of TATA Docomo, while the Terms & Conditions suggest that all submissions and entries do.

    [Gautam] I am not a fan of TATA Docomo asking for an effective assignment of copyright to them; the model I would recommend is that they use a more permissive license, such as a Creative Commons license, allowing artists to keep their copyright and share their work on their own terms.

    [Sumant] TATA DoCoMo should not be in a position to dictate what an artist does with future work, which may not exist, because that should be entirely the artist’s prerogative. I suppose any artist who feels that way would demand a higher price for the work, and therefore not enter into a contract based on TATA DoCoMo’s terms. Insofar as their terms and conditions are clearly stated, I have no problem with TATA DoCoMo’s offer to contributors.

    I suppose TATA DoCoMo is in a unique situation because they are allowing their trademarks to be used in the UGC. I still think that the ideal outcome ought to be a sharing of rights, and not an outright transfer to TATA DoCoMo. I’m not happy with the prizes being offered, which are a fraction of the cost of having the video done by an agency, as this devalues the contributor’s work, and I’m not happy with the attribution. It says “this is an idea of so-and-so” whereas the contribution is a completed animation or artwork. That is misrepresentation.

    [Gautam] However, at the end of the day, it’s a free world and a free market, and if TATA Docomo has many takers, as I am sure they will, those takers enter knowing full well the terms of the contract. Caveat emptor and all.

    [Sumant] I do like the deviantART and Threadless/Inkfruit models of crowdsourcing much more. In both cases, attention is a given. In addition, there is a clearly stipulated revenue-sharing system in both of these models that allows artists to monetize not only the attention they gain but also the work itself, to its maximum. Also, in both cases, the artist retains all rights to the content they have created, and the content is shared under a licensing agreement. dA uses a CC license, but I’m not sure about Threadless or Inkfruit.

    UPDATE I:

    TATA Docomo have now changed their Rules to state:

    12. Participant acknowledges that all the copyrights rights and intellectual property rights of an entry(s), post its submission would entirely and solely vest with Tata Teleservices Limited (“TTSL”) only. By submitting the entry(s) for the Contest, the participant agrees to assign and transfer all copyrights of the submitted entry to TTSL without any consideration and compensation.

    It’s clear now that all submissions, and not just winning entries, have their copyright assigned to TATA Docomo.

    UPDATE II:

    Nandita Saikia has written a post well worth reading on the legal elements of contract and copyright involved in such cases and why such clauses might not hold up under legal scrutiny.

    What are the legalities involved in not only announcing that one owns the property of another person but in doing so while being factually accurate? In fact, going a step further, would it be legally possible to effectively decide that one owns the property of another person without paying that person anything?

    One context in which such questions could arise is in the context of User Generated Content (UGC) which is submitted to websites. Such sites which solicit/accept submissions from users often have, somewhere in the fineprint of their Terms, clauses which state that ‘(1) all submissions shall be the exclusive property of the website, (2) the submitting-artist/user (hereinafter referred to as the “artist”) shall have no right to his/her submission/”work” or therein or to any profits which may be derived therefrom, and (3) the website may deal with the work in any manner it chooses without any reference to the artist, and without making any payment to the artist’. Some websites may acknowledge that the work once belonged to the artist and draft clauses in the form of an “assignment” of copyright or “transfer” of the work.

    Read her full piece over on the Indian Copyright blog.


  • Thoughts at the Intersection of Publishing and Technology in India

    (With thanks to Atul Chitnis, Sankarshan, Abhaya Agarwal and Sachin Malhan for their inputs.)

    The Indian publishing space is both vibrant and flourishing, with estimates suggesting that there are sixteen thousand publishers who publish, for the most part in printed form, some eighty thousand works a year. What this statistic does not show is how varied the market is across multiple metrics. Linguistically, India has twenty-two languages listed in its Constitution that are considered ‘official’ languages, and the 2001 census lists twenty-nine languages as having more than one million native speakers. From a technological viewpoint, India has over six hundred and fifty million mobile phone subscribers and over eighty million internet subscribers, while a recent survey indicated that eighty-four percent of rural India is unaware of the Internet. And yet, India is the fifth largest audience for the English Wikipedia, ninety-four percent of Indian traffic to Wikipedia goes to the English version, and the second largest language group for India after English is Hindi, with just over half a percent of the total. These statistics indicate both large challenges and equally large opportunities for both domestic and international publishers in India.

    The Indian publishing market has only just begun to enter its digital age, both in its deployment of technology and in the development of the business models that technology enables. It is also curious to see the marked absence of big international players in India given the remarkable opportunities this market offers; this gap has been filled by numerous local players who now have a marked first-mover advantage. Looking at this space, one can separate the publishing market into three elements – content creation, content distribution and overarching publishing business models – and there are several interesting established and emerging models within India that are technology-centric in their approach and worth exploring.

    India’s linguistic diversity is mirrored in the works published, and one particularly onerous problem is an abject lack of diverse, “print ready” Indic OpenType fonts that are Unicode compliant; this problem is further exacerbated by the lack of adoption of Unicode in publishing work-flows. The lack of standards-compliant fonts is a serious drawback to technological attempts at publishing existing works in new mediums, as it adds to the cost and time of conversion. A further issue is that optical character recognition of Indic scripts is far more complex than it is for Latin scripts because of the multiple variables involved. Both the Government and the private sector have sought to address these challenges. The Indian Institute of Science, based in Bangalore, is part of the Million Book Project and has an ambitious goal of digitizing two hundred and fifty million pages of mostly out-of-copyright books in multiple Indian languages; it has scanned over eighty million pages already. Its stated aim is twofold: to ensure access to books that might otherwise not have been possible, and to develop optical character recognition and language processing tools better suited to Indic languages. There have also been private successes at digitizing content, and multiple Indian publishers work with Google Books to digitize and make available their content to varying degrees. The Government has also worked to create Indic OpenType fonts that are Unicode compliant, but there are open issues as to licensing and these have not yet seen widespread adoption.
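
    To illustrate one reason Indic OCR is harder than Latin-script OCR, consider this short hypothetical Python sketch (added for clarity; it is not drawn from any of the projects above): a single printed Devanagari conjunct corresponds to several Unicode code points, so a recogniser cannot simply map one shape to one output character.

    ```python
    # -*- coding: utf-8 -*-
    import unicodedata

    # The Devanagari conjunct क्ष renders as ONE glyph but is stored as THREE
    # code points (KA + VIRAMA + SSA). Latin OCR can largely map one recognised
    # shape to one character; Indic OCR must instead recover sequences like
    # this from fused shapes, which multiplies the variables involved.
    conjunct = "क्ष"
    print(len(conjunct))  # 3 code points behind a single visual glyph
    for ch in conjunct:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    ```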

    Given the prevalent challenges around the creation of standards-compliant content, compounded by low internet penetration rates and the relatively weak purchasing power of Indian consumers, it is not surprising that eBooks have not yet seen widespread adoption, nor will they in the immediate future. Amazon’s Kindle, while officially available in India, does not support Indic languages natively, and this drawback, along with its high price point, has been a hindrance to its adoption. Similarly, Apple’s iPad is not officially available in India and has a similar barrier to entry – an extremely high price. However, the domestic marketplace has birthed competitors who have addressed these two weak points. Infibeam’s Pi and EC Media’s Wink eBook readers seek to offer viable alternatives to the international offerings, distinguishing themselves as cheaper alternatives that support major Indic languages natively. That said, they do not have a selection of English titles that is in any way a threat to Amazon’s supremacy in the English publishing space. And given the challenges of standards-compliant Indic content noted earlier, it is safe to say that these domestic players do not yet have a compelling selection of Indic language content either. Apart from the technological limitations, the most important constraints on eBooks and eBook readers in India continue to be the high price points of the devices and of the eBooks themselves.

    That said, it is, again, the Government that seems keen to play a catalytic role in this space, for the stated purpose of using technology as a way to scale content dissemination and interactivity for education. In this context it is also worth observing that while many Government-prescribed textbooks are officially available online to download, for free, there remains a lack of digital channels and low-cost hardware devices to distribute this content. Over the last year, the Government has announced multiple efforts to develop low-cost, multi-function devices that offer a tablet-like experience at a cost much lower than existing alternatives. Some of these attempts have been met with some well-deserved scepticism in the market, but it is safe to say that the Government is intent on pursuing this path, and while India has not officially adopted the One Laptop Per Child project, there have been several private initiatives to roll these devices out across limited geographies. Given these developments and the Government’s involvement, a logical opportunity that arises is content for such devices when they are eventually rolled out, and there is some hope that this large opportunity will spur the development of Unicode-compliant Indic fonts and publishing work-flows.

    A further opportunity lies in the mobile phone–eBook reader hybrid device. While traditional eBook reader designs emphasize form and function, the steady march of falling mobile phone prices, increased screen space and enhanced hardware, along with the impending roll-out of 3G services across the country, afford the nimble publisher an opportunity to offer eBooks and similar digital content to be read and consumed on these hybrid devices. While these devices may not resemble the eBook readers we are currently accustomed to, the peculiarities of the Indian market often force unique and innovative, or jugaad, solutions. Indeed, it is evident that some players have already spotted this opportunity – Wattpad, a Toronto-based mobile phone e-book developer, announced that it has reached an agreement to release its mobile phone eBook application on Bharti Airtel, India’s largest mobile phone network.

    A small but growing trend is that of self-publishing and print-on-demand publishing in India. One of the earliest entrants in this space was Pothi.com, a Bangalore-based start-up that began operations in 2008 with the aim of democratising publishing – making it accessible to a wider audience through a demand-backed publishing platform. It also offers ancillary publishing tools and services, and an online store to help sell the books in both printed and digital formats. CinnamonTeal is another early player in the space. Perhaps having recognized the growing interest in self-expression via the self-publishing route, these start-ups have been joined by Depot, the books and stationery retail arm of one of India’s largest retail groups, the Future Group. It is interesting to note that Wattpad has also announced that it plans to work with one of the larger, more established international players in the self-publishing and print-on-demand space, Lulu.com.

    Indeed, a number of local publishers are already available on mobile phones, and the content has a strong focus on entertainment rather than education; this holds true across mobile phone providers. At the simplest level, Reliance Communications, a mobile phone operator, launched a novel by an award-winning author that was packaged and made available as ninety short messages, or text messages, delivered over the course of a month. They plan to convert this to an audio book as well, accessible via a dial-in number. At the next level of technological sophistication are options that leverage existing phones with data connections to provide an enhanced reading experience. Tata Docomo provides comics across a wide range – spanning mythological tales to content from Disney – via a reader that offers an experience comparable to Amazon’s Kindle, priced at INR 20 per comic book and valid for one year. A company called Nazara Technologies is bringing Archie comics in Indic languages to two mobile phone operators and is in talks with other operators too. Amar Chitra Katha, one of the oldest comic book series in India that is still being published, is also available via Vodafone’s network and offers value-added content based on these comics and stories, such as being able to listen to folk stories and download ring tones, wallpapers and even games. Then there are publishers who have used technological platforms that are not yet widespread in India, but have done so with an international market in mind. Tulika Books, for example, a children’s book publisher, has multilingual, interactive eBook applications for the iPhone and iPad, while Amar Chitra Katha titles, available domestically via Vodafone, are also available on the iTunes App Store.

    With rising levels of disposable income and a growing number of people online, online retail of books and allied printed works has grown quite dramatically in India. The absence of global heavyweights such as Amazon and Barnes & Noble, while arguably evidence of a still-nascent market, has given rise to numerous domestic players in this space – from traditional brick-and-mortar stores that have expanded into eCommerce, such as Landmark On the Net and the Oxford Bookstore, to players who are present only online, such as Indiaplaza, A1Books and FlipKart. FlipKart, in particular, has built a very strong and loyal customer base that rivals those of traditional brick-and-mortar stores.

    While several Indian publishers have embraced technology to expand channels of distribution, and this trend will continue given the data points on costs of devices and wireless connectivity, there are also some rather exciting new business models that are emerging in the Indian market.

    ACK Media, which holds the rights to the Amar Chitra Katha comics discussed earlier, is at the forefront of these evolving business models for publishers. It seeks to provide “high-quality entertainment and educational experiences for children of all ages” and is now present across multiple platforms. It is, as has already been seen, present on a range of mobile phone platforms and has further diversified into home videos, magazines, audio-books, TV and film, and even game platforms with the launch of Legend of Katha, a massively multi-player online role-playing game. It has been able to reinvent and re-purpose content forms and formats to expand its traditional offering to both existing and new consumers.

    Panther Publishers have traditionally published print media for the health, pharmaceutical and medical education markets. They have evolved an interesting, if niche, model that supplements their printed works with multimedia discs offering interactive content about diseases and other medical conditions while ensuring high levels of quality and accuracy. This model does not supplant the traditional printed media they publish but enhances it with content and methods of exploration that print does not allow for, and is a value addition to their catalogue of printed books.

    Pratham Books, a non-profit children’s book publisher with a mission of “A Book in Every Child’s Hand”, has evolved a social publishing model that includes strong social media elements to build a community around its cause and employs Creative Commons licenses (or what are referred to as participatory licenses) to allow this community to reuse and re-mix its content. This community-led model has allowed Pratham Books to side-step existing constraints, enabled it to scale the co-creation of high-quality, low-cost children’s content and created new channels of distribution for a multi-lingual and multi-cultural market. While strengthening the fundamental mission, this model has also catalysed the creation of multiple derivative works, ranging from iPad and iPhone applications, to content for OLPC laptops, to entirely new books from existing illustrations, to community translations of the books, and to versions of the books for the print impaired – from DAISY and Braille books to rich audio books.

    However, an important element of this gradual shift from the traditional paper-print model to digitally enabled models is the legal protection that the Copyright regime offers in India. An amendment to the Indian Copyright Act, currently pending approval, introduces a few key elements for digital content; amongst the many changes it contemplates, three are worth highlighting. The first two introduce the notions of “Rights Management Information” and “Technological Protection Measures”, or Digital Rights Management, to the Indian legal context.

    Aside from these two, an important exception being contemplated is for the benefit of “persons suffering from a visual, aural or other disability that prevents their enjoyment of such works in their normal format.” This is important because India alone has close to 80 million persons with print impairment, i.e. those who cannot ‘read’ content that is printed in books or visually represented on a screen – hence the need for content in formats accessible to the print impaired, usually audio, or text that can be read out by software. Unfortunately, there is far too little ‘accessible’ content in such formats, and efforts to expand the universe of such content are too few and far between. Government, non-profit and non-governmental initiatives to create or aggregate accessible content attract attention disproportionate to their impact. This clause, while still debated as to its method and scope of operation, is seen as being able to spur and support large-scale efforts to create accessible catalogues of content. While the print impaired are under-served, they also represent a unique marketplace, and this is an area where organizations such as Inclusive Planet operate, offering community, library, content sharing and accessibility solutions specific to this demographic.

    Extrapolating from current trends – the pace of technological improvement, falling prices of content consumption devices and increasing access to low-cost, reliable, high-speed wireless networks – conditions are favourable for publishers to expand channels of distribution and consumption of content beyond what print has traditionally offered, and this lays the foundation for new business models that use technology to deliver increased value and scale. It is also worth noting that the infrastructure to support many of these processes, including varied publishing services, content conversion and allied services, already exists in India, even if it is not targeted at the local market. Local companies, such as FlipLog, are already leveraging local content, talent and partners to target both domestic and international markets across multiple technology platforms.

    It is easy to see the tremendous opportunities that the Indian market offers to publishers across the spectrum. It is important to remember that while India might not yet be a technologically advanced market in the way that Western markets have developed, there remains immense opportunity for technological innovation in content creation, delivery and consumption, as long as the technology is appropriate for existing conditions and offers value in a price-sensitive market.


  • Thoughts on Wikipedia and its Language Challenges

    This was written soon after Wikimania 2010 for the New Indian Express.

    (With inputs from Arun Ram and Wikimania 2010 attendee, Srinivas Gunta)

    A common point of discussion in matters regarding the global Internet is the somewhat inequitable distribution of content by language, with a skew towards English and the languages of the traditional geographies of the Global North. Wikipedia is not immune to these inequalities either, and this was a major point of discussion at the 2010 edition of Wikimania, which recently concluded at Gdansk in Poland. Wikimania is an annual gathering, organized by the Wikimedia Foundation, of Wikipedians, as those who contribute to Wikipedia are called, who meet to discuss the state of the various Wikipedia projects and to chart a course for the year ahead.

    What stood out was the scale at which the Wikimedia Foundation is thinking. Its strategy plan aims to increase reach to 680 million unique visitors globally by 2015 (from the current 388 million). The aim is to achieve 12% annual growth in the Global South and 4% annual growth in the Global North; in other words, most of the growth will be in Wikipedias in the multiple languages of the Global South.

    Jimmy Wales’ keynote address at Wikimania this year focused on countries of the Global South, and he presented video interviews with active Wikipedians from the Bangla and Tamil Wikipedias, buttressing the importance of the Foundation’s focus on smaller languages and the varied geographies represented within Wikipedia projects. Among other things, the Foundation’s strategy plan aims to foster the growth of smaller Wikipedias – by 2015, the aim is to have 100 Wikipedia language versions with more than 120,000 “significant articles” each. To this end, the Foundation also aims to bootstrap community programs in key geographies: India, Brazil and the Middle East/North Africa.

    Two presentations highlighted the challenges and the possibilities ahead. Achal Prabhala, a Wikimedia Advisory Board member, spoke about the need for local representative bodies of the Wikimedia projects, or Chapters, in countries which are linguistically under-represented. Achal’s larger point is that there is a distinct relationship between local growth and the existence of local chapters, and that geographies in the South present enormous prospects for growth. They also present prospects for an increase in scope – which could mean, in turn, new ways for Wikimedia to grow the world over. On a cautionary note, Harel, from Wikimedia Israel, spoke of his experiences, contrary to expectations, of local Wikimedia Chapters finding themselves in adversarial relationships with local Wikipedian communities, with an often considerable trust deficit between the two sides. Harel spoke of the need for local chapters to treat editing communities as peers and equals. Chapters are meant to do outreach, he cautioned, while editing is the preserve of the community, and this is something the community must be left to do without chapter interference.

    Given this inequitable distribution of linguistic content within Wikipedia projects, external organizations have seen a gap to fill, and there were presentations on translation toolkits and machine translation of content to populate otherwise sparse language Wikipedias. This is a route that has met with some resistance. An example is a translation toolkit that Google introduced to blend computer-aided, or machine, translation with human translation. Users of this tool have been translating popular English language articles into various local languages with varying degrees of success. Ironically, however, the size of the existing active user base of each of these Wikipedias may itself determine how successful these efforts will be. Translators using the tool needed a lot of hand-holding and overseeing; after initial hiccups, the Tamil Wikipedia has been able to engage with Google such that these contributions now fulfil its quality parameters too, thanks to the availability of more active users.

    It is interesting to see how multiple approaches are being deployed to solve one common problem – a lack of linguistic diversity that matches the proportions of Internet users online. It is to be expected that there are some tensions between organic, community-led translation efforts and efforts that focus on automated translation, and Wikimania provided both sides a venue at which to engage with each other, resolve their differences and work collaboratively.

    Here’s what needs to be kept in mind while steaming ahead in India: English, with 225 million speakers in India, is also an Indian language. Several Indian editors already contribute to the English Wikipedia. So the emphasis needs to be on boosting contributions in all Indian languages, including English – rather than just an ‘Indic languages’ vs ‘English’ paradigm. Innovative ways to boost edits – and bring in new editors – include holding Wikipedia academies across the country, finding low-cost ways to create public access to Wikipedias in places like public libraries, and removing technological obstacles related to scripts, keyboards and the like.

    With the Foundation’s new thrust on the creation of local Chapters, and with the India Chapter in its final stages, one can expect a great deal more focus on these issues, both within India and in other under-represented areas of the world.

  • Thoughts on the Future of Journalism and the Print Media

    Newspapers and the wider print media have traditionally formed an important bulwark against excessive intrusions of the State – the Fourth Pillar, as they have traditionally been referred to. However, their business model has been built on an artificial scarcity of pathways of information to the general public. Since newspapers owned this channel, they monetised it at both ends – advertisers paid to access the channel and its captive audience, while the audience paid upfront, even for an entire year, for access to what was their limited source of news and information. The Internet destroyed this artificial scarcity of channels and of access to ‘facts’, and this is a genie that cannot be put back in the bottle. In many ways this is equivalent to the problem that the music and video industries have faced as well – how do you get consumers to pay for content that they can otherwise, legally and illegally, access for free? The print media is no longer the only channel for news, and certainly isn’t the fastest – it is as hard to beat digital pathways on speed as it is to compete against ‘free’. The Nieman Journalism Lab recently wrote that “The Dallas Morning News now gets 38 percent of its revenue from circulation, 54 percent from advertising, and 8 percent from contract printing plus [and] those numbers are a far cry from the way it used to … 80% of their revenue came from advertising and 20% came from circulation.” Which leads to the question of whether the era of the advertising- and subscription-funded monolithic news organization is fast ending, and what this means for the traditional news organization.

    The recent example of Wikileaks, “the world’s first stateless news organization” as Jay Rosen, a professor of journalism at New York University, called it, distributing tens of thousands of pieces of classified information from America’s war in Afghanistan points both to a future where newspapers are not the first port of call for whistle-blowers, and hence may no longer be the sources of facts, and to an interesting model of value addition for journalists and the traditional print media. Jeff Jarvis, an associate professor at the City University of New York’s Graduate School of Journalism, believes that this value addition is what journalists and media organizations can add to facts that are otherwise both free and freely available, and says that “Thanks to the internet, the marginal cost of sharing information today is zero [and] this change in market reality forces us to examine journalists’ true value to the public in the market.”

    Given the proliferation of channels of news and facts, and that most of them are both free and freely accessible, there seems to be little value in ‘news’ and ‘facts’ as mere reproductions of events. But this poses a challenge to the reading public as well – an over-abundance of sources of news and facts without the ability to filter, rank or contextualise them. Another possible role that journalists and the print media could play, then, is as a filter for these multiple incoming sources of facts and news – building filters of authenticity and adding context to these facts. That said, it is probable that such fact-checking and verification will, in the future, be crowd-sourced, as Truthsquad, a “community fact-checking experiment”, and SwiftRiver, a “free and open source software platform that uses algorithms and crowd-sourcing to validate and filter news”, show. A necessary ingredient in building such filters of authenticity is trust, and this is something journalists and the print media should keep in mind – to undermine the element of trust is to undermine your relevance and future business models. The current brouhaha over the paid news syndrome is a malaise that will render those sources untrustworthy and leave them without a necessary ingredient for building future models of sustainability.

    Which brings us to an emerging trend: data-driven journalism. Governments across the world, with the United States and the United Kingdom taking the lead, have begun to disgorge vast quantities of hitherto unavailable data into the public domain, and as the Wikileaks example shows, this is an opportunity for the print media to add value and context to such data and weave a narrative that data, as a standalone object, lacks – or, as the Nieman Journalism Lab put it, “… data in the service of somehow getting to the “big picture” about what’s really going on in the world”.

    It remains to be seen, and we remain sceptical, whether placing news content behind walls that require payment to access will offset advertising and subscription revenue losses; a nuanced approach, where there is greater perceived value to the end consumer of such news, is likely to succeed better than a simple pay-to-view model. The Guardian has been experimenting with a very interesting platform-based model that they are calling The Open Platform. This is, in their own words, “… a suite of services that enables partners to build applications with the Guardian.” The long-term goal and vision of this project is to embed the Guardian as an elemental part of the Internet, rather than its being only a destination, with the attendant risks that being a destination has. The Open Platform “… aims to make the Guardian a useful resource to partners all around the globe who want to leverage the value the Guardian can bring to their business.”

    Mike Masnick has been thinking through the challenges that traditional media face from the proliferation of digital networks, and at a recent event (wonderfully titled Techdirt Saves Journalism) he distilled a set of ideas that the print media could experiment with. In short, he writes that media must mine the data to find the relevant, elevate their writers, create a platform for their community, think about multiple revenue streams, expand their brands and absorb changing ideas about “news” and its traditional notions of production.

    India, of course, isn’t here yet because it lacks ubiquitous digital networks, but this will change quickly with the roll-out of 3G wireless networks and rapidly falling handset prices. At which point it might all be too late.

  • The Case for a Unique Identification System in Public Education

    During a visit to Hubli-Dharwad in November 2009, the local pages (Hubli-Dharwad-Belgaum) of the Times of India carried the following headline: “Sky is their roof; the road their classroom – Government sanctions school without building”. The school in question was the government primary school in Ram Manohar Lohia Nagar in Hubli, with 67 students from classes 1-4 and one teacher who doubled up as the head teacher. With multiple such reports, it is not very surprising that few public institutions rival our government primary schools in public dissatisfaction – and all along we have been making significant investments in schools. It is estimated that in Karnataka we spend about Rs 6,500 per child per year and, in one study done by PROOF for primary schools run by the then Bangalore Mahanagara Palike (BMP), the number exceeded Rs 10,370 per child per year. So how do we explain this lack of performance?

    For a long time, the focus has been (and continues to be) on the input side, on schools rather than on schooling, and the primary questions were whether children have access to a school and whether children get uniforms, books, mid-day meals, etc. As the “road-side” primary school shows, even this fails often enough. While inputs are required for any process to work, in this case they have come at the cost of focus on the outcomes of the education system. For example, there is little or no information on the learning levels of children prior to a child’s first “public” exam, when she reaches class 10 (we believe there is general agreement that internal school reports are not good indicators in most cases), which means it is too late to make course corrections with respect to quality.

    About the only data that has been available consistently in the past five years has been from the Annual Status of Education Report (ASER), and some key findings for Karnataka were that only about 39% of children between standards 1-8 can read a standard 2 level text, with the implication that around 5.3 million children in the state are unable to read in their medium of instruction. The performance in math is even more appalling. Over 30% of children between standards 1-8 could not recognize double-digit numbers. Less than 20% of all children could do simple division and less than 30% could do simple subtraction. This means that between 10 and 11 million children cannot do simple math. Only 11% of children in standards 3 to 5 can read an English sentence and only 35% of children in Bangalore can read English.

    This pathetic state of affairs threatens to ruin the lives of millions of children in Karnataka and much larger numbers across the country, and it would not be entirely out of place to say that the failure of the schools is gradually destroying democracy. The oft-repeated rhetoric of elementary education being a fundamental right (now further enshrined in the Right to Education Bill 2009) seems to be accompanied by an inability to make the schools work for the children. It is true that over the past ten years enrolment has increased, but enrolment does not mean attendance. Further, attendance does not imply learning, for in many schools across the state pupil-teacher ratios are very high and, given that teacher absenteeism is greater than 25%, these ratios get further skewed against children. Single-teacher schools, such as the one in the Hubli case, are common and multi-grade teaching even more so.

    It is our belief that a universal and unique identification system will help improve quality outcomes in a significant manner. What this means is that a unique identity is assigned to a child from birth through to the end of her education, and this unique ID will help ensure that all her rights as a child are available to her and that she receives a quality education.

    In the ICDS anganwadis, the anganwadi worker has to worry about health, nutrition and education issues covering pregnant women, lactating mothers and children from 0-6 years. Clearly the education component suffers and currently has to be supplemented from the outside. It would be meaningful if data on all children were collected from this stage onwards, so that the system would be able to (a) see to health needs, provided the database is accurate and updated regularly; (b) check for issues like learning disabilities, which can be “cured” if remedial interventions are made early enough; and (c) ensure that at the appropriate age children are admitted to primary schools and that the schools are made aware of every child’s proficiencies.

    In the primary school system, there is a definite need to track migration. For example, children may be enrolled in a rural school, and during difficult times the family may migrate to urban areas for livelihood reasons – which means that the child will also be enrolled in an urban school and therefore counted twice.
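    To make the double-counting point concrete, here is a minimal sketch, with invented IDs and school names, of how a stable unique child ID collapses duplicate enrolments across school rolls:

    ```python
    # Hypothetical enrolment records; all IDs and schools are invented.
    rural_roll = [
        {"child_id": "KA-000123", "name": "Lakshmi", "school": "GHPS Hanchinal"},
        {"child_id": "KA-000456", "name": "Ravi", "school": "GHPS Hanchinal"},
    ]
    urban_roll = [
        # The same child, re-enrolled after the family migrated to Hubli:
        {"child_id": "KA-000123", "name": "Lakshmi", "school": "GHPS Keshwapur"},
    ]

    # A naive head count double-counts the migrant child...
    naive_count = len(rural_roll) + len(urban_roll)          # 3

    # ...while counting distinct IDs gives the true number of children.
    unique_children = {r["child_id"] for r in rural_roll + urban_roll}
    print(naive_count, len(unique_children))                 # 3 2
    ```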

    In the primary school system, there is also a need to track the attendance of both children and teachers on a regular basis. It is not uncommon to find, when you visit a school, that declared enrolment and attendance are higher than actual.

    Remedial interventions are required to bring what the system calls “slow learners” up to mainstream levels. This means that we need to know who needs help, and this is possible only by administering diagnostic baseline tests and logging the data on a child-by-child basis. Currently, what happens in the government’s Parihara Bodhane programme is that teachers are asked to identify “weak” children, and the number of children in these initiatives is limited by the budget. Moreover, children are not tracked because this is considered a burden on teachers (indeed, every remedial intervention is considered a burden by the teacher community). We think it is vital that within the next 3 years all children should be at mainstream levels, and this will be possible through budgetary support for planned remedial interventions, accompanied by teacher training and teacher support for the programme and, finally, continuous child-by-child tracking of outcomes. Once remedial efforts are completed, children need to be tracked so that we can ensure that their acquired 3R skills are not lost. Libraries are a great vehicle for tracking children’s proficiencies, and it is important to track how many books are borrowed by each child every month so that we know, child by child, who is NOT borrowing – these are vulnerable children and need attention. Beyond primary school, we should be able to track children going on to secondary schools, vocational schools or even colleges.
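    As a rough illustration of the child-by-child tracking argued for above, here is a small sketch (IDs, titles and data all invented) that surfaces children who have not borrowed a single book in a month:

    ```python
    from collections import defaultdict

    # (child_id, book_title) pairs logged by the school library in a month.
    loans = [("KA-000123", "Paplu the Giant"), ("KA-000123", "Moon and the Cap")]
    enrolled = {"KA-000123", "KA-000456", "KA-000789"}

    borrow_counts = defaultdict(int)
    for child_id, _title in loans:
        borrow_counts[child_id] += 1

    # Children with zero loans are the ones to follow up on.
    not_borrowing = sorted(enrolled - set(borrow_counts))
    print(not_borrowing)  # ['KA-000456', 'KA-000789']
    ```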

    There are many spin-offs from this tracking methodology which could feed into government budgets – one could track how effective the Mid-Day Meal Scheme is, or outlays for innovative government schemes like scholarships, cycles and free book distribution. And, from a management perspective, we could even track the budgets of individual schools and provide decision-makers with the kind of information they need to ensure that schooling happens and that the focus is the child.

    For this system to work, multiple departments within government will need to use it – the Education Department, Women & Child Development, the Health Department and the Labour Department, at a minimum, should be users of the system, driving multiple applications and reports based on it; indeed, the success of such an initiative is predicated upon a number of applications depending on this system.

  • Thoughts on the Unique Identity Framework for India née Aadhaar

    Background:

    I wrote this piece many months ago, when the UIDAI white paper leaked. Some of the commentary may now be dated because the goalposts have shifted since. I have consciously ignored the privacy and security aspects because I am no expert in those areas.

    I also need to credit a friend, let’s just call him Vikram, for this piece.

    Bottom line:

    Trying to do too much. The smallest intervention becomes a massive undertaking in India because of scale and organizational/administrative complexity, so you should scope projects as narrowly as possible. If the main purpose is to bring marginalized people into mainstream economic life, then you should focus on getting them an ID rather than on eliminating redundant verification activity or eliminating fraud, both of which can be happy by-products further down the line. Why not: make a national ID number available to anyone who wants it, target it at the people who currently lack any form of ID, and let things evolve from there. It doesn’t have to cover everyone, or be the only recognized form of ID, or be real-time and state-of-the-art, to do most of what you want it to do.

    Rough scope evaluation:

    Aim 1: Getting everyone an ID; Project component: enrollment; Advantages: brings people into economic and social life; Disadvantages: big-brother possibilities (everybody means well in the beginning)

    Aim 2: Eliminating redundant verification activity; Project component: on-demand authentication; Advantages: efficiencies over time; Disadvantages: business process disruption across swaths of the economy

    Aim 3: Eliminating entitlement fraud; Project component: data de-duplication using biometrics; Advantages: helps the balance sheets of government agencies; Disadvantages: alienates those who benefit from current arrangements (customers as well as the government employees who abet them)

    I don’t know the relative costs of the three components, but I suspect that an incremental approach to 1 with a thinned-down version of 3 would be the 80/20 solution here. On the security side, all of this boils down to persons or individuals and what they can do. Allow me to think aloud here…

    Identity

    – Any kind of marker that defines or demarcates a person or individual. These persons can be real, fictional, fictitious, whatever. Captain James Kirk is an identifiable individual in the world of Star Trek. Avatars on Second Life or gaming sites are identifiable individuals within those universes. Witness protection and intelligence agencies assign fictitious identities to real individuals. In the serious world of business and government, identity is about each unique existing individual having a unique identity or marker to go along with it that can be used in official business. Most people have many such markers (credit card number, passport number, social security number, tax or voter ID number, combination of name and birthdate) and some countries have one marker that is close to universal (almost everyone in the US has a social security number, for instance). In India, some have many markers and many people have no official markers at all, despite being unique individuals. Having many markers is not really a problem except in an efficiency sense. My bank identifies me by my account number, my university used to have its own 9-digit ID for me, immigration agencies track me by my passport number, etc. etc. Some of these are parasitic on my social security number (which I provided when applying for a bank account or applying to college), but many are not. And the cost (in terms of business process changes, technology investments, confusion, etc.) of getting everyone in the economy to subordinate their own identification numbers to a common national number is going to be prohibitive in any normal decision-making horizon.
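    A tiny sketch, with every identifier invented, of that point: many independent markers can resolve to the same individual without any of them being subordinate to a common national number.

    ```python
    # One individual, referenced by several unrelated markers.
    person = {"name": "A. Example"}

    markers = {
        ("bank_account", "0042-00917-3"): person,
        ("passport", "Z1234567"): person,
        ("university_id", "991234567"): person,
        ("voter_id", "ABC1234567"): person,
    }

    # Each agency resolves its own marker; no shared number is required.
    print(markers[("passport", "Z1234567")] is markers[("voter_id", "ABC1234567")])  # True
    ```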

    Authentication

    – When you claim to be some identifiable individual (the owner of some identity marker), authentication is about making sure you really are that individual. First of all, we should decide whether we care more about false positives (people falsely claiming to be someone else and getting away with it) or false negatives (people truthfully claiming to be themselves but not being believed, maybe because they don’t have the paperwork to prove it). If you try to solve both, you end up with the biggest of all possible projects and also the least likely to succeed, because the solution to one exacerbates the other, and only the all-singing, all-dancing perfect solution (in which all real-world difficulties are assumed away) gives the illusion of bridging the tension. If you care more about false positives, you’ll make it harder to get a valid identity marker, and there go the poor and the marginalized. If you make it easier to get one number, you’ve made it easier to get a second. That’s why they came up with the biometrics, but for that extra bit of security they’ve fingerprinted an entire population (don’t tell me that won’t be abused) and, I suspect, added a whole lot of processing cycles on the IT side (I imagine it’s easier to look for matches of a 9-digit number than for fingerprint matches). The problem of identity theft (rather than the creation of false or duplicate identities) doesn’t even require the extra security. A 9-digit random number is pretty secure in the sense that it’s virtually impossible to guess, and only you and maybe a handful of other people know it. [I won’t even get into the problems with biometrics. Fingerprint matches are far from unique at standard levels of detail, so they are no silver bullet, and once fingerprint identification is used for high-value financial transactions, expect a rash of de-digitization… it even rhymes with de-duplication!]
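    Some back-of-the-envelope arithmetic on the “virtually impossible to guess” claim; the attack model and every number below are my own assumptions, not anything from the white paper:

    ```python
    space = 10**9        # possible 9-digit numbers
    guesses = 1_000      # an attacker blindly tries a thousand numbers

    # Chance of hitting one specific person's number:
    print(f"{guesses / space:.6%}")   # 0.000100%

    # The flip side: if hundreds of millions of numbers are issued out of a
    # 10^9 space, a random guess often lands on *somebody's* valid number,
    # so the number identifies you but cannot double as a secret password.
    enrolled = 600_000_000
    print(enrolled / space)           # 0.6
    ```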

    Authorization

    – Once we know you’re you, authorization is about defining what you’re allowed to do or what you’re entitled to. Here, that whole aspect is (correctly) left to the individual service providers.

    In general, I think there are more things under heaven and earth than are dreamt of in any of our philosophies, and these people would do well to ponder that. IT projects always take many times more time and money to finish than was bargained on at the outset, and that’s only counting the ones that more or less reach their goal. Incentive and coordination problems will cripple (or disfigure beyond recognition) any large project in a complex organization, and you’re off the scale here in both size and complexity. Politicians, public figures and academics have a built-in preference for ambitious/sexy/grandiose projects, but the efforts that stick are the ones that start small and evolve.

    Some of my concerns are listed under project risks, but there’s no clue there, beyond platitudes, as to how they might be addressed. Project risks are side issues that can derail the project if you have bad luck. The obstacles we’re talking about here are what the project (or at least this document) should be about. It’s trivial to collect data, put it in a database and then query the database from a transaction site. It’s not trivial to do it for a billion people, or through hundreds of overlapping independent agencies and politically antagonistic local governments. Don’t show me diagrams of how you’re going to approach the trivial problem and then mention, by the way, that there might be some complications. If you have a solution to the complications, shout it from the rooftops. Otherwise, come back when you have one, or let’s talk about how we can find one.

    Biometrics

    – It seems they think the trade-off between entitlement fraud and inclusiveness can be broken by this magic technology called biometrics. If only. If there’s one thing that technology executives in large organizations agree on, it’s that the technology is never the solution. Technology providers are less wise on this point, but even they acknowledge it in their less commercial moments. In any case, I don’t know how much accuracy biometrics adds beyond what you could get by triangulating the information that’s normally used in verification (biographic data as attested by documents, plus distinguishing facial features). I bet it’s not much, especially once you consider the failure rate of biometrics itself (since nothing is foolproof). It does, however, add a layer of certain costs for infrastructure, training, etc. And the privacy implications are chilling.
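    As a crude illustration of the triangulation point, with every rate below invented purely for the sake of the arithmetic: a few weak, roughly independent checks multiply down to a small combined false-match rate.

    ```python
    # Invented false-match rates for mundane, non-biometric checks.
    name_dob_fmr = 1e-3   # two different people match on name + date of birth
    address_fmr  = 1e-2   # ...and also on address
    photo_fmr    = 5e-2   # ...and a clerk confirms the wrong face

    # Assuming rough independence, the rates multiply.
    combined = name_dob_fmr * address_fmr * photo_fmr
    print(f"combined false-match rate ~ {combined:.0e}")   # ~ 5e-07

    # A fingerprint matcher with, say, a 1e-5 false-match rate would not
    # obviously beat this stack of ordinary checks, which is the hunch above.
    ```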

    Demand-driven

    – Really? Enrolling agencies are looking at business process disruption, technology investments and an extra operating burden, and not only in the first ten years, so they will have to be strong-armed.

    It may be that the savings in the larger economy from not having to repeat verification procedures will offset the costs over time, but businesses don’t make decisions with a ten-year horizon (especially a ten-year horizon contingent on the success of a government project of unprecedented scale and complexity), so I wouldn’t expect them to be queueing up to ditch their current procedures. People who were getting duplicate benefits will lose out under this, so don’t expect them to rush forward either. And I’m sure some tribes like being out of view of the state. (In the US, the Amish have resisted social security numbers, I believe successfully.) Also, by making more and more services/entitlements/rights dependent on the ID, you’re placing an unrealistic reliance on the benevolence and competence of the enrolling agencies. Ultimately we’re talking about hundreds of millions of vulnerable people interacting with millions, or hundreds of thousands, of petty officials who have been given extra work to do for the benefit of people they very likely regard with distaste. Expect the worst. Also, I don’t think “network effects” means what the authors think it means. It’s not the case that the more people have an ID, the more beneficial it is for me to have an ID. It is the case that the more government or other services become contingent on having an ID, the more beneficial it is for me to have one. That’s a very different thing, and not so different from what they disapprovingly call a mandate (except that a mandate would be cleaner).

    Data quality

    – Why do you think it’s so easy to duplicate identities or, if you prefer, so difficult to create a unique record for each individual? Lots of overlap of names; lots of names that don’t follow the Western or north Indian convention of given name followed by family name; lots of people with no clean permanent or even present address; haziness around date of birth (stop the first twenty people you meet in any village and ask them if they know their exact birthday). How exactly are you going to address this? These are problems with the data themselves, not with how the data are collected. Some of them might disappear over time (three generations from now, I imagine there won’t be anyone left who doesn’t know their birthdate), some can be nudged out of existence (we could force south Indian names into a given-name/surname pattern, as many of us have done out of necessity), and some could possibly be combated through mega-projects of their own (if we could somehow make it so that everyone who doesn’t have a proper address now has one in twenty years’ time, we would have accomplished something much grander and worthier than a national identity scheme). If you launch a national identification number without solving these problems, you’re just going to be importing a lot of bad data into an arena where it can do much more damage, because now there’s a single point of failure as far as the individual is concerned – earlier, data problems might mess up your gas connection but not your phone application or your ration card, because each agency had its own idiosyncratic way of doing things; now, everything is connected. The document refers to KYR standards for the validity of demographic data, and that sounded promising, but when I looked around for information on these standards it turned out to be like “know your customer” but for residents rather than customers, which wasn’t a big help.
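    To see why demographic de-duplication is genuinely hard, here is a minimal sketch using Python’s standard-library fuzzy matcher on invented transliteration variants:

    ```python
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    # Same person, two common spellings of the same name:
    print(similarity("Lakshmi Venkatesh", "Laxmi Venkatesh"))     # ~0.88
    # Two genuinely different people:
    print(similarity("Lakshmi Venkatesh", "Lakshmi Venkatesan"))  # ~0.91

    # The genuine duplicate scores *lower* than the distinct pair, so any
    # threshold loose enough to catch the first will merge the second;
    # and that is before missing addresses and unknown birthdates enter.
    ```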

    Thoughts

    How about this? A chunk of the population is being left out of economic life and social programs for lack of an accepted identity marker. Why not provide a unique government-backed ID to everyone, or to anyone who asks for it? It doesn’t have to be foolproof, just good enough for the purpose. That way, if you have a usable identity marker already, you keep using it; otherwise you apply for the government’s random-number ID. Service providers now accept the UID along with what they’ve always accepted, and they’re free to pressure customers to get a UID if they like. That way, you have an order of magnitude fewer people included in this UID project, and the sequence in which people are brought into the system respects your pro-poor agenda much better, because it starts with the people who most urgently need an identity marker and only then (on a timeline decided by individuals, or at most by individual businesses or service providers) gets taken up by people for whom it would be a marginal convenience. [This is roughly how the social security number came to be the de facto national identification number in the US.]
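    A minimal sketch of the random-number ID idea, using only the standard library. The 12-digit default is my own choice (a 9-digit space is uncomfortably tight for a billion-plus people), not anything from the proposal:

    ```python
    import secrets

    issued = set()  # the register of numbers already handed out

    def issue_id(digits: int = 12) -> str:
        """Draw random numeric IDs until one is unused, then register it."""
        while True:
            candidate = str(secrets.randbelow(10**digits)).zfill(digits)
            if candidate not in issued:
                issued.add(candidate)
                return candidate

    print(issue_id())  # e.g. '004816930257'
    ```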

    Think of how complicated the census is. And that just involves going door to door and counting people, trying to avoid double-counting. Now you want to catalogue them uniquely and be present in every interaction they have with a service provider? Come on. I don’t know of a single large company that has a unique identity for each employee, matched to an up-to-date profile of what they can do, and a reliable method of ensuring that someone fiddling around on the network is who they say they are and isn’t doing something they’re not supposed to. The best companies have good, robust identity, authorization and authentication for a small group of employees and the bare minimum (including bad data and processes, where they can be tolerated) everywhere else, because even something as simple as rolling out a smart card to 50,000 employees can take years owing to logistical and organizational hurdles. It was a big achievement a few years ago when Johnson & Johnson figured out a way to assign unique identifiers to its 150,000 or so employees so that it could keep track of them as they moved through the company. Now maybe these companies are just stupid, but I wouldn’t bet on it. I would expect the difficulty to increase exponentially with the number of people covered, or at least with the number of independent decision points involved, and companies have the advantage of a command-and-control structure that democracies don’t and shouldn’t have. If the success of your project requires pretty much everyone in the economy to do things differently (“business process change” is easy to say but traumatic for anyone in the middle of it), you can assume you’ve succumbed to hubris.