Month: September 2005

OSX mv and File.renameTo() strangeness

I’ve come across an annoying behavior in OSX which I’m documenting here mostly in the hopes that anyone else struggling to track down a similar bug will find this post in Google. (This’ll probably be quite dull to non-Unix geeks…)

My original symptom:

Java’s File.renameTo() method won’t work when moving files from /tmp to a user directory encrypted with FileVault.

The actual cause (near as I can tell):

  • In Darwin/OSX (and in BSD), when a file is copied or created in a new directory it automatically takes on the GID (Group ID) of the target directory.

  • A file that is renamed (using the mv command or Java’s File.renameTo) should not change its GID, even if the target directory’s GID is different.

  • The /tmp directory is owned by the group “wheel.” Before OSX 10.2, users with admin privileges were members of wheel, but that’s no longer the case. This means normal users may not change a file’s group to “wheel” without invoking admin privileges.

So here’s what was happening. First I created a new file in /tmp. The group ID on the file was automatically set to “wheel” on creation because that’s the GID for /tmp. Moving the file to another directory on the same disk works just fine because under the hood the OS is just swapping around pointers on the disk. However, when I tried to move the file to a directory on a different virtual disk (which is how OSX thinks of FileVault), it first copies the data and then tries to change the group ID of the newly created file to “wheel,” which it doesn’t have permission to do. If I use mv to do the move I get an error message but otherwise the file is moved correctly (albeit with my own group ID instead of wheel). If I use the Java routine File.renameTo(destination) it simply returns false (failure) and refuses to do the move — I suspect it realizes it can’t do it perfectly so it doesn’t even try.

You can get the same effect just moving a file from /tmp to an external firewire drive. In the snippet below, the directory ~bug/ is on the same local disk as /tmp and /Volumes/disk2/ is a mounted firewire disk:

$ ls -ld /tmp/
drwxrwxrwt 19 root wheel 646 Sep 27 20:54 /tmp/

$ groups
bug appserveradm appserverusr admin

$ touch /tmp/test1 /tmp/test2

$ ls -l /tmp/test*
-rw-r--r-- 1 bug wheel 0 Sep 27 20:54 /tmp/test1
-rw-r--r-- 1 bug wheel 0 Sep 27 20:54 /tmp/test2

$ mv /tmp/test1 ~bug/

$ ls -l ~bug/test1
-rw-r--r-- 1 bug wheel 0 Sep 27 20:54 test1

$ mv /tmp/test2 /Volumes/disk2/
mv: /Volumes/disk2/test2: set owner/group (was: 502/0): Operation not permitted

$ ls -l /Volumes/disk2/test2
-rw-r--r-- 1 bug bug 0 Sep 27 20:54 /Volumes/disk2/test2
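The same cross-volume failure is easy to work around in Java: since File.renameTo() simply returns false rather than attempting a partial move, code that needs to move files across filesystems has to fall back to copy-then-delete itself. A minimal sketch (the MoveFile class and move() helper are my own names for illustration, not anything from the standard library):

```java
import java.io.*;

public class MoveFile {
    // File.renameTo() returns false for cross-filesystem moves
    // (e.g. /tmp -> a FileVault home or a firewire volume), so we
    // fall back to copying the bytes and deleting the original.
    public static boolean move(File src, File dst) throws IOException {
        if (src.renameTo(dst)) {
            return true; // same filesystem: just a directory-entry swap
        }
        // Cross-filesystem: copy the contents, then remove the source.
        InputStream in = new FileInputStream(src);
        try {
            OutputStream out = new FileOutputStream(dst);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
        return src.delete();
    }
}
```

Note that, like mv, this fallback gives up on preserving the original group ID (and other metadata such as timestamps); the file simply takes on the target directory’s defaults.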

E Ink offers electronic paper display prototype kit

eink-kit.jpg

E Ink just announced it will be offering prototyping kits that include a 6″ diagonal, 170 pixels per inch, 4 gray level e-ink display. Like all E Ink displays, it only needs power to change the display, not to maintain the image. The kit also includes a development board with a 400 MHz Gumstix single-board computer as well as I/O boards for MMC, Bluetooth and USB.

No word yet on prices, though their kits page says order forms will be available soon. Kits will begin shipping November 1st.

Update 9/27/05: fixed typo (I’d said it needs power to change the display but not to update it, which makes no sense).

Update 9/29/05: As Andrew points out in the comments, they’ve now posted their order form and the kit is $3000. Not cheap, especially considering you can get your own Gumstix for $159 and a Sony Librié for $419. (You could also get a Toshiba DCT-100 for just $229, though I believe that’s using one of Kent Displays’ ChLCD displays.)

What is scientific data?

Q: What is scientific data? A: Whatever the Secretary of the Interior says it is.

At least that’ll be the case if Congress passes HR 3824, now headed for the floor of the House. From the bill:

The term `best available scientific data’ means scientific data, regardless of source, that are available to the Secretary at the time of a decision or action for which such data are required by this Act and that the Secretary determines are the most accurate, reliable, and relevant for use in that decision or action.

Given that this administration defines “best available scientific data” as “that data that supports the president’s life-in-a-bubble view of reality,” as a political appointee the Secretary of the Interior is probably far more qualified to judge the scientific merit of a study than any scientist.

Head Box Study

It’s experiments like this that make being a scientist all worthwhile (thanks to Bandy for the link…)

Theoretical limitation on search engines?

Lately I’ve noticed a rise in the number of Google search results that just lead to a bunch of ads plus some automatically-generated content copied from other web pages, rather than pages with the original content I’m looking for. This is the latest step in an ongoing arms race between the search engines (and their users) and so-called search engine optimization companies that try to funnel searchers through to their customers’ ad-laden sites rather than going direct to the site they want. The SEOs are essentially using Google’s own infrastructure against it, creating Google-hosted blogs, generated using content from (I’m guessing) the results of Google searches, all sprinkled with links to pages containing nothing but Google-supplied ads.

Google’s trying to stop folks from gaming the system like this, but I expect there’s some kind of fundamental limit to what can be done to stop it. You could probably even describe it as a theorem:

For any automatically-indexed search engine of sufficient size, it is possible to construct a document that has a high page rank for a given query even though the constructed document adds no useful information beyond that which would have been returned without it.

A corollary would be:

The more complete a search engine is in terms of documents indexed, the lower the relevance of its search results will be in terms of the ratio of documents with original content vs. documents that simply copy information from other pages.

If this does, in fact, wind up being a fundamental theorem for search engines, I have a humble suggestion for what we should name it: Göögel’s Incompleteness Theorem.

Digital Storytelling Festival in SF, Oct 7-9

In a couple weeks is the Digital Storytelling Festival in San Francisco (October 7-9):

The Digital Storytelling Festival was founded in 1995 as an annual gathering where professionals and enthusiasts who use technology to communicate and share stories gather to examine creative works and new concepts being used in areas of education, community building, business, personal and legacy storytelling, new media and entertainment.

The Digital Storytelling Festival is an intimate gathering that inspires its audience with new knowledge, ideas and a better understanding of how the traditional form of storytelling is changing through the use of technology.

The Festival aspires to promote and evolve the art and practice of Digital Storytelling and encourages community by the sharing of ideas and meaningful dialogue among all its participants.

Registration is $350 ($200 student). The event is sponsored by KQED Public Radio & TV and the KQED Digital Storytelling Initiative.

Thoughts on Kurzweil’s Law

I heard Ray Kurzweil speak last night at the Long Now seminar. A friend who also attended says it was essentially the exact same talk he’d heard him give five years ago (ironic considering how fast things are supposed to be changing nowadays), but this was my first time hearing him in person. I must say it’s rare that a talk makes me alternate between thinking “Well, that’s completely bogus!” and “OK, that makes sense…” so many times.

Where I think he’s got it right:

  • People are inherently bad at extrapolating exponential trends, and we are currently experiencing technological exponential growth. This is especially true in the information and communication technologies, namely information processing, sensing and pattern-recognition, and human-to-human communications.

  • Reading between the lines of his talk, information technologies are bootstrapping technologies: once you have them, they make inventing the next stage easier, faster and cheaper.

  • The combination of biotech, new biological sensors and the ability to simulate complex processes are going to seriously challenge how we currently think of ourselves as individuals and even what it means to be human.

Where I think he’s got it wrong:

  • As I mentioned a few days ago, I think some of his exponential curves are the result of our natural tendency to gloss over things that happened in the past and focus on recent developments. (A less generous assessment would say he just did it to make his curve work out, but this isn’t limited to Ray’s charts; in fact, he showed the same graph with points plotted from other lists of momentous inventions drawn from various encyclopedias.) This is not to say there aren’t several exponential growth curves in play at the moment, but I don’t think this is a trend that has been going on for hundreds of thousands of years.

  • It’s an old saw that people overestimate what will be possible in five years and underestimate what will be possible in 20 years. I think his predictions of ubiquitous augmented reality, computers distributed throughout one’s clothing, and head-up display contact lenses (or direct-to-retina/optic-nerve displays) will all happen at some point, but not in the next 5 years.

  • Ray talks about the creation of artificial intelligences as if some day in the near future we’ll invent HAL and start talking to it. Ever since Alan Turing described the Turing Test, people have described artificial intelligences in terms of ability to generate and understand language, ability to make human-like decisions, ability to show and understand emotion — in other words, the ability to relate to humans. I see no reason to think the first AIs will think or communicate like us at all, nor do I think they will exist at human scale.

    In fact, I would say several species of human-made hyper-intelligences already walk among us: we call them corporations, nation-states, philosophical or political movements, and civilizations. Their neurons are the people, documents and cognitive artifacts that make up the whole. Their synapses are the communication and social networks that run between these individuals. The specific structure of the intelligence is set by its laws, traditions and culture.

    The dual of the idea that groups of people, documents and cognitive artifacts can be a single intelligence is the idea that my own human intelligence, as an individual, is actually made up of more than just what I can think when I’m lying naked and alone. As Edwin Hutchins points out in Cognition in the Wild, human intelligence is not just the product of what’s inside our skull but stems from the combination of our brains, our culture, and tools such as the paper we write on and the skill of writing itself. I expect by the time a machine with no human in the loop has passed the Turing Test, the continuing augmentation of humans will have long-since forced us to recognize that the test wasn’t all that good a criterion for intelligence in the first place.

  • Even though our knowledge and our information technologies are improving exponentially in many fields, there are some parts of human knowledge that are not growing at this incredible rate. Notably, our understanding of existential questions about the purpose of life, what we as humans value, and the meaning of free will has not kept pace with technology — even though in many cases new technology and new understandings about the world have pulled the rug out from under our previous answers. These questions will become especially important as we start fundamentally modifying our biology and finally unravel the mysteries of the mind itself.

The Singularity is near now?

Kevin Drum over at the Washington Monthly has a nice extrapolation based on Ray Kurzweil’s new book (see his chart for added effect):

With that said, however, it turns out that I do have a bone to pick with Kurzweil over one of the trend charts that litter his book. Basically, he argues that the pace of change has been accelerating over time, so that major inventions are being created ever faster as time goes by. 10,000 years ago it took several thousand years between major inventions (agriculture –> wheel), while a century ago it took only a few decades (telephone –> radio).

Fine. But his cleverly constructed chart cheats: it stops about 30 years ago. So I decided to extend it. My version of his chart extends to last month (see pink shaded area), and it indicates that major, paradigm-busting inventions should be spaced about a week apart these days.

So what gives? Seems to me that the Singularity should be right on our doorstep, not 40 years away. And while 40 years may not seem like all that much in the great scheme of things, it means a lot if you’re 46 years old. Which I am.

So what happened?

If I had to guess without having read the book yet, I’d say what the chart really shows is the gloss of history: the longer ago something was, the less important we take it to be and the more we lump it together with everything else from that period. For example, the last four entries on Ray’s chart are the Industrial Revolution, the telephone, electricity, and radio (as one event), the computer and the personal computer (as two events). Why did he decide to label these as four paradigm-busting inventions rather than seven, or as one? Conversely, why are writing and the wheel lumped into the same invention, or printing and the experimental method? Depending on what you call a single “event,” the spacing between those events could show accelerating change, constant change, or stability punctuated by short periods of rapid change (the last one being my own personal belief).

Could the one true constant be the belief that our generation is experiencing more change than any other?

Rain falls, film at 11.

I love living in a place where a little rain makes front-page news:

Moisture and unstable air spinning off a tropical storm along the coast of Mexico brought a rare burst of thunder, lightning and rain — even some hail and power outages — to the Bay Area on Tuesday afternoon.

From 1 p.m. to about 3 p.m., thunder boomed as the brunt of the storm passed over San Jose, Fremont, Palo Alto and San Francisco, while sporadic rainfall wet roadways and cooled down the region.

It was the first rain since June 17, when 0.03 inches fell in San Jose. And the storm marked the first recorded rainfall on Sept. 20 in San Jose at least back to 1948, according to National Weather Service records.

Wikipedia the (physical) World

Semapedia is a project to annotate physical locations with 2D barcodes that link to Wikipedia articles. With the Semacode software running on your PDA/cellphone, you scan a barcode and it’ll take you to the linked-to article. There’ve been a lot of attempts at this sort of physical annotation of the world, WorldBoard being one of the earlier ones I remember.

semapedia-bug.gif

I like the concept in theory, but I’m always disappointed by the quality and variability of the links. Do I really want a link about privacy just because I see a no-trespassing sign, or about the Hofburg Imperial Palace just because I’m standing there? Perhaps, if I’m in the mood for ironic social commentary or I’m a tourist with an interest in architecture, but most people won’t be the right audience for any given link. One man’s art is another man’s graffiti, and the world-annotation systems I’ve seen are currently little more than virtual spray paint.

The variability is the real key. If 90% of the tags I come across link to something interesting to me, I’ll probably follow every one I see. If only 50% link to something interesting, I might look at the human-readable title printed on the tag and then decide whether I think it likely that the article will be well-written and interest me. If 90% of the tags wind up being useless, I won’t even bother reading the title — and then it won’t matter that there are 10% that I would have enjoyed if I had bothered to look.

I’m not totally pessimistic about this sort of technology though. With the right combination of filtering (to make tags I don’t care about completely invisible), subtlety (to make the tags I might care about still be unobtrusive in case I don’t want to be bothered) and community support (to ensure relevance to me and to bond me to my community regardless of the link quality), I could see something like this finally taking off.

(Thanks to Eugen Leitl on the Wearables mailing list for the link!)