tlug Mailing List Archive
- Date: Tue, 14 Oct 2014 15:01:44 +0200
- From: Benjamin Kowarsch <trijezdci@example.com>
- Subject: Re: [tlug] presentation wish list
- References: <CADR0rnfUqZyOa4v-WeVCX3N+vgih8q0hXUe307W8Oxz3thVjqQ@mail.gmail.com> <543BB1BA.7080006@extellisys.com> <CADR0rndoYB6UoG2r=T_6=+7j2oyk0Nx+DrSjf9i1n6KctoLE0A@mail.gmail.com> <543D146C.7040404@extellisys.com>
On 14 October 2014 14:17, Travis Cardwell <travis.cardwell@example.com> wrote:

> Tokyo Parsing Study Group? ;)

Well, if there is sufficient interest, then why not?! As a cautionary note, though, interest groups with a very specialised and narrow topic tend to be short-lived, even in a place like Tokyo. We once had an informal group of LLVM compiler hackers, some four or five people, who met for beers and chat once a month, but it didn't last more than a few months.

> Whether there is enough interest to warrant a presentation or not, I look
> forward to discussing the topic with you, perhaps at a nijikai.

Sure, that sounds like a fun topic.

> I have a
> project (that is unfortunately currently on hold) that requires a bit of
> parsing, and I would love to get your thoughts on my strategy.

Perhaps as a general post-scriptum on this: building a proper parser is a widely applicable and useful skill. Contrary to common perception, it is not limited to programming language design. In fact, Terence Parr, the author of ANTLR, told me that he changed the usage model and design of ANTLR in version 4 because demand for the tool was coming from all kinds of areas, but least of all from language design.

Whenever you build a piece of software, there are various sources of input data that should always be verified, because bad input is a very common cause of security vulnerabilities. Any kind of input, whether interactive input from a user terminal/browser session or from a data file, should ALWAYS be verified for 100% correctness in order to close this exploit route.

Unfortunately, verifying input data is very often neglected. People often use ad hoc verification that does not catch all possible malformed input. I vividly remember the sorry excuse for a parser that read configuration files in Asterisk. It searched from left to right to find the first opening square bracket, then from the end of the line from right to left to find the first closing square bracket, and accepted anything between them without further verification (sketched below). Something like ... [[foobar]] would be accepted as [foobar], and I demonstrated how this could have been used in hosted multi-tenant PBXes to hijack another tenant's account and make phone calls on their bill. Yet nobody cared, because the task of writing a proper parser for such a simple configuration file was considered overkill. I have since seen similar situations in both open source and commercial projects.

There are three reasons why people tend to neglect building proper parsers to verify all their input:

(1) When trying to use an automated parser generation tool, they find that the parser generator has a very steep learning curve and only does half of the work, while the other half still has to be coded manually.

(2) Perhaps never having written a recursive descent parser from scratch, many people believe it must be an immense effort that requires a rocket scientist of sorts, when it is in fact rather simple, even more so for simple input data formats.

(3) Many software folks tend to be like children in a candy store: they don't like to spend a lot of time on planning and design, but want to start hacking code right away. Building a parser, whether manually or with a generator tool, always requires a fair amount of preparation and design. You need to write a grammar, verify the grammar, and then build your parser strictly following the grammar. This runs counter to many people's habits.
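To make the point concrete, here is a minimal Python sketch of that kind of ad hoc bracket scanning next to a strict check. This is not the actual Asterisk code; the function names and the exact validation rules are illustrative assumptions.

    # Hypothetical sketch of the ad hoc bracket scanning described above
    # versus a strict check; this is NOT the actual Asterisk source.

    def naive_section_name(line):
        """Mimic 'first [ from the left, first ] from the right' scanning."""
        start = line.find("[")            # leftmost opening bracket
        end = line.rfind("]")             # rightmost closing bracket
        if start == -1 or end == -1 or end <= start:
            return None
        # Everything in between is accepted without further verification.
        return line[start + 1:end]

    def strict_section_name(line):
        """Accept only a well-formed header: '[' name ']' and nothing else."""
        line = line.strip()
        if not (line.startswith("[") and line.endswith("]")):
            return None
        name = line[1:-1]
        # Reject empty names and any stray brackets inside the name.
        if not name or "[" in name or "]" in name:
            return None
        return name

    print(naive_section_name("[[foobar]]"))   # '[foobar]'  -- malformed input slips through
    print(strict_section_name("[[foobar]]"))  # None        -- rejected
    print(strict_section_name("[foobar]"))    # 'foobar'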
The reason I proposed a presentation that starts off with a bit of theory on recursive descent, and then shows how to code a parser both by hand and with a tool that automates the tedious verification and conflict catching, is precisely to counter the prevailing perception and habits:

(1) Parsing is not rocket science; it can be done fairly simply.

(2) Parsing is universally useful; it can significantly contribute to keeping software secure and reliable.

(3) The effort to plan and design, tedious as it may seem, really pays off when implementing.

(4) The most tedious activities, such as grammar verification, can be automated using tools such as ANTLR.

In other words, the idea is to take the pain out of parsing, both perceived and actual. A presentation of flex/bison may, however, have the opposite effect.
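As a rough illustration of how little code a hand-written recursive descent parser needs for a simple input format, here is a sketch in Python for a small INI-style grammar. The grammar and all names are my own assumptions for illustration, not taken from Asterisk or ANTLR; the point is that each grammar rule maps to one parsing function, and malformed input is rejected rather than silently accepted.

    # A minimal recursive descent sketch for a simple INI-style grammar,
    # assuming the following (hypothetical) EBNF:
    #
    #   config  = { section } ;
    #   section = header { entry } ;
    #   header  = '[' NAME ']' ;
    #   entry   = NAME '=' VALUE ;

    import re

    NAME  = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")
    VALUE = re.compile(r"[^\n\[\]]*")

    class ParseError(Exception):
        pass

    class ConfigParser:
        def __init__(self, text):
            # The grammar is line oriented; drop blank and comment lines.
            self.lines = [ln.strip() for ln in text.splitlines()
                          if ln.strip() and not ln.lstrip().startswith(";")]
            self.pos = 0

        def parse_config(self):                  # config = { section }
            sections = {}
            while self.pos < len(self.lines):
                name, entries = self.parse_section()
                sections[name] = entries
            return sections

        def parse_section(self):                 # section = header { entry }
            name = self.parse_header()
            entries = {}
            while self.pos < len(self.lines) and not self.lines[self.pos].startswith("["):
                key, value = self.parse_entry()
                entries[key] = value
            return name, entries

        def parse_header(self):                  # header = '[' NAME ']'
            line = self.lines[self.pos]
            inner = line[1:-1] if line.startswith("[") and line.endswith("]") else ""
            if NAME.fullmatch(inner) is None:
                raise ParseError("malformed section header: %r" % line)
            self.pos += 1
            return inner

        def parse_entry(self):                   # entry = NAME '=' VALUE
            line = self.lines[self.pos]
            key, sep, value = line.partition("=")
            if not sep or NAME.fullmatch(key.strip()) is None \
                    or VALUE.fullmatch(value.strip()) is None:
                raise ParseError("malformed entry: %r" % line)
            self.pos += 1
            return key.strip(), value.strip()

    print(ConfigParser("[general]\nport = 5060").parse_config())
    # {'general': {'port': '5060'}}
    # ConfigParser("[[foobar]]").parse_config() raises ParseError instead of accepting it.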