Comments

(newest first)

  • Vivek | Sun, 09 Jul 2017 07:43:30 UTC

    Nice blog! I have taken a shot at explaining search engines, including a stab at building one from scratch using your own data: https://machinelearningblogs.com/2016/12/12/how-to-build-a-search-engine-part-1/
    Let me know what you think!
  • name | Wed, 25 Feb 2015 05:08:39 UTC

    I agree that writing and deploying a search engine for use by public users can be quite a lot of work. The difficulty lies in the infrastructure requirements, such as storage and the ability to scale to handle requests. A simple search engine over a folder on your computer, however, can be done in under a day. Although that is not a viable search engine for public deployment, it is nevertheless an efficient one.
    Thanks
    http://www.fixithere.net
  • MonjuSarkar | Sun, 22 Feb 2015 08:38:04 UTC

    I want to build a new search engine, but I do not know how to start, and no one has given me a good idea. At present most people do not like Google, so they are looking for a new one, because Google is too commercial. If anybody reads this message, please give me some good information about search engines.
  • nishant | Mon, 16 Feb 2015 17:39:34 UTC

    Sir, I am a student doing a B.Tech in Computer Science Engineering. I want to develop a project on search engine optimization or on building a search engine. Can I get some ideas for my minor project?
  • Irfan Ullah | Thu, 22 Nov 2012 05:03:38 UTC

    Great article, very informative.
    
  • Wouter-Jan | Fri, 02 Nov 2012 10:50:34 UTC

    Why would you not suggest peer-to-peer crawling, as this could eliminate the bandwidth issue?
  • Ashwani Priyedarshi | Wed, 25 Jul 2012 10:39:00 UTC

    Just loved it :) With data on the internet increasing at an exponential rate, it's becoming more difficult to get it right day by day.
  • Lukas Gutschmidt | Tue, 20 Mar 2012 20:21:21 UTC

    Loved your article. I spent about half a year on a search engine (though I do not work on it full time, since I'm still and I am alone). I probably spent the most time overcoming errors. It happens that, after you have crawled a few million documents, one website has an error and you have to restart the system and find a way to fix it.
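One common way to avoid the full restart Lukas describes is to isolate failures per URL and checkpoint progress. The sketch below is illustrative only: it assumes a caller-supplied `fetch` function and a `checkpoint` callback (both hypothetical placeholders, not part of any particular crawler). A failing site is recorded and skipped instead of crashing the run, and the saved set of finished URLs lets a restarted crawl resume where it left off.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    done: Optional[Set[str]] = None,
    checkpoint: Optional[Callable[[Set[str]], None]] = None,
    every: int = 1000,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch each URL once; one bad site is logged and skipped, not fatal.

    `done` holds URLs finished in an earlier run (e.g. loaded from the last
    checkpoint), so a restart skips them instead of starting from scratch.
    """
    done = set() if done is None else done
    pages: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for url in urls:
        if url in done:
            continue  # already handled before the restart
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # isolate the error to this one URL
            failures[url] = repr(exc)
        done.add(url)  # failed URLs are recorded too, so they are not retried blindly
        if checkpoint is not None and len(done) % every == 0:
            checkpoint(set(done))  # persist progress periodically
    if checkpoint is not None:
        checkpoint(set(done))  # final snapshot at the end of the run
    return pages, failures
```

In practice `checkpoint` would write the set to disk and `failures` would be reviewed later, so a single malformed site costs one log entry rather than a restart of the whole crawl.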
  • Harisankar Krishna Swamy | Sun, 30 Oct 2011 13:54:35 UTC

    I agree that writing and deploying a search engine for use by public users can be quite a lot of work. The difficulty lies in the infrastructure requirements, such as storage and the ability to scale to handle requests. We can think of a replicable group of nodes to address scaling to more requests. But the logic behind a search engine and its data structures are not that difficult to write on your own. In fact, writing a simple search engine like Google, which gives you the list of HTML web page files that contain your keywords, is not that difficult at all. This can be done in under a day for a folder on your computer. Although this is not a viable search engine for public deployment, it is nevertheless an efficient one.
    
    The design and implementation of the Java search engine, and its performance, are described here:
    http://harisankar-krishnaswamy.blogspot.com/2011/10/how-to-code-your-own-search-engine-like.html
    
    Best Regards,
    Harisankar Krishna Swamy
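As a rough illustration of the folder-sized engine Harisankar describes (his own implementation is in Java and is linked above; this is not it), here is a minimal inverted index in Python. It assumes plain `.html` files, strips tags with a crude regular expression, and answers a query by intersecting the posting sets of all its words (AND semantics). The names `build_index`, `search` and `load_folder` are illustrative.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Set

TAG_RE = re.compile(r"<[^>]+>")      # crude tag stripper; a real HTML parser is more robust
WORD_RE = re.compile(r"[a-z0-9]+")   # lowercase alphanumeric runs count as words


def tokenize(text: str) -> List[str]:
    """Strip tags, lowercase, and split into words."""
    return WORD_RE.findall(TAG_RE.sub(" ", text).lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: word -> set of document names containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, html in docs.items():
        for word in tokenize(html):
            index[word].add(name)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that contain every query word (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def load_folder(folder: str) -> Dict[str, str]:
    """Read every .html file under `folder` (recursively) into memory."""
    return {
        str(path): path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(folder).rglob("*.html")
    }
```

For example, `search(build_index(load_folder("docs")), "search engine")` returns the paths of every HTML file under a hypothetical `docs` folder that contains both words. Ranking, phrase queries and incremental updates are what separate this from an engine you could deploy publicly.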
  • David Sifry | Sat, 25 Dec 2010 05:38:03 UTC

    Just remember that this article was written in 2004. There have been a lot of changes due to Moore's law and the shift towards cloud computing (i.e. no need to build your own hardware or colos for many types of systems anymore), which change some of the specifics of Anna's advice. However, the core of it is very sound.
    
    I'd also look at Lucene, which has matured into a pretty kick-ass search engine that can handle real-time indexing as well as different dynamic relevance functions. It's also something that runs at scale at a number of large internet properties, and if you do insist on building your own, at least you can refer to the Lucene source code to see what to do differently.
  • Andrew J | Sat, 25 Dec 2010 02:57:20 UTC

    Why exactly do you recommend IDE over, say, SATA...?
  • Lou | Sat, 11 Dec 2010 23:49:59 UTC

    I feel that Cloud Computing eliminates half of the obstacles this article presents.
  • MOHIT DAGA, LNMIIT | Fri, 03 Dec 2010 18:39:18 UTC

    A search engine can easily be designed by a single person, but the approach to writing it must differ from the one the author describes. The new approach could be to mine the data on the spot: whenever a user searches, the code on his machine mines the data he searched for at that instant and returns the results. This is very easy. Say you have N (1 million) main sites, and each site has on average M (1000) pages; to search the titles you need to run your program only 10^9 times. Say the typical speed is 512 kbps; then the same is done in less than 4 seconds, and the search is efficient enough.
    
    The main thing: you don't need servers.
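A quick back-of-envelope check of the figures in the comment above. N, M and the 512 kbps link come from the comment; the number of bytes transferred per page just to read its title is a hypothetical assumption (100 bytes by default), and network latency and server response time are ignored entirely, so the real cost would be higher.

```python
SITES = 1_000_000        # N in the comment
PAGES_PER_SITE = 1_000   # M in the comment
LINK_BPS = 512_000       # 512 kbps link speed from the comment
BYTES_PER_PAGE = 100     # hypothetical: bytes fetched per page to read its title


def fetch_seconds(
    sites: int = SITES,
    pages_per_site: int = PAGES_PER_SITE,
    bytes_per_page: int = BYTES_PER_PAGE,
    link_bps: int = LINK_BPS,
) -> float:
    """Seconds needed just to move the title data over the link (no latency)."""
    pages = sites * pages_per_site        # 10**9 pages in total
    bits = pages * bytes_per_page * 8     # total bits to transfer
    return bits / link_bps
```

With these inputs `fetch_seconds()` gives 1,562,500 seconds (about 18 days), and even `fetch_seconds(bytes_per_page=1)` gives 15,625 seconds (about 4.3 hours), which is why fetching on demand at query time is usually replaced by crawling and indexing ahead of time.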
  • Korab | Fri, 19 Nov 2010 22:32:01 UTC

    Hi Anna, everyone...
    
    I was wondering if any of you could help point me in the right direction...
    
    I'm thinking of building an all-Albanian search engine. My language is not widespread; I estimate a potential market of about 5 million users, most of whom do NOT speak any other language. I won't get into the marketing side of this venture because I'm pretty sure I can pull it off if I have a good engine and solid promotion.
    
    My problem is estimating the time, energy and resources needed to build the thing! I don't have a programming background, so I was wondering:
    
    1. Can you point me towards a success story of a local search engine?
    2. How many people, with what types of skills, do I need to work with?
    3. Let's say I do manage to build an Albanian (it's a language, for those of you who don't know) search engine, the thing works, I get users, I get indexed pages, I have advertisers and a market for their clients, I have found some capital to grow, and the business has expanded and is raking in a decent profit... What sort of business or industry would be interested in acquiring the business I've successfully built :)? (It's a long shot, I know; I'm just thinking out loud here.)
    
    Any insight would be highly appreciated. 
    
    thanks in advance,
    KK    
    
  • John | Thu, 18 Nov 2010 21:47:55 UTC

    I'm looking for a team to build a search engine with me. I have a great, unique concept. Marketing will come easily. The social networks and connections I need are in place. I also have access to all the equipment we will ever need.
    
    Contact me at jwoowolterman@msn.com
  • mani | Thu, 17 Jun 2010 09:21:45 UTC

    I liked the information. I have a unique idea for a search engine, and I need to form a team to work on it together. Trust me or not, none of the current search engines (like Yahoo, Google, etc.) have this facility. If I do it, it's going to be a big BOOM BOOM in the upcoming years. If you want to be a part of my project, email me at dhorg@att.net
    !!! I BELIEVE IN DOING !!!
  • Scott | Thu, 25 Jun 2009 06:17:08 UTC

    Great article, it helped me figure out how to write a search engine :). Of course, my search engine only needed to index our own site, which was ultra-small (under 100 pages) and hosted locally, so it was easier.
    
  • kar | Tue, 23 Dec 2008 13:48:10 UTC

    On 2nd thought, please ignore my 1st comment; I posted it just after reading a few paragraphs. Great article.
  • kar | Tue, 23 Dec 2008 07:11:56 UTC

    Nice article, but it wasn't that hard to build a full-fledged web search engine, tbh. I'm currently working on one. Disk seek and CPU limits can easily be overcome with clustering. My dev setup currently has 3 servers. It's easy if you set a limit on each server to, let's say, keep an index of only 100m docs: 10 cheap home-made servers = 1b docs. Filtering, stopword removal and stemming cut about 20% off the ~10 KB per doc.
    
    The real hurdle here is not building a search engine, but marketing it carefully.
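A sketch of the sizing and layout kar outlines, under stated assumptions: documents are partitioned across servers by hashing the URL (each server indexes a disjoint slice), "-20% of 10kb" is read as preprocessing trimming 20% from a 10,000-byte document, and a query is sent to every server with the partial results merged afterwards (scatter-gather). The hashing scheme, the merge and all function names are illustrative assumptions, not kar's actual setup.

```python
import hashlib
import heapq
from typing import List, Sequence, Tuple

DOCS_PER_SERVER = 100_000_000   # per-server cap from the comment
NUM_SERVERS = 10                # "10 cheap home-made servers"
AVG_DOC_BYTES = 10_000          # ~10 KB per doc (decimal KB assumed)
REDUCTION_PCT = 20              # assumed reading of "-20%": preprocessing trims 20%


def total_docs(num_servers: int = NUM_SERVERS, per_server: int = DOCS_PER_SERVER) -> int:
    """Cluster capacity when each server holds at most `per_server` docs."""
    return num_servers * per_server


def bytes_per_server(
    per_server: int = DOCS_PER_SERVER,
    doc_bytes: int = AVG_DOC_BYTES,
    reduction_pct: int = REDUCTION_PCT,
) -> int:
    """Processed text per server, before any index overhead."""
    return per_server * doc_bytes * (100 - reduction_pct) // 100


def shard_for(url: str, num_servers: int = NUM_SERVERS) -> int:
    """Document partitioning: a stable hash of the URL picks the server."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def merge_top_k(
    shard_results: Sequence[List[Tuple[float, str]]], k: int = 10
) -> List[Tuple[float, str]]:
    """Scatter-gather: each server returns (score, doc) hits; keep the global best k."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits)
```

Under these assumptions `total_docs()` is 1,000,000,000 and `bytes_per_server()` is 800,000,000,000 bytes, i.e. roughly 800 GB of processed text per machine before the inverted index itself is counted. Because every server answers every query, adding servers raises capacity but not per-query work on each machine.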