Community Server for Rent a Coder

How Software Gets Done

Current technical issues on RentACoder.com

  • SQL Server 2005 Database issues 95% resolved

    As you've probably already noticed, the database issues are almost completely resolved.
    For the coders out there who run SQL Server 2005, we have written an article documenting the problems we encountered and their solutions, so that you do not have to go through the same problems we did:
    http://rentacoder.com/CS/blogs/real_life_it/archive/2006/04/28/477.aspx

    The remaining minor issues are already documented elsewhere in this blog as individual posts and their progress will be updated there.

  • RESOLVED: Denial of service attack

    4/28/2006 10:00 AM: Peak 10 (our data center colocation provider and ISP) is currently experiencing a network issue which is causing the site to be intermittently unavailable. Early indications lead them to believe this is a denial of service attack. At this point there is no reason to suspect it is targeted at us (versus the thousands of other companies in their data center). They are working to resolve it, and additional information will be passed on to you once we know more.

    11:39 AM: Peak 10 appears to have stabilized and recovered from this attack.  For the last 15 minutes the extreme packet loss has stopped and access to the site is continuous again.  DoS attacks are notoriously hard to detect and stop, so if it is indeed fixed, I give them credit for doing it so quickly.

    As an aside, this is the first outage we have had with Peak 10 since we colocated with them last year.  The physical connectivity, facilities and customer service have been outstanding.

    11:48 AM: Received notice from Peak 10 that the outage has ended.

    Dear Valued Customer,
    Please be advised that Peak 10 engineers have resolved the following
    impairment:
    Date:                   4/28/2006
    Ticket#:                75506
    Start of Impairment:    09:35 EDT
    End of Impairment:      11:35 EDT
    Data Center(s):         All
    Summary:                Network
    Peak 10 network engineers were able to isolate the network targeted by the denial of service attack and connectivity has been restored.  As part of Peak 10's ongoing effort to deliver the highest quality service available, we will continue to investigate the circumstance associated with this event and will publish a detail of this impairment within the next two (2) business days.
    It is Peak 10's mission to provide the highest level of service to our customers. 
    We appreciate your business and look forward to continuing our relationship in the future.
    Sincerely,
    The Peak 10 Solution Support Team
    If you need assistance, the Peak 10 Solution Support Center can be reached

    1:57 PM: Received notice that it was another customer at Peak 10 being targeted by the DoS attack, and not Peak 10 themselves.  They diverted the traffic and stopped the attack after figuring out who the target was.

  • COMPLETED: Upgraded support contract with Microsoft signed by Rent a Coder

    Tuesday, April 25th, 2006: To get better and faster resolutions to the site issues that we are currently experiencing (detailed elsewhere on this blog), we have upgraded our support contract with Microsoft to a very high level of support.  It was not cheap, but for about $10,000 we now have access to top-tier support (rather than entry and mid level).  Such support is often the difference between finding a difficult problem quickly and struggling for days or weeks with someone who may be learning on the job.  An incident manager will also help ensure that issues are resolved by Microsoft as quickly as possible.  We are hopeful that this substantial investment will pay substantial dividends by helping to restore the site to normal operating condition as quickly as possible.

  • COMPLETED: Database upgraded to SP1

    COMPLETED: Microsoft has fixed a number of bugs in SQL Server in an update called Service Pack 1 (SP1).  This was installed during the Monday night maintenance window (April 25th), from 8pm to 1am Tuesday.

    This upgrade contains about 20-30 fixes, including one that helps with CPU usage (which is what is causing problems on the server right now).

  • RESOLVED: Uncertainty of whether all CPUs of new SQL Server are being used

    Update 4/30/2006:  Pre-R2 versions of Windows 2003 x64 Standard Edition support quad processor machines... but NOT symmetric processing across the four processors.  This means the system fills the first processor up all the way, then goes to the 2nd, and so on, which is very inefficient.  The server was upgraded to R2 and now evenly distributes the load across all 4 processors.

    Description: The SQL Server was upgraded from 2 processors to the top-of-the-line 4 processor machine that Dell sells.  However, performance has not improved proportionately.

    There is some question as to whether 2 of the 4 CPUs of the new SQL Server are being used properly.  As a result, we may only be getting half the performance we could be.

    Technical details:  In some areas of Microsoft's site, it says that the version of Windows 2003 on the server (Windows Server x64 Standard Edition SP1) supports 4 processors... but in others it says that it does not and must be upgraded to Windows Server x64 Standard Edition R2.  The server itself is equally contradictory: it seems to detect all 4 processors in Task Manager, but the "computer properties" dialog only identifies 2 of them.  A Microsoft rep recommended that we upgrade to R2 to resolve the ambiguity.

    So the server is going to be upgraded to R2 to see whether this makes better use of the 3rd and 4th processors.  An order has been placed with Dell for a new R2 license, and this upgrade will be done the night it is received (hopefully Tuesday night) from 8pm-1am.

    4/25/2006: Update: Microsoft is double checking to make sure R2 will truly fix this, and will hopefully be getting back to us tomorrow (Wednesday).
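    The difference between "fill the first processor, then the next" and an even spread can be illustrated with a toy model. This is only a sketch: the function names and the idea of measuring load in percentage points are assumptions made for illustration, and it does not model how Windows actually schedules work.

```python
def fill_sequentially(total_load, cpu_count, capacity=100.0):
    """Toy model: saturate CPU 0 before spilling work onto CPU 1, and so on."""
    loads = []
    remaining = total_load
    for _ in range(cpu_count):
        used = min(capacity, remaining)
        loads.append(used)
        remaining -= used
    return loads


def spread_evenly(total_load, cpu_count):
    """Toy model: divide the same work equally across every CPU."""
    return [total_load / cpu_count] * cpu_count


if __name__ == "__main__":
    # 150 percentage points of work on a 4 processor machine.
    print(fill_sequentially(150.0, 4))  # [100.0, 50.0, 0.0, 0.0]
    print(spread_evenly(150.0, 4))      # [37.5, 37.5, 37.5, 37.5]
```

    In the first case one processor is saturated while two sit idle, which matches the symptom of a 4 processor machine performing like a 2 processor one.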

  • Timeouts and pages not displaying completely between 10:30pm and 1:30am EDT

    From 10:30pm EDT until roughly 1:30am EDT, certain pages on the site (those that access the database) time out or do not display completely.  The problem is caused by a workaround recommended to us by Microsoft to solve a memory problem with SQL Server 2005 during the backup window.  The workaround was to turn off the "verify" feature of the backup, which stops the backup from consuming all of the memory.  However, this means we now have to do the verification on a 2nd server, which requires copying the backup to that server.  During this process, the copy also consumes all of the memory.  It is a catch-22 situation at the moment.  We are working with Microsoft to get a resolution for this secondary problem as quickly as possible and thank you for your patience.
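    For readers who also copy backups to a second machine, here is a minimal sketch of checking that a copy arrived intact by comparing checksums. The function names and file paths are hypothetical. The script reads in fixed-size chunks so that it holds only one chunk in memory itself, but it does not control how the operating system caches the file, and it is not a substitute for SQL Server's own backup verification.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so only one chunk is held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """Return True when the copied file is byte-for-byte identical to the source."""
    return file_sha256(source_path) == file_sha256(copy_path)
```

    A call such as copies_match("site.bak", "copy_of_site.bak") (hypothetical file names) returns False if the copy was truncated or corrupted in transit.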

  • RESOLVED: Web site compression not working

    4/5/2006: This issue has been resolved.   Microsoft support gave us a way to disable the new ISA SP2 feature and IIS compression is working again.

    3/16/2006: Microsoft had us install Service Pack 2 on our ISA firewall a few weeks ago to try to correct one of the other firewall problems mentioned on this blog.  It turns out that SP2 has a side effect: web site compression no longer works like it used to.  (Technical details: Specifically, ISA no longer honors the SendAcceptEncodingHeader setting that used to allow it to pass through compression done by the internet server.  ISA SP2 has its own compression, but it's buggy... turning it on crashed the server under a heavy load.)  As a result, the site takes longer to load in your browser than it used to.  Microsoft will try to get us an answer tomorrow about how to solve this problem.
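    Why a missing header disables compression: in HTTP, a server normally compresses a response only when the request's Accept-Encoding header says the client can handle it. If something between the browser and the web server drops that header, the server sees no compression support and sends the page uncompressed. The sketch below illustrates that decision in general terms. The function names are hypothetical, the header parsing is simplified, and it is not how IIS or ISA implement it.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value lists gzip with q > 0."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def encode_body(body, accept_encoding):
    """Compress the body only when the request advertised gzip support."""
    if accept_encoding and client_accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    # Header missing (for example, stripped on the way in): send as-is.
    return body, {}
```

    With accept_encoding set to "gzip, deflate" the body comes back compressed; with None or an empty string the same body is returned untouched, which is the slower path described above.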

  • RESOLVED: Timeout errors and site unavailable

    The site database was upgraded in February to a quad processor 64 bit server running the latest version of the database software (Microsoft SQL Server 2005) to improve its performance.  Unfortunately, the upgrade to the new Microsoft SQL Server 2005 has caused the site to time out in two ways:

    1) 5/5/2006: RESOLVED with SQL Server 2005 SP1

    The new database is not correctly optimized due to a bug in SQL Server 2005's optimization wizard that causes it to crash on certain "database views".  This causes general sporadic timeouts and slowness in the database.  Microsoft has confirmed this as a bug and we are working with them to get a patch issued to us as quickly as possible.  (3/16: Microsoft has created a fix, but testing it will take 3 weeks to make sure it doesn't break something else.  We are anxiously awaiting this fix and will update you when it comes in.)

    2) 5/5/2006: RESOLVED: Moving the backup to a separate drive fixed this problem.

    The site times out during the database backup, from roughly 8:15 PM until a little after 9 PM.  This should not be happening, as the database should allow a backup to occur without stopping access.  A warning message has been added to the site during this time period and a 2nd ticket has been opened with Microsoft to resolve this bug.  (3/16: We have updated the server with 15k SCSI drives (the fastest type for servers) to shorten the backup window.  Microsoft has promised to see if they have any information on how a 24/7 site is supposed to do backups, best practices, etc.  We will update you when we know more.  4/5: No new info from Microsoft on ways to improve this yet... still pushing this issue with them.  Also discovered memory issues with backups, which have been referred to them.)
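    A warning banner of the kind mentioned above needs a check for whether the current time falls inside a maintenance window. The following is a minimal sketch of one way to write that check, not the site's actual code. The names are hypothetical and the window boundaries are rounded assumptions taken from the approximate times in this blog. The second window is included only to show the case that crosses midnight.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` is in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:30 to 01:30.
    return now >= start or now < end


# Assumed boundaries, rounded from the approximate times given in the posts.
BACKUP_WINDOW = (time(20, 15), time(21, 0))   # roughly 8:15 PM to 9 PM
VERIFY_WINDOW = (time(22, 30), time(1, 30))   # roughly 10:30 PM to 1:30 AM


def should_warn(now):
    """Return True when a slow-site warning should be displayed."""
    return in_window(now, *BACKUP_WINDOW) or in_window(now, *VERIFY_WINDOW)
```

    The wraparound branch matters: a naive start <= now < end comparison would never be true for a window that starts at 10:30 PM and ends at 1:30 AM.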

  • Site not available

    The site's firewall, Microsoft Internet Security and Acceleration Server (ISA), currently has three problems for which we have opened tickets with Microsoft.

    1) Status: 5/5/2006: RESOLVED by ISA SP2. 

    The first is that a filter crashes randomly (about once every 2 weeks) and causes the entire site to go down.  (Update 3/16: Microsoft believes that the latest SP will fix this.  It is being tested right now.  Update 4/5: Still no recurrence of this issue.  If it doesn't recur this month then it will be considered resolved.)

    2) Status: 5/5/2006: RESOLVED by Microsoft workaround.

    The second is that ISA reports that it cannot log certain information and shuts itself down... again causing the entire site to go down.  This happens more frequently: 1-2 times per week.  (Update 3/16: Microsoft has given us a workaround of turning off MSDE, which appears to have problems under load.  We are now testing whether this prevents the problem.  Update 4/5: Still no recurrence of this issue.  If it doesn't recur this month then it will be considered resolved.)

    3) Status: MICROSOFT STILL WORKING ON THE PROBLEM: 4/5/2006: ISA stops serving web pages...alternating between "DNS server not found" and "out of memory".   Sometimes http works while https doesn't and vice versa.  Microsoft is working on the problem.

  • WORKAROUND IMPLEMENTED: Microsoft Internet Information Services (IIS) "object not found" error

    Update 4/30/2006:  This problem is due to a flaw in Microsoft's ASP (having to do with include files) that fragments the memory, and there is no way to correct it without running into other, more serious problems.  Short term, we have doubled the memory on each server and bought a 3rd internet server for the Rent a Coder pool.  These servers are rebooted daily to keep the memory as defragmented as possible.  This problem now appears only once every two to three weeks instead of many times a day.  The long term solution is that we will eventually be moving to ASP.NET, which does not have this problem.

    --------------------------------------------

    Description: The internet server will sometimes report the error below, along with running out of virtual memory.  Some parts of the message (such as the quoted name, shown here as 'xxxxxxxxxxxx') differ from time to time, but the rest is always the same:

    Microsoft VBScript runtime error '800a01b6'
    Object doesn't support this property or method: 'xxxxxxxxxxxx'
    /CrossWeb/include/BannedIpAddress.asp, line 369

    Frequency: The problem occurs randomly.  Rebooting the server "fixes" the problem but is only a temporary solution. 

    Status: There is currently a ticket open with Microsoft to resolve this issue.  It is caused by a flaw in Active Server Pages (ASP) that only triggers in very high volume situations.  Currently the server is being rebooted every day to attempt to minimize the number of times it appears.
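    To count how often the error quoted above appears, its fixed text can be matched while letting the quoted member name vary. This is a minimal sketch with hypothetical names. It assumes the text being searched contains the message exactly as it is displayed on the page, and real IIS log files may record it differently.

```python
import re

# Fixed parts of the error text; the quoted member name varies between occurrences.
ERROR_PATTERN = re.compile(
    r"Microsoft VBScript runtime error '800a01b6'\s+"
    r"Object doesn't support this property or method: '(?P<member>[^']*)'"
)


def find_errors(text):
    """Return the varying member name from each occurrence of the error."""
    return [match.group("member") for match in ERROR_PATTERN.finditer(text)]
```

    Counting the returned list per day gives a rough way to check whether a workaround such as the daily reboot is actually reducing the frequency.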

  • FIXED: Slow newsletter

    Due to increased demand for the RAC daily newsletter, the time to send it out had increased from 7 hours (starting at 6AM EST) to 16-17 hours.  To handle the increased volume, the database was upgraded from SQL Server 2000 to SQL Server 2005 in late February.  This has reduced the time by one third.  New hardware and software will be installed in mid March (a brand new server with double the processor speed and a newer operating system) to further speed up the newsletter.

    3/16: New hardware and software has been installed.  The newsletter now goes out before 11:30am to all HTML subscribers (text subscribers receive emails after that).  As the site grows, the times will slow down again.  We will monitor the situation and apply new hardware/software solutions as required.
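    To put the newsletter numbers in perspective, the finish times they imply can be worked out directly. This sketch assumes the run always starts at 6 AM and that "reduced by one third" applies to the full 16-17 hour figure. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def finish_time(duration_hours, reduction=0.0, start_hour=6):
    """Return the HH:MM finish time for a run starting at `start_hour`."""
    minutes = round(duration_hours * 60 * (1 - reduction))
    start = datetime(2006, 3, 1, start_hour, 0)  # the date is arbitrary
    return (start + timedelta(minutes=minutes)).strftime("%H:%M")


if __name__ == "__main__":
    print(finish_time(7))          # 13:00  original 7 hour run
    print(finish_time(16))         # 22:00  at the 16 hour low end
    print(finish_time(16, 1 / 3))  # 16:40  after the one third reduction
    print(finish_time(17, 1 / 3))  # 17:20
```

    Under those assumptions the SQL Server 2005 upgrade alone moved the finish to late afternoon, and the new hardware brought it forward again to before 11:30am for HTML subscribers.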
